GCP-PCA
Google Cloud Professional Cloud Architect
The Google Cloud Professional Cloud Architect certification validates the ability to design, develop, and manage robust, secure, scalable, highly available, and dynamic solutions to drive business objectives. It is widely regarded as Google Cloud's most popular professional-level certification and is frequently listed among the highest-paying IT certifications.
The exam covers designing and planning a cloud solution architecture, managing and provisioning solution infrastructure, designing for security and compliance, analyzing and optimizing technical and business processes, managing implementation, and ensuring solution and operations reliability. Candidates must demonstrate deep knowledge of Google Cloud services including Compute Engine, GKE, Cloud Run, Cloud Functions, Cloud Storage, Cloud SQL, Cloud Spanner, BigQuery, VPC networking, Cloud Load Balancing, Cloud IAM, and many more.
This certification is recommended for professionals with at least three years of industry experience and one year of hands-on experience designing and managing solutions on Google Cloud. It validates the ability to leverage Google Cloud technologies to transform business requirements into technical architecture and implementation plans.
GCP-PCA Practice Exam 1
Comprehensive 50-question practice exam covering all six GCP Professional Cloud Architect domains: designing and planning cloud solution architecture, managing and provisioning infrastructure, security and compliance, analyzing and optimizing processes, managing implementation, and ensuring solution reliability.
GCP-PCA Practice Exam 2
Second comprehensive 50-question practice exam for the Google Cloud Professional Cloud Architect certification. Covers advanced scenarios in solution architecture design, infrastructure provisioning, security and compliance, process optimization, implementation management, and operations reliability across Google Cloud services.
GCP-PCA Practice Exam 3
Third comprehensive 50-question practice exam for Google Cloud Professional Cloud Architect. Features advanced case-study style questions covering solution design, infrastructure management, security architecture, process optimization, deployment strategies, and operational reliability across the full range of Google Cloud services.
GCP-PCA Practice Exam 4
Fourth comprehensive 50-question practice exam for Google Cloud Professional Cloud Architect. Emphasizes real-world case studies with complex multi-service architectures, migration scenarios, security hardening, and operational excellence patterns across Google Cloud Platform.
GCP-PCA Practice Exam 5
Fifth comprehensive 50-question practice exam for Google Cloud Professional Cloud Architect. Focuses on advanced integration patterns, enterprise migration strategies, multi-cloud governance, and production-grade reliability engineering across Google Cloud services.
GCP-PCA Practice Exam 6
Sixth and final comprehensive 50-question practice exam for Google Cloud Professional Cloud Architect. Covers the most challenging scenarios including complex case studies, multi-service integration patterns, enterprise-scale architecture decisions, and advanced operational strategies across all Google Cloud domains.
GCP-PCA Cheat Sheet
Quick summary - 6 sections
Google Cloud Professional Cloud Architect (GCP-PCA)
The Google Cloud Professional Cloud Architect certification validates your ability to design, develop, and manage robust, secure, scalable, highly available, and dynamic solutions on Google Cloud Platform. It is one of Google Cloud's most sought-after certifications and is regularly listed among the top-paying IT certifications. Professional Cloud Architects use their understanding of cloud technologies and Google Cloud services to translate business requirements into cloud-native architectures that meet technical and operational needs.

Candidates are expected to demonstrate proficiency in designing and planning cloud solution architectures; managing and provisioning cloud infrastructure with Infrastructure as Code and configuration management tools; designing for security and compliance; analyzing and optimizing technical and business processes; managing implementation across teams; and ensuring the reliability of solutions and operations. The exam tests not only your knowledge of individual GCP services but also your ability to make architectural decisions in complex, real-world scenarios that may involve hybrid and multi-cloud environments, migration strategies, cost optimization, and organizational change management.

The exam includes case studies that require you to analyze detailed business and technical requirements before recommending an appropriate architecture. Reviewing the official case studies (such as EHR Healthcare, Helicopter Racing League, Mountkirk Games, and TerramEarth, or whichever set is current) is strongly recommended for exam preparation.
Exam Details
| Detail | Value |
|---|---|
| Exam Code | GCP-PCA (Professional Cloud Architect) |
| Duration | 120 minutes |
| Number of Questions | 50-60 questions (multiple choice and multiple select) |
| Passing Score | Not published by Google (unofficial estimates: roughly 70-80%) |
| Cost | $200 USD |
| Validity | 2 years (must recertify before expiration) |
| Question Types | Multiple choice, multiple select, case study-based scenarios |
| Recommended Experience | 3+ years industry experience, 1+ year designing GCP solutions |
| Certification Level | Professional |
Domain Weights
| Domain | Weight |
|---|---|
| Designing and Planning a Cloud Solution Architecture | ~24% |
| Managing and Provisioning Cloud Solution Infrastructure | ~16% |
| Designing for Security and Compliance | ~20% |
| Analyzing and Optimizing Technical and Business Processes | ~16% |
| Managing Implementation | ~12% |
| Ensuring Solution and Operations Reliability | ~12% |
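The weights translate into rough question counts per domain. A minimal Python sketch, assuming questions are distributed exactly in proportion to the approximate weights in the table above (the real exam does not guarantee this):

```python
# Approximate domain weights (percent) copied from the table above.
DOMAIN_WEIGHTS = {
    "Designing and Planning a Cloud Solution Architecture": 24,
    "Managing and Provisioning Cloud Solution Infrastructure": 16,
    "Designing for Security and Compliance": 20,
    "Analyzing and Optimizing Technical and Business Processes": 16,
    "Managing Implementation": 12,
    "Ensuring Solution and Operations Reliability": 12,
}


def expected_questions(total_questions: int) -> dict:
    """Estimate each domain's share of `total_questions`, rounded to one decimal."""
    total_weight = sum(DOMAIN_WEIGHTS.values())
    return {
        domain: round(total_questions * weight / total_weight, 1)
        for domain, weight in DOMAIN_WEIGHTS.items()
    }


if __name__ == "__main__":
    # The exam has 50-60 questions, so print both ends of the range.
    for n in (50, 60):
        print(f"--- {n} questions ---")
        for domain, count in expected_questions(n).items():
            print(f"{count:5.1f}  {domain}")
```

At 50 questions this works out to about 12 questions for the largest domain and about 6 each for the two smallest.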
Study Tips
- Designing and Planning (~24%) is the largest domain; focus on understanding which GCP services to choose for specific requirements (latency, throughput, consistency, cost) and how to justify architectural trade-offs in scenario-based questions
- Case studies are a major component of the exam; thoroughly review the official practice case studies (EHR Healthcare, Helicopter Racing League, Mountkirk Games, TerramEarth) and practice mapping business requirements to GCP solutions before exam day
- Understand the GCP resource hierarchy deeply: Organization, Folders, Projects, and Resources; know how IAM policies are inherited down this hierarchy and how to structure projects for billing isolation, environment separation, and team autonomy
- Know the key differences between compute options: Compute Engine (IaaS VMs), GKE (managed Kubernetes), Cloud Run (serverless containers), App Engine (PaaS), and Cloud Functions (FaaS); each has specific use cases involving control, portability, scaling speed, and operational overhead trade-offs
- Data storage decisions are heavily tested; understand when to use Cloud SQL vs Cloud Spanner vs Bigtable vs Firestore vs BigQuery vs Memorystore and the consistency, latency, scalability, and cost implications of each choice
- Security and compliance questions require understanding VPC Service Controls, Cloud KMS, IAM best practices, organization policies, and the shared responsibility model in Google Cloud
- Practice with the Google Cloud Free Tier and Cloud Skills Boost (Qwiklabs) hands-on labs; real-world experience with deploying and managing GCP infrastructure is invaluable for scenario-based questions
Compute Services Comparison
| Service | Type | Best For |
|---|---|---|
| Compute Engine | IaaS (VMs) | Full OS control, legacy workloads, custom machine types, GPU/TPU workloads, lift-and-shift migrations; supports Spot VMs (deeply discounted variable pricing, no maximum runtime, can be preempted at any time) and legacy preemptible VMs (similarly discounted, 24-hour maximum runtime); live migration keeps VMs running during host maintenance; sole-tenant nodes for compliance isolation; managed instance groups for autoscaling and auto-healing (unmanaged instance groups only group existing, possibly heterogeneous VMs) |
| Google Kubernetes Engine (GKE) | Managed Kubernetes | Containerized microservices, hybrid/multi-cloud portability (Anthos), complex orchestration; Autopilot mode manages nodes automatically and charges per pod resource request; Standard mode gives full node control; supports node pools with different machine types, node auto-provisioning, Workload Identity for secure pod-to-GCP service authentication, network policies, and Binary Authorization |
| Cloud Run | Serverless Containers | Stateless HTTP services and event-driven containers with automatic scaling to zero; no cluster management; built on Knative; supports any language or library that can run in a container; concurrency configurable per instance (up to 1000); maximum request timeout 60 minutes; integrates with Eventarc for event-driven architectures; minimum instances to reduce cold starts |
| App Engine | PaaS | Web applications and APIs with minimal infrastructure management; Standard environment (sandbox, fast scaling, language-specific runtimes, scales to zero, free daily quota) and Flexible environment (Docker containers, custom runtimes, SSH access, longer request timeouts, always at least one instance running); traffic splitting for A/B testing and canary deployments; versions and services model |
| Cloud Functions | FaaS (Serverless Functions) | Lightweight event-driven processing, webhooks, real-time file processing, IoT backends; 1st gen (HTTP and event triggers, one concurrent request per instance) and 2nd gen (built on Cloud Run, higher per-instance concurrency, timeouts up to 60 minutes for HTTP-triggered functions, Eventarc triggers, traffic splitting); scales to zero; billed per invocation plus compute time; maximum function size 100MB compressed |
Exam Tip: When choosing a compute service, consider the level of operational control needed, the scaling pattern (steady vs bursty vs event-driven), portability requirements (Kubernetes for multi-cloud via Anthos), cold start sensitivity, and cost model (sustained use and committed use discounts for Compute Engine; request-based billing for Cloud Run, which charges only while requests are being handled, rounded up to the nearest 100 ms). If the question mentions "serverless" and "containers," the answer is almost always Cloud Run.
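The selection order in the table and tip above (most control first, then the most managed option that still fits) can be written as a first-pass heuristic. A minimal Python sketch; the requirement flags are hypothetical labels for scenario details, and real questions also hinge on cost, cold starts, and service limits:

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical requirement flags distilled from an exam scenario."""
    needs_os_control: bool = False       # custom OS/kernel, legacy software, lift-and-shift
    needs_kubernetes: bool = False       # Kubernetes APIs, complex orchestration, Anthos portability
    containerized: bool = False          # already packaged as a container image
    single_event_handler: bool = False   # small function reacting to an event or webhook
    web_app_from_source: bool = False    # web app/API deployed from source code


def suggest_compute(w: Workload) -> str:
    """Check requirements from most control to least, returning the first fit."""
    if w.needs_os_control:
        return "Compute Engine"
    if w.needs_kubernetes:
        return "GKE"
    if w.containerized:
        return "Cloud Run"  # "serverless" + "containers"
    if w.single_event_handler:
        return "Cloud Functions"
    if w.web_app_from_source:
        return "App Engine"
    return "Compute Engine"  # assumption: fall back to the most flexible option


if __name__ == "__main__":
    print(suggest_compute(Workload(containerized=True)))                         # Cloud Run
    print(suggest_compute(Workload(containerized=True, needs_kubernetes=True)))  # GKE
```

The ordering matters: a containerized workload that also needs Kubernetes APIs lands on GKE rather than Cloud Run.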
Storage and Database Services
| Service | Type | Key Characteristics |
|---|---|---|
| Cloud Storage | Object Storage | Unlimited scalability for unstructured data; storage classes: Standard (hot, frequently accessed), Nearline (once/month access, 30-day min), Coldline (once/quarter, 90-day min), Archive (once/year, 365-day min); Autoclass automatically transitions objects between classes based on access patterns; lifecycle policies for automated deletion and class transitions; signed URLs for time-limited access; Object Versioning for protection against accidental deletion; retention policies and Bucket Lock for compliance (WORM); dual-region and multi-region for high availability |
| Cloud SQL | Managed Relational DB | MySQL, PostgreSQL, SQL Server; up to 128 vCPUs and 864 GB RAM; automated backups, point-in-time recovery, high availability via a regional configuration (synchronous replication to a standby in another zone, with automatic failover); read replicas (including cross-region) for read scaling; Private IP via VPC peering or Public IP with authorized networks; Cloud SQL Auth Proxy for secure connections; maximum storage 64 TB; best for traditional OLTP workloads that do not require horizontal scaling beyond a single region |
| Cloud Spanner | Global Relational DB | Horizontally scalable, strongly consistent, globally distributed relational database; combines relational structure with virtually unlimited horizontal scale; supports SQL queries, schemas, ACID transactions across regions; TrueTime API for external consistency; automatic sharding; 99.999% SLA for multi-region configurations; ideal for financial systems, global gaming backends, inventory management that require strong consistency at scale; significantly more expensive than Cloud SQL |
| Firestore | Document Database | Serverless NoSQL document database; Native mode (real-time listeners, offline support, mobile/web SDKs, strongly consistent) and Datastore mode (backward compatible with Cloud Datastore, server-side client libraries only, no real-time listeners, strongly consistent queries); automatic scaling, multi-region replication; hierarchical data model with collections and documents; composite indexes for complex queries; best for mobile/web apps, user profiles, game state, product catalogs |
| Cloud Bigtable | Wide-Column NoSQL | Petabyte-scale, low-latency (sub-10ms) NoSQL for large analytical and operational workloads; row key design is critical for performance (avoid hotspots); ideal for time-series data, IoT telemetry, financial tick data, ad-tech, and ML feature stores; integrates with Hadoop, Dataflow, and Dataproc; HBase API compatible; scales linearly by adding nodes; no multi-row transactions; replication for HA across clusters |
| BigQuery | Data Warehouse | Serverless, petabyte-scale analytics data warehouse; SQL interface; columnar storage with automatic optimization; separation of storage and compute; on-demand pricing (per TB scanned) or capacity-based pricing (slot reservations through BigQuery editions, which replaced flat-rate); BigQuery ML for in-database machine learning; federated queries against Cloud Storage, Cloud SQL, Bigtable without loading data; materialized views, partitioned tables (time, integer-range, ingestion-time), clustered tables for cost optimization; streaming inserts for real-time analytics; BI Engine for sub-second dashboard queries |
| Memorystore | In-Memory Cache | Managed Redis and Memcached; sub-millisecond latency for caching, session storage, leaderboards, real-time analytics; Redis supports persistence, replication, high availability with automatic failover; Memcached for simple caching with multi-threaded performance; VPC-native; use as a caching layer in front of Cloud SQL or Firestore to reduce database load and improve response times |
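The minimum storage durations in the Cloud Storage row above have a billing consequence: an object deleted, overwritten, or moved to another class before its minimum is billed as if it had stayed for the full minimum. A minimal Python sketch of that rule plus a rule-of-thumb class picker based on the access frequencies listed; it assumes exact day counts and ignores retrieval and operation charges, which also apply to the colder classes:

```python
# Minimum storage durations (days) from the Cloud Storage row above.
MIN_STORAGE_DAYS = {"STANDARD": 0, "NEARLINE": 30, "COLDLINE": 90, "ARCHIVE": 365}


def billable_storage_days(storage_class: str, days_stored: int) -> int:
    """Days billed when an object leaves its class after `days_stored` days:
    early removal is charged as if it had stayed for the class minimum."""
    return max(days_stored, MIN_STORAGE_DAYS[storage_class.upper()])


def suggest_class(accesses_per_year: float) -> str:
    """Rule-of-thumb mapping from expected access frequency to a class."""
    if accesses_per_year > 12:   # more often than about once a month
        return "STANDARD"
    if accesses_per_year > 4:    # about monthly
        return "NEARLINE"
    if accesses_per_year > 1:    # about quarterly
        return "COLDLINE"
    return "ARCHIVE"             # about once a year or less


if __name__ == "__main__":
    print(billable_storage_days("COLDLINE", 10))  # 90: deleted early, billed for the minimum
    print(suggest_class(4))                       # COLDLINE
```

When access patterns are unknown or shift over time, Autoclass makes this choice automatically instead.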
Exam Tip: The exam frequently tests your ability to choose the right database: Cloud SQL for traditional relational workloads under 64 TB; Cloud Spanner when you need global relational consistency at scale; Firestore for document-oriented mobile/web apps; Bigtable for high-throughput, low-latency key-value at petabyte scale; BigQuery for analytics and reporting (not OLTP). If the scenario mentions "global, strongly consistent, relational, and scalable," the answer is Cloud Spanner.
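The database rules in the tip above can be encoded as a first-pass decision function. A minimal Python sketch; the keyword flags are hypothetical labels for scenario requirements, and the final fallback to Cloud Storage for unstructured data is an assumption, not part of the tip:

```python
def suggest_database(*, caching: bool = False, analytics: bool = False,
                     relational: bool = False, global_consistency: bool = False,
                     size_tb: float = 0.0, wide_column_high_throughput: bool = False,
                     document_model: bool = False) -> str:
    """Check the most specific requirement first and return the first match."""
    if caching:
        return "Memorystore"
    if analytics:
        return "BigQuery"  # analytics and reporting, not OLTP
    if relational:
        # Cloud SQL tops out at 64 TB and scales within a single region.
        if global_consistency or size_tb > 64:
            return "Cloud Spanner"
        return "Cloud SQL"
    if wide_column_high_throughput:
        return "Cloud Bigtable"
    if document_model:
        return "Firestore"
    return "Cloud Storage"  # assumption: unstructured objects by default


if __name__ == "__main__":
    print(suggest_database(relational=True, size_tb=10))              # Cloud SQL
    print(suggest_database(relational=True, global_consistency=True)) # Cloud Spanner
```

In practice several services are often combined (for example Memorystore in front of Cloud SQL), so treat the result as the primary store, not the whole design.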
Networking Architecture
| Service | Description |
|---|---|
| VPC (Virtual Private Cloud) | Global resource spanning all regions; subnets are regional (span all zones in a region); auto mode VPC creates one subnet per region automatically; custom mode VPC gives full control over subnets and IP ranges; VPC peering connects two VPCs with private IP (non-transitive, no overlapping CIDR); Shared VPC allows multiple projects to use a common VPC managed by a host project; firewall rules are stateful and apply at the VPC level |
| Cloud Load Balancing | Global external Application Load Balancer (formerly global HTTP(S) LB; Layer 7, anycast IP, URL maps, SSL termination, Cloud CDN integration, Cloud Armor WAF); external passthrough Network Load Balancer (regional, TCP/UDP, for non-HTTP traffic); internal Application Load Balancer (Layer 7 for internal microservices); internal passthrough Network Load Balancer (regional, for internal non-HTTP traffic); choose global for multi-region, regional for single-region; all support health checks and autoscaling integration |
| Cloud CDN | Content delivery network integrated with HTTP(S) Load Balancing; caches content at Google edge locations worldwide; cache modes: USE_ORIGIN_HEADERS, CACHE_ALL_STATIC, FORCE_CACHE_ALL; signed URLs and signed cookies for authenticated content delivery; cache invalidation via API or gcloud; reduces latency and offloads backend origin servers |
| Cloud Interconnect | Dedicated Interconnect (10/100 Gbps physical connection, lowest latency, highest bandwidth, requires colocation facility); Partner Interconnect (50 Mbps to 50 Gbps via service provider, no colocation needed); both provide private connectivity bypassing the public internet; use for large data transfer, low-latency hybrid workloads, and compliance requirements prohibiting public internet transit |
| Cloud VPN | HA VPN (99.99% SLA, two tunnels, BGP dynamic routing) and Classic VPN (99.9% SLA, typically static routing); encrypted IPsec tunnels over the public internet; an HA VPN gateway has two interfaces, each with its own external IP address, and the 99.99% SLA requires tunnels configured on both interfaces; use when traffic volume does not justify the cost of Cloud Interconnect; supports site-to-site and hub-spoke topologies |
| Cloud DNS | Managed authoritative DNS service; 100% uptime SLA; supports public and private zones; DNSSEC for cryptographic validation; DNS peering for cross-VPC resolution; forwarding zones to route queries to on-premises DNS; Cloud Domains for domain registration |
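Two rules from the VPC row above, no overlapping CIDRs and non-transitive peering, can be checked mechanically. A simplified Python model using the standard `ipaddress` module; the network names are hypothetical, and it only compares subnet ranges (real peering also involves route exchange settings):

```python
import ipaddress


def can_peer(subnets_a: list, subnets_b: list) -> bool:
    """True if no subnet range in network A overlaps any subnet range in network B."""
    nets_a = [ipaddress.ip_network(cidr) for cidr in subnets_a]
    nets_b = [ipaddress.ip_network(cidr) for cidr in subnets_b]
    return not any(a.overlaps(b) for a in nets_a for b in nets_b)


def reachable_via_peering(peerings: set, src: str, dst: str) -> bool:
    """Peering is non-transitive: traffic crosses only a direct peering,
    never an intermediate VPC."""
    return src == dst or frozenset((src, dst)) in peerings


if __name__ == "__main__":
    print(can_peer(["10.0.0.0/16"], ["10.0.128.0/20"]))  # False: ranges overlap
    print(can_peer(["10.0.0.0/16"], ["10.1.0.0/16"]))    # True: disjoint ranges
    hub_and_spokes = {frozenset(("hub", "spoke-a")), frozenset(("hub", "spoke-b"))}
    print(reachable_via_peering(hub_and_spokes, "spoke-a", "spoke-b"))  # False
```

The last case is the classic exam trap: two spokes peered to the same hub cannot reach each other through that hub.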
Data Processing and Analytics
- Dataflow: Fully managed, serverless stream and batch data processing based on Apache Beam; unified programming model for ETL, data enrichment, and real-time analytics; autoscaling workers; exactly-once processing semantics; integrates with Pub/Sub for streaming ingestion and BigQuery for output; use when you need a managed, autoscaling pipeline without cluster management
- Dataproc: Managed Apache Spark and Hadoop clusters; clusters typically start in about 90 seconds; use for existing Spark/Hadoop workloads, data science with PySpark, and batch ETL; preemptible or Spot secondary workers for cost savings; autoscaling policies; connectors for BigQuery, Cloud Storage, and Bigtable; choose over Dataflow when you have existing Spark/Hadoop code or need the broader Hadoop ecosystem
- Pub/Sub: Global, real-time messaging and event streaming service; at-least-once delivery by default, so subscribers should process messages idempotently; push and pull subscriptions; message retention up to 31 days; dead letter topics for failed messages; ordering keys for ordered delivery within a key; Pub/Sub Lite (zonal or regional, lower cost for high-volume workloads, now deprecated); use as the ingestion layer for streaming architectures feeding into Dataflow or BigQuery
- Data Fusion: Fully managed, code-free data integration service built on open-source CDAP; visual drag-and-drop ETL/ELT pipeline builder; 150+ pre-built connectors; batch and real-time pipelines; use for enterprise data integration when non-developers need to build pipelines
- Composer: Managed Apache Airflow for workflow orchestration; schedule and monitor complex multi-step data pipelines; DAGs (Directed Acyclic Graphs) define task dependencies; integrates with Dataflow, Dataproc, BigQuery, and Cloud Functions; use for orchestrating multi-service ETL workflows
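Because Pub/Sub delivers at least once, a subscriber can see the same message more than once; ordering keys keep messages with the same key in order but do not remove duplicates. A minimal Python sketch of an idempotent consumer; it is a hypothetical in-memory model, not the Pub/Sub client library, and a production version would keep processed IDs in a durable store:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    message_id: str     # a redelivered message carries the same ID
    ordering_key: str
    data: str


@dataclass
class IdempotentConsumer:
    """Applies each message_id once so duplicate deliveries are harmless."""
    seen_ids: set = field(default_factory=set)   # assumption: durable store in production
    applied: dict = field(default_factory=dict)  # ordering_key -> processed payloads

    def handle(self, msg: Message) -> bool:
        if msg.message_id in self.seen_ids:
            return False  # duplicate: acknowledge without reprocessing
        self.seen_ids.add(msg.message_id)
        self.applied.setdefault(msg.ordering_key, []).append(msg.data)
        return True


if __name__ == "__main__":
    consumer = IdempotentConsumer()
    deliveries = [
        Message("1", "order-42", "created"),
        Message("2", "order-42", "paid"),
        Message("1", "order-42", "created"),  # redelivered after a missed ack
    ]
    print([consumer.handle(m) for m in deliveries])  # [True, True, False]
    print(consumer.applied)  # {'order-42': ['created', 'paid']}
```

Deduplicating on the message ID catches redeliveries; if the publisher itself retries, the copy can get a new ID, so systems often deduplicate on an application-level key instead.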
GCP Resource Hierarchy
| Level | Description |
|---|---|
| Organization | Root node of the hierarchy, automatically created when a Google Workspace or Cloud Identity account is associated with GCP; provides centralized visibility and control over all GCP resources; Organization Administrator role manages organization-level IAM policies; organization policies (constraints) enforce governance rules across all projects and folders (e.g., restrict resource locations, disable external IP on VMs, restrict allowed services) |
| Folders | Grouping mechanism for projects; can be nested up to 10 levels deep; use to mirror organizational structure (departments, teams, environments); IAM policies on folders are inherited by all child folders and projects; common patterns: top-level folders by department (Engineering, Finance), sub-folders by environment (Dev, Staging, Prod); enables delegated administration where each department manages its own folder |
| Projects | Core organizational unit for all GCP resources; every resource belongs to exactly one project; projects provide billing boundaries (each project linked to one billing account), IAM boundaries, API enablement scope, and quota management; project ID is globally unique and immutable; project number is auto-assigned; use separate projects for different applications, environments, or teams to enforce isolation |
| Resources | Individual GCP services and objects (VMs, buckets, datasets, etc.); a resource is global (VPC networks, images), regional (subnets, regional static IPs, regional managed instance groups), or zonal (VM instances, persistent disks); resource location affects availability, latency, and cost; IAM policies can be set at any level of the hierarchy and are inherited downward |
Exam Tip: IAM policies are inherited from parent to child: Organization -> Folder -> Project -> Resource. A policy set at the organization level applies to every resource. The effective policy is the union of inherited and directly assigned allow policies, so a role granted at a parent cannot be revoked by an allow policy at a lower level. To block specific permissions, use IAM deny policies; to restrict resource configurations such as external IPs or resource locations, use organization policy constraints (not IAM).
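The additive inheritance described above can be modeled as a union of bindings along the ancestry path. A minimal Python sketch with a hypothetical four-level hierarchy and example principals; it models allow policies only and ignores IAM deny policies and conditions:

```python
# Hypothetical hierarchy: node -> parent (None marks the organization).
PARENT = {
    "org": None,
    "folder-eng": "org",
    "project-prod": "folder-eng",
    "bucket-logs": "project-prod",
}

# Allow-policy bindings set directly on each node: role -> principals.
BINDINGS = {
    "org": {"roles/viewer": {"group:auditors@example.com"}},
    "folder-eng": {"roles/compute.admin": {"group:sre@example.com"}},
    "project-prod": {},
    "bucket-logs": {"roles/storage.objectViewer": {"user:alice@example.com"}},
}


def ancestry(node):
    """Yield the node itself, then each ancestor up to the organization."""
    while node is not None:
        yield node
        node = PARENT[node]


def effective_bindings(node: str) -> dict:
    """Union of the node's own bindings and every ancestor's (allow policies are additive)."""
    result = {}
    for n in ancestry(node):
        for role, principals in BINDINGS.get(n, {}).items():
            result.setdefault(role, set()).update(principals)
    return result


if __name__ == "__main__":
    for role, principals in sorted(effective_bindings("bucket-logs").items()):
        print(role, sorted(principals))
```

Note that nothing set on `bucket-logs` can remove the organization-level `roles/viewer` grant; only a deny policy could block those permissions.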
Infrastructure as Code (IaC)
| Tool | Description |
|---|---|
| Deployment Manager | Google-native IaC service; YAML or Jinja2/Python templates define resources; supports type providers for custom resource types; preview mode shows changes before deployment; parallel resource creation; limitations: GCP-only and a smaller community than Terraform; Google has announced its deprecation in favor of Infrastructure Manager (a managed service that runs Terraform), so it is mainly relevant for existing pure-GCP deployments |
| Terraform | HashiCorp's multi-cloud IaC tool; HCL declarative language; the Google Cloud provider covers most GCP services; state management (local, GCS backend, Terraform Cloud); plan/apply workflow for safe changes; modules for reusable components; widely adopted for multi-cloud and hybrid environments; Google provides official Terraform modules and blueprints; preferred for organizations using multiple cloud providers |
| Config Connector | Kubernetes add-on that manages GCP resources through Kubernetes Resource Model (KRM); define GCP resources as Kubernetes custom resources in YAML; reconciliation loop ensures desired state matches actual state; ideal for teams already managing workloads with Kubernetes who want a unified workflow for both application and infrastructure management |
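Terraform's plan step and Config Connector's reconciliation loop both start by diffing desired state against actual state. A toy Python model of that diff; the resource names and settings are hypothetical, and it ignores dependencies, forced replacement of immutable fields, and provider-specific defaults that real tools handle:

```python
def plan(desired: dict, actual: dict) -> dict:
    """Diff desired vs actual resource configs (name -> settings dict) into
    the create/update/delete actions a plan or reconciliation pass would take."""
    return {
        "create": sorted(desired.keys() - actual.keys()),
        "update": sorted(name for name in desired.keys() & actual.keys()
                         if desired[name] != actual[name]),
        "delete": sorted(actual.keys() - desired.keys()),
    }


if __name__ == "__main__":
    desired = {
        "vm-web": {"machine_type": "e2-medium", "zone": "europe-west1-b"},
        "bucket-logs": {"location": "EU", "storage_class": "NEARLINE"},
    }
    actual = {
        "vm-web": {"machine_type": "e2-small", "zone": "europe-west1-b"},
        "vm-legacy": {"machine_type": "n1-standard-1", "zone": "europe-west1-b"},
    }
    print(plan(desired, actual))
    # {'create': ['bucket-logs'], 'update': ['vm-web'], 'delete': ['vm-legacy']}
```

Terraform computes this once per `plan` and waits for `apply`; a reconciliation loop recomputes it continuously and acts on it, which is why Config Connector also reverts manual drift.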
Managed Instance Groups and Autoscaling
| Feature | Description |
|---|---|
| Managed Instance Group (MIG) | Group of identical VM instances created from an instance template; supports autoscaling, auto-healing (health check-based recreation), rolling updates, and canary deployments; regional MIG distributes instances across multiple zones for high availability; stateful MIG preserves per-instance state (disks, metadata, IPs) during updates and recreation |
| Autoscaling Policies | Scale based on CPU utilization, load balancing capacity, Cloud Monitoring metrics, or schedules; predictive autoscaling forecasts load from history and scales out ahead of expected spikes; the initialization period (formerly cool-down period) ignores metrics from newly started VMs until they finish booting; scale-in decisions are stabilized over a window of recent load to prevent rapid scale-in/out oscillation; minimum and maximum instance limits; scale-in controls to prevent aggressive scaling down; multiple autoscaling signals can be combined (highest recommended size wins) |
| Instance Templates | Immutable blueprint defining machine type, boot disk image, network tags, service account, startup script, metadata, and labels; used by MIGs to create new instances; update by creating a new template and performing a rolling update on the MIG; regional or global scope; use custom images with pre-installed software for faster startup times |
| Health Checks | HTTP, HTTPS, TCP, SSL, or gRPC probes to verify instance health; used by load balancers (for traffic routing) and MIGs (for auto-healing); configure check interval, timeout, healthy/unhealthy thresholds; auto-healing recreates instances that fail health checks; initial delay parameter allows instances time to boot before being checked |
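The "highest recommended size wins" rule for combined signals can be illustrated with a simplified proportional model. A minimal Python sketch, assuming each signal recommends the size that would bring average utilization to its target; the real autoscaler also applies the initialization period, scale-in stabilization, scale-in controls, and predictive forecasts, none of which are modeled here:

```python
import math


def recommended_size(current_size: int, signals: dict, min_size: int, max_size: int) -> int:
    """signals maps a signal name to (observed utilization, target utilization),
    both in the same unit (e.g. percent). The largest per-signal recommendation
    wins and the result is clamped to [min_size, max_size]."""
    recommendations = [
        math.ceil(current_size * observed / target)
        for observed, target in signals.values()
    ]
    desired = max(recommendations, default=current_size)
    return max(min_size, min(max_size, desired))


if __name__ == "__main__":
    # 4 VMs at 90% CPU vs a 60% target recommends 6; the LB signal alone would give 3.
    print(recommended_size(4, {"cpu": (90, 60), "lb_capacity": (50, 80)}, 2, 10))  # 6
```

Taking the maximum means the group is sized for whichever signal is under the most pressure.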
Migration Strategies
- Lift and Shift (Rehost): Move workloads as-is to Compute Engine VMs with minimal changes; Migrate to Virtual Machines (formerly Migrate for Compute Engine) provides automated VM migration from on-premises VMware, AWS, or Azure; fastest migration path but does not leverage cloud-native benefits; suitable for workloads that cannot be easily refactored or as a first step before modernization
- Move and Improve (Replatform): Make targeted changes during migration to take advantage of managed services; examples: move a MySQL database to Cloud SQL, containerize applications for GKE, move file storage to Cloud Storage; moderate effort with meaningful benefits in operational overhead reduction and scalability improvement
- Rebuild (Refactor/Re-architect): Redesign applications as cloud-native using microservices, serverless, and managed services; highest effort but greatest long-term benefits; use Cloud Run, Firestore, Pub/Sub, and other fully managed services; adopt event-driven architectures and 12-factor app principles; recommended for strategic applications that justify the investment
- Database Migration Service (DMS): Managed service for migrating databases to Cloud SQL and AlloyDB; supports MySQL, PostgreSQL, SQL Server, and Oracle sources; continuous replication for minimal downtime migration; heterogeneous migrations (e.g., Oracle to PostgreSQL) require additional schema conversion
- Storage Transfer Service: Managed service for large-scale data transfers from on-premises file systems, AWS S3, Azure Blob Storage, HTTP/HTTPS sources, or between Cloud Storage buckets; scheduled transfers; bandwidth management; for extremely large datasets (hundreds of terabytes to petabytes) or limited network bandwidth, use Transfer Appliance (a physical device shipped to your datacenter)
Exam Tip: Migration questions often test your ability to choose the right strategy based on the organization's timeline, budget, risk tolerance, and long-term goals. A time-constrained migration with minimal downtime usually calls for lift-and-shift, while a strategic initiative to improve scalability and reduce operational costs calls for refactoring. Know the migration path: Assess (discover and evaluate workloads) -> Plan (prioritize workloads) -> Deploy (execute migration) -> Optimize (right-size and improve).
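Whether to transfer online or ship a Transfer Appliance usually comes down to simple arithmetic: data size divided by usable bandwidth. A minimal Python sketch, assuming decimal units (1 TB = 10^12 bytes) and a default 80% link efficiency, which is an illustrative assumption rather than a Google figure; the deadline is a scenario input:

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_tb: float, bandwidth_gbps: float, efficiency: float = 0.8) -> float:
    """Days to move `data_tb` decimal terabytes over a `bandwidth_gbps` link
    when only `efficiency` (0-1] of the nominal bandwidth is usable."""
    bits = data_tb * 1e12 * 8
    usable_bits_per_second = bandwidth_gbps * 1e9 * efficiency
    return bits / usable_bits_per_second / SECONDS_PER_DAY


def suggest_transfer(data_tb: float, bandwidth_gbps: float, deadline_days: float) -> str:
    """Online transfer if it fits the scenario's deadline, otherwise ship an appliance."""
    if transfer_days(data_tb, bandwidth_gbps) <= deadline_days:
        return "Storage Transfer Service (online)"
    return "Transfer Appliance"


if __name__ == "__main__":
    # 1 PB over a fully used 1 Gbps link takes about 92.6 days.
    print(round(transfer_days(1000, 1, efficiency=1.0), 1))  # 92.6
    print(suggest_transfer(1000, 1, deadline_days=30))       # Transfer Appliance
```

The same calculation also shows when upgrading the link (for example to Cloud Interconnect) would make an online transfer feasible.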
Identity and Access Management (IAM)
| Concept | Description |
|---|---|
| IAM Policy Model | Binds principals (who, formerly called members) to roles (what permissions) at a resource (where); principals can be Google accounts, service accounts, Google Groups, Google Workspace domains, or Cloud Identity domains; allUsers (public) and allAuthenticatedUsers (any Google account) are special identifiers; IAM policies are additive (union of all inherited and directly bound roles); IAM conditions allow granting access based on attributes like resource type, resource name, date/time, and request attributes such as access levels |
| Role Types | Basic roles (Owner, Editor, Viewer) grant broad access and should be avoided in production; Predefined roles (e.g., roles/compute.instanceAdmin, roles/storage.objectViewer) are service-specific and follow least privilege; Custom roles allow you to select individual permissions to create organization or project-scoped roles; always prefer predefined roles; use custom roles only when predefined roles are too broad |
| Service Accounts | Non-human identities for applications and services; default service accounts (auto-created, overly permissive with Editor role, avoid using in production); user-managed service accounts (create with specific permissions following least privilege); service account keys (JSON key files, avoid when possible due to key management burden); short-lived credentials using impersonation or workload identity federation are preferred; attach service accounts to resources (VMs, Cloud Functions, GKE pods) instead of using key files |
| Workload Identity Federation | Allows external workloads (AWS, Azure, on-premises, GitHub Actions, Kubernetes) to access GCP resources without service account keys; configures identity pools and providers to trust external identity tokens; maps external identities to GCP principals; eliminates the need to export, store, and rotate long-lived service account keys; critical for multi-cloud and CI/CD security |
| Workload Identity (GKE) | Recommended way for GKE pods to authenticate to GCP services; maps Kubernetes service accounts to GCP service accounts; pods use the mapped GCP service account identity to call GCP APIs; eliminates the need to store service account keys in Kubernetes secrets; enabled on the cluster and its node pools, then configured per Kubernetes service account |
Exam Tip: Always apply the principle of least privilege. Never use basic roles (Owner/Editor/Viewer) in production. Prefer predefined roles over custom roles. Use Google Groups for role assignment to simplify management at scale. For service accounts, avoid creating and downloading key files; instead use attached service accounts, workload identity, or impersonation to obtain short-lived tokens.
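The guidance above can be turned into a simple policy linter. A minimal Python sketch, assuming bindings in the shape of the JSON returned by `gcloud projects get-iam-policy --format=json` (a `bindings` list of `role`/`members` entries); the checks are illustrative, not an official Google tool:

```python
BASIC_ROLES = {"roles/owner", "roles/editor", "roles/viewer"}
PUBLIC_PRINCIPALS = {"allUsers", "allAuthenticatedUsers"}


def lint_policy(bindings: list) -> list:
    """Return human-readable findings for bindings that conflict with the tip above.
    Each binding is a dict like {"role": "roles/...", "members": ["user:...", ...]}."""
    findings = []
    for binding in bindings:
        role = binding["role"]
        for member in binding.get("members", []):
            if role in BASIC_ROLES:
                findings.append(f"basic role {role} granted to {member}")
            if member in PUBLIC_PRINCIPALS:
                findings.append(f"{role} granted to public principal {member}")
            if member.startswith("user:"):
                findings.append(f"{role} granted directly to {member}; prefer a Google Group")
    return findings


if __name__ == "__main__":
    policy = [
        {"role": "roles/editor", "members": ["user:bob@example.com"]},
        {"role": "roles/storage.objectViewer", "members": ["group:data-readers@example.com"]},
    ]
    for finding in lint_policy(policy):
        print(finding)
```

The group-based binding produces no findings; the Editor grant to an individual user is flagged twice.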
Network Security
| Control | Description |
|---|---|
| VPC Firewall Rules | Stateful; defined at the VPC level and enforced per instance; each rule specifies direction (ingress/egress), priority (0-65535, lower number = higher priority), target (all instances, network tags, or service accounts), source/destination (IP ranges, network tags, service accounts), protocol/port, and action (allow/deny); every VPC has implied rules that allow all egress and deny all ingress; the auto-created default network also has pre-populated rules allowing internal traffic plus ICMP, SSH (22), and RDP (3389) from any source; use network tags or service accounts as targets for fine-grained control |
| Hierarchical Firewall Policies | Defined at the organization or folder level; inherited by child folders and projects; evaluated before VPC firewall rules; enables centralized security teams to enforce baseline rules (e.g., block known malicious IPs, allow security scanner access) that project owners cannot override; actions: ALLOW, DENY, GOTO_NEXT (delegate decision to VPC firewall rules) |
| VPC Service Controls | Create security perimeters around GCP resources to prevent data exfiltration; define service perimeters that restrict which projects can access which services; access levels based on IP address, device attributes, and user identity; bridges for controlled communication between perimeters; dry-run mode for testing before enforcement; critical for protecting sensitive data in BigQuery, Cloud Storage, and Pub/Sub from unauthorized access even by compromised identities |
| Cloud Armor | DDoS protection and WAF for HTTP(S) Load Balancing; preconfigured WAF rules for OWASP Top 10 (SQL injection, XSS, RCE); custom rules using CEL (Common Expression Language) for IP allowlisting/denylisting, geographic restrictions, and rate limiting; adaptive protection uses ML to detect and mitigate L7 DDoS attacks; edge security policies for Cloud CDN; bot management capabilities |
| Private Google Access | Allows VM instances with only internal IPs (no external IPs) to reach Google APIs and services (Cloud Storage, BigQuery, etc.) through Google's internal network instead of the public internet; enabled per subnet; Private Service Connect provides a private endpoint with a private IP address for Google APIs within your VPC; essential for security-sensitive environments that prohibit public internet access |
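VPC firewall evaluation for a single packet follows a short rule: among matching rules, the lowest priority number wins, deny beats allow at equal priority, and the implied ingress deny applies when nothing matches. A simplified Python model; it covers ingress only, matches on source range and port (protocol, targets, and hierarchical firewall policies are not modeled), and the IP ranges are documentation examples:

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IngressRule:
    priority: int          # 0-65535; lower number = higher priority
    action: str            # "allow" or "deny"
    source_range: str      # CIDR, e.g. "0.0.0.0/0"
    port: Optional[int]    # None matches every port


def evaluate_ingress(rules: list, source_ip: str, port: int) -> str:
    ip = ipaddress.ip_address(source_ip)
    matching = [
        r for r in rules
        if ip in ipaddress.ip_network(r.source_range) and r.port in (None, port)
    ]
    if not matching:
        return "deny"  # implied deny-all ingress rule
    # Highest priority (lowest number) wins; at equal priority, deny beats allow.
    best = min(matching, key=lambda r: (r.priority, r.action != "deny"))
    return best.action


if __name__ == "__main__":
    rules = [
        IngressRule(1000, "allow", "0.0.0.0/0", 22),     # allow SSH from anywhere
        IngressRule(900, "deny", "203.0.113.0/24", 22),  # but block one range first
    ]
    print(evaluate_ingress(rules, "203.0.113.5", 22))   # deny (priority 900 wins)
    print(evaluate_ingress(rules, "198.51.100.7", 22))  # allow
    print(evaluate_ingress(rules, "198.51.100.7", 80))  # deny (implied rule)
```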
Data Protection and Encryption
| Encryption Method | Description |
|---|---|
| Google Default Encryption | All data at rest is encrypted by default using AES-256; Google manages the entire key hierarchy (data encryption keys, key encryption keys, KMS master keys); no configuration required; data is automatically encrypted before being written to disk and decrypted when read by authorized processes; encryption in transit is also enabled by default between Google services and from user to Google using TLS |
| Cloud KMS (CMEK) | Customer-Managed Encryption Keys; you create and control the key lifecycle (creation, rotation, disabling, destruction) while Google handles the encryption/decryption operations; keys stored in Cloud KMS (software or HSM backed); key rings organize keys by region; automatic key rotation on configurable schedules; integrates with Cloud Storage, BigQuery, Compute Engine disks, Cloud SQL, Pub/Sub, and many more services; provides audit trail via Cloud Audit Logs for every key use |
| Cloud HSM (CMEK with HSM) | Hardware Security Module-backed keys in Cloud KMS; FIPS 140-2 Level 3 certified; keys are generated and stored within the HSM and never leave it in plaintext; same Cloud KMS API and integration; required for regulatory compliance mandating hardware-protected keys; higher cost than software-backed KMS keys |
| Cloud EKM | Cloud External Key Manager; encryption keys are managed by a supported external key management partner (Thales, Fortanix, etc.) outside of Google's infrastructure; Google calls the external key manager for every encryption/decryption operation; provides separation of key management from cloud provider; Key Access Justifications shows the reason for each key access request, allowing you to approve or deny |
| CSEK (Customer-Supplied) | Customer-Supplied Encryption Keys; you provide the raw encryption key with each API call; Google uses the key for encryption/decryption but does not store it; available for Compute Engine disks and Cloud Storage objects; provides maximum control but requires you to manage key storage and availability; if you lose the key, Google cannot recover your data |
Exam Tip: The encryption hierarchy for the exam: Google default (no effort, Google manages everything) -> CMEK with Cloud KMS (you manage key lifecycle, Google manages encryption operations) -> Cloud HSM (hardware-backed CMEK for compliance) -> Cloud EKM (keys outside Google, maximum separation) -> CSEK (you supply and manage the raw key). Most exam scenarios requiring customer control over encryption keys are solved by CMEK with Cloud KMS. Use CSEK only when you must supply the raw key with each request and Google must never persist it. When keys must live entirely outside Google's infrastructure in a third-party key manager, use Cloud EKM.
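The exam-tip hierarchy can be read as a decision function that checks the most restrictive requirement first. This is a study aid, not a Google API. The `KeyRequirements` flags are hypothetical names for the requirements described above.

```python
from dataclasses import dataclass


@dataclass
class KeyRequirements:
    # Hypothetical requirement flags taken from typical exam scenarios.
    customer_controls_key: bool = False      # must manage rotation/disable/destroy
    hardware_protected_key: bool = False     # compliance mandates HSM-backed keys
    key_held_outside_google: bool = False    # key must live in an external key manager
    customer_supplies_raw_key: bool = False  # raw key sent with each request, never persisted


def choose_encryption(req: KeyRequirements) -> str:
    # Check from the most restrictive option down to the default.
    if req.customer_supplies_raw_key:
        return "CSEK"
    if req.key_held_outside_google:
        return "Cloud EKM"
    if req.hardware_protected_key:
        return "Cloud HSM (CMEK)"
    if req.customer_controls_key:
        return "Cloud KMS (CMEK)"
    return "Google default encryption"
```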
Compliance and Security Services
- Organization Policies: Constraints enforced at the organization, folder, or project level; examples: constraints/compute.disableSerialPortAccess, constraints/compute.vmExternalIpAccess (restrict VMs from having external IPs), constraints/iam.disableServiceAccountKeyCreation, constraints/gcp.resourceLocations (restrict where resources can be created); boolean constraints (enabled/disabled) and list constraints (allowed/denied values); override inheritance for specific folders or projects
- Security Command Center (SCC): Centralized security and risk management platform; Standard tier (free: asset inventory and discovery, a subset of Security Health Analytics misconfiguration findings, and Web Security Scanner custom scans) and Premium tier (adds Event Threat Detection, Container Threat Detection, Virtual Machine Threat Detection, Web Security Scanner managed scans, and the full set of Security Health Analytics findings mapped to compliance benchmarks like CIS, PCI DSS, and NIST 800-53); findings can be published as Pub/Sub notifications, which can in turn trigger Cloud Functions for automated remediation
- Cloud Audit Logs: Four types: Admin Activity (always on, free; who did what, where, and when for administrative actions), Data Access (disabled by default except for BigQuery; records data read/write operations, can generate high volume), System Event (Google-initiated system events like live migration), Policy Denied (records access denied because of a security policy violation, such as a VPC Service Controls perimeter); critical for compliance, forensics, and security monitoring; integrate with Cloud Monitoring and SIEM tools
- DLP API (Sensitive Data Protection): Inspect, classify, and de-identify sensitive data across GCP and on-premises; detects PII (names, emails, SSNs, credit card numbers), healthcare data (PHI), and custom patterns; de-identification methods include masking, tokenization, bucketing, date shifting, and format-preserving encryption; scan Cloud Storage, BigQuery, and Datastore; use for GDPR, HIPAA, and PCI DSS compliance
- Assured Workloads: Automates compliance setup for regulated workloads; supports FedRAMP, HIPAA, PCI DSS, CJIS, IL4/IL5, and regional controls; configures necessary organization policies, data residency, encryption, and personnel access controls in a dedicated folder; simplifies achieving and maintaining compliance in GCP
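Two of the de-identification methods named above, character masking and replacement with the infoType name, can be illustrated locally. This sketch does not call the DLP API. The regular expressions are deliberately simplified stand-ins for Sensitive Data Protection's built-in detectors, which also use checksums, context, and likelihood scoring. `mask_value` and `deidentify` are hypothetical helpers.

```python
import re

# Simplified stand-ins for built-in infoType detectors (hypothetical patterns).
CARD = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def mask_value(value: str, keep_last: int = 4, mask_char: str = "#") -> str:
    """Character masking: hide all alphanumerics except the last keep_last,
    leaving separators such as spaces and hyphens in place."""
    to_mask = max(sum(ch.isalnum() for ch in value) - keep_last, 0)
    out = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


def deidentify(text: str) -> str:
    # Mask card numbers but keep the last four digits (useful for support lookups).
    text = CARD.sub(lambda m: mask_value(m.group()), text)
    # Replace email addresses entirely with the infoType name.
    return EMAIL.sub("[EMAIL_ADDRESS]", text)
```

For example, `deidentify("Card 4111 1111 1111 1111, mail jane@example.com")` yields `"Card #### #### #### 1111, mail [EMAIL_ADDRESS]"`.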
Cost Optimization Strategies
| Strategy | Description |
|---|---|
| Committed Use Discounts (CUD) | 1-year (up to 37% discount) or 3-year (up to 55% discount) commitments for Compute Engine, Cloud SQL, Cloud Run, and GKE; resource-based CUDs commit to specific vCPU and memory amounts in a region; spend-based CUDs for Cloud SQL and certain other services commit to a minimum hourly spend; automatically apply to matching usage; cannot be canceled once purchased; analyze usage patterns with billing reports and Recommender before committing |
| Sustained Use Discounts (SUD) | Automatic discounts for eligible Compute Engine VMs that run more than 25% of the month; the discount grows with usage, reaching up to 30% for N1 (about 20% for N2 and N2D) when a VM runs the entire month; applied automatically with no commitment or upfront payment; vCPUs and memory are discounted independently; not available for E2 or A2 machine types; usage already covered by a CUD does not also earn SUDs, but eligible usage beyond the commitment still can |
| Preemptible and Spot VMs | Up to 60-91% discount versus on-demand pricing; Spot VMs (newer, no maximum runtime) and Preemptible VMs (legacy, 24-hour max); can be reclaimed with 30-second notice when capacity is needed; ideal for fault-tolerant batch processing, data analytics, CI/CD builds, rendering, and distributed workloads; always design with graceful shutdown handling and checkpointing |
| Right-sizing Recommendations | Google Cloud Recommender analyzes VM utilization metrics and suggests downsizing or upsizing machine types; custom machine types allow specifying exact vCPU and memory amounts to avoid paying for unused resources; Compute Engine provides idle VM recommendations for instances with consistently low utilization; review recommendations regularly in the Cloud Console or via API |
| Storage Cost Optimization | Use appropriate storage classes (Nearline/Coldline/Archive for infrequently accessed data); enable Autoclass for automatic class transitions; lifecycle policies to delete or transition objects; BigQuery partitioning and clustering reduce query costs by scanning less data; set BigQuery table expiration for temporary datasets; use BigQuery BI Engine for cached analytics instead of repeated full-table scans |
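The arithmetic behind sustained and committed use discounts is easy to check by hand. The sketch assumes the N1 sustained use schedule, in which each successive quarter of the month is billed at 100%, 80%, 60%, and 40% of the base rate. That schedule is what produces the 30% maximum. Other machine series use different tiers, so verify against current pricing. The CUD helper assumes a commitment is billed whether or not it is used and ignores any SUD the on-demand alternative would earn.

```python
from typing import List, Tuple

# Assumed N1 schedule: (share of the month, price multiplier for usage in that band).
N1_SUD_TIERS: List[Tuple[float, float]] = [(0.25, 1.0), (0.25, 0.8), (0.25, 0.6), (0.25, 0.4)]


def sud_effective_discount(fraction_of_month: float,
                           tiers: List[Tuple[float, float]] = N1_SUD_TIERS) -> float:
    """Overall discount for a VM that runs fraction_of_month (0..1) of the month."""
    if not 0.0 <= fraction_of_month <= 1.0:
        raise ValueError("fraction_of_month must be between 0 and 1")
    if fraction_of_month == 0.0:
        return 0.0
    remaining, cost = fraction_of_month, 0.0
    for width, multiplier in tiers:
        used = min(remaining, width)
        cost += used * multiplier  # bill this band at its tier multiplier
        remaining -= used
        if remaining <= 0.0:
            break
    return 1.0 - cost / fraction_of_month


def cud_breakeven_utilization(cud_discount: float) -> float:
    """Utilization above which paying for a commitment beats on-demand.

    The commitment costs (1 - discount) x on-demand whether used or not, while
    on-demand costs utilization x on-demand, so the two cross at 1 - discount."""
    return 1.0 - cud_discount
```

Under these assumptions, a VM that runs half the month earns about 10%. A 37% commitment only pays off if the committed resources are busy more than 63% of the time.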
Exam Tip: Know the cost optimization hierarchy: first right-size resources, then use sustained use discounts (automatic), then consider committed use discounts for predictable workloads, and use Spot/preemptible VMs for fault-tolerant batch workloads. For BigQuery, always partition and cluster tables. Use on-demand pricing for unpredictable workloads and capacity-based pricing (BigQuery editions, which replaced flat-rate) for predictable, high-volume analytics.
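Spot and preemptible VMs are only a good fit when work can resume after preemption. The following is a minimal Python sketch of the checkpointing pattern. It assumes the workload receives SIGTERM when the guest OS begins shutting down after the preemption notice. The class name, checkpoint format, and local file path are hypothetical. A real job should write its checkpoint to Cloud Storage or a database so a replacement VM can read it.

```python
import json
import os
import signal
import tempfile
from typing import Callable, Sequence


class CheckpointingWorker:
    """Processes items in order and persists progress so a new VM can resume."""

    def __init__(self, checkpoint_path: str) -> None:
        self.checkpoint_path = checkpoint_path
        self.stop_requested = False
        # Assumption: OS shutdown after preemption delivers SIGTERM to this process.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame) -> None:
        # Only set a flag; the main loop stops cleanly after the current item.
        self.stop_requested = True

    def load_position(self) -> int:
        try:
            with open(self.checkpoint_path) as f:
                return json.load(f)["next_index"]
        except FileNotFoundError:
            return 0

    def save_position(self, next_index: int) -> None:
        # Write a temp file, then rename it atomically so a crash never leaves a partial checkpoint.
        directory = os.path.dirname(self.checkpoint_path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump({"next_index": next_index}, f)
        os.replace(tmp_path, self.checkpoint_path)

    def run(self, items: Sequence, process: Callable) -> int:
        i = self.load_position()
        while i < len(items) and not self.stop_requested:
            process(items[i])
            i += 1
            self.save_position(i)
        return i
```

Checkpointing after every item keeps lost work to at most one item, at the cost of more writes. Batch jobs with cheap items usually checkpoint less often.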
CI/CD and DevOps on GCP
| Service | Description |
|---|---|
| Cloud Build | Serverless CI/CD platform; build, test, and deploy using YAML or JSON configuration files (cloudbuild.yaml); supports Docker builds, custom build steps, and parallel execution; triggers from Cloud Source Repositories, GitHub, or Bitbucket; integrates with Artifact Registry for storing build artifacts; build approvals for production deployments; vulnerability scanning of container images during build |
| Cloud Deploy | Managed continuous delivery service for GKE and Cloud Run; define delivery pipelines with sequential targets (dev -> staging -> prod); automated rollbacks on failure; canary and blue/green deployment strategies; approval gates between stages; audit trail of all deployments; integrates with Cloud Build for artifact creation |
| Artifact Registry | Universal package manager for Docker images, Maven, npm, Python, Go, and more; regional and multi-regional repositories; vulnerability scanning for container images and language packages; IAM-based access control; replaces Container Registry (deprecated); integrates with Cloud Build, GKE, Cloud Run, and Cloud Functions |
| Binary Authorization | Deploy-time security control for GKE and Cloud Run; ensures only trusted container images are deployed; policy defines which attestors must sign an image before deployment; attestations are cryptographic signatures created during the CI/CD pipeline (e.g., after vulnerability scanning passes); prevents deployment of unsigned, unscanned, or untrusted images; critical for supply chain security |
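At its core, Binary Authorization runs a deploy-time check that verifies signatures bound to an image digest. The sketch below models that flow with HMAC from the Python standard library purely for illustration. Real attestations use asymmetric PKIX or PGP signatures, and the attestor names and keys here are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List

# Hypothetical attestors and shared secrets. HMAC stands in for asymmetric
# signatures only to keep this sketch standard-library-only.
ATTESTOR_KEYS: Dict[str, bytes] = {
    "built-by-cloud-build": b"build-attestor-secret",
    "vuln-scan-passed": b"scan-attestor-secret",
}


def attest(attestor: str, image_digest: str) -> str:
    """Runs in the pipeline: sign the immutable image digest, never a mutable tag."""
    key = ATTESTOR_KEYS[attestor]
    return hmac.new(key, image_digest.encode("utf-8"), hashlib.sha256).hexdigest()


def admit(image_digest: str, attestations: Dict[str, str], required: List[str]) -> bool:
    """Deploy-time policy check: every required attestor must have signed this exact digest."""
    for attestor in required:
        signature = attestations.get(attestor)
        if signature is None:
            return False
        # Constant-time comparison avoids leaking signature bytes through timing.
        if not hmac.compare_digest(signature, attest(attestor, image_digest)):
            return False
    return True
```

Binding attestations to the digest means retagging or rebuilding an image invalidates its attestations, which is the property that makes the policy meaningful.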
Monitoring, Logging, and Observability
| Service | Description |
|---|---|
| Cloud Monitoring | Metrics collection and visualization for GCP and hybrid resources; built-in metrics for all GCP services; custom metrics via API or OpenTelemetry; alerting policies with notification channels (email, SMS, PagerDuty, Slack, webhooks, Pub/Sub); uptime checks for external endpoint monitoring; dashboards for visualizing metrics; Managed Service for Prometheus for Kubernetes metrics at scale |
| Cloud Logging | Centralized log management; automatically collects logs from GCP services; log sinks route logs to Cloud Storage (archival), BigQuery (analytics), Pub/Sub (streaming, including to third-party tools such as Splunk), or other log buckets; log-based metrics create custom metrics from log entries; Logs Explorer for searching and filtering; retention: Admin Activity and System Event audit logs are kept for 400 days in the _Required bucket (free, not configurable), while other logs, including Data Access audit logs, default to 30 days in the _Default bucket (configurable from 1 to 3650 days); log buckets for organizing and controlling access to logs |
| Cloud Trace | Distributed tracing system for understanding application latency; automatically traces requests across GCP services; compatible with OpenTelemetry and Zipkin; latency reports identify slow services and bottlenecks; essential for debugging microservice architectures where a single request traverses multiple services |
| Error Reporting | Aggregates and displays errors from cloud services and applications; automatic error grouping and deduplication; real-time notifications for new errors; stack trace analysis; tracks error frequency and resolution status; supports App Engine, Cloud Functions, Cloud Run, GKE, and Compute Engine with the Logging agent |
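On Cloud Run and GKE, Cloud Logging parses single-line JSON written to stdout or stderr and treats fields such as `severity`, `message`, and `logging.googleapis.com/trace` specially. The following minimal sketch of an emitter assumes that behavior. The `structured_log` helper is hypothetical. Appending the stack trace to `message` is one way to give Error Reporting something to group on.

```python
import json
import traceback
from typing import Any, Optional


def structured_log(severity: str, message: str, project_id: Optional[str] = None,
                   trace_id: Optional[str] = None, exc: Optional[BaseException] = None,
                   **fields: Any) -> str:
    """Print one JSON log line to stdout and return it."""
    entry = {"severity": severity, "message": message, **fields}
    if exc is not None:
        # Append the formatted stack trace so the error can be grouped.
        entry["message"] = message + "\n" + "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__))
    if project_id and trace_id:
        # Links this line to the matching Cloud Trace trace.
        entry["logging.googleapis.com/trace"] = f"projects/{project_id}/traces/{trace_id}"
    line = json.dumps(entry)
    print(line, flush=True)
    return line
```

Extra keyword arguments, such as an order ID, become fields in the JSON entry. This keeps them searchable in Logs Explorer and usable in log-based metrics.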
Hybrid and Multi-Cloud with Anthos
- Anthos Overview: Google's platform for managing applications across GCP, on-premises, and other clouds (AWS, Azure); unified Kubernetes management via fleet concept; consistent security policies and configuration management; centralized observability with Cloud Monitoring and Cloud Logging across all clusters
- Anthos on GKE: GKE clusters registered with Anthos fleet; fleet-level policies apply uniformly; Multi Cluster Ingress for global load balancing across clusters; Connect gateway for centralized, authenticated access to registered clusters
- Anthos on Bare Metal/VMware: Run Kubernetes on existing on-premises infrastructure; minimal footprint; integrates with existing networking and storage; Anthos Service Mesh for traffic management and security between services
- Anthos Service Mesh: Managed Istio-based service mesh; mutual TLS (mTLS) between services without code changes; traffic management (canary deployments, traffic splitting, fault injection); observability (metrics, traces, topology graphs); works across GKE, on-premises, and multi-cloud clusters
- Config Sync: GitOps-based configuration management for Anthos; sync Kubernetes manifests and policies from a Git repository to all registered clusters; namespaced and cluster-scoped configurations; drift detection and automatic remediation; ensures consistency across environments
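Config Sync's drift detection can be reduced to a three-way comparison between what Git declares and what the cluster holds. This is a toy model keyed by hypothetical resource names, not Config Sync's implementation. It only re-applies declared objects and leaves unmanaged ones alone.

```python
from typing import Any, Dict, List

Manifest = Dict[str, Any]


def detect_drift(desired: Dict[str, Manifest], live: Dict[str, Manifest]) -> Dict[str, List[str]]:
    """Compare desired state (from Git) with live state, keyed by resource name."""
    return {
        # Declared in Git but absent from the cluster.
        "missing": sorted(k for k in desired if k not in live),
        # Present in both but edited out-of-band.
        "changed": sorted(k for k in desired if k in live and live[k] != desired[k]),
        # In the cluster but not declared; this sketch does not touch these.
        "unmanaged": sorted(k for k in live if k not in desired),
    }


def remediate(desired: Dict[str, Manifest], live: Dict[str, Manifest]) -> Dict[str, Manifest]:
    """Return the live state after re-applying every declared manifest."""
    fixed = dict(live)
    fixed.update(desired)
    return fixed
```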
High Availability and Disaster Recovery
| Concept | Description |
|---|---|
| Zones, Regions, and Multi-Region | Zone: single failure domain within a region (independent power, cooling, networking); Region: geographic area typically containing 3 or more zones (e.g., us-central1); Multi-region: multiple regions for global services; deploy across multiple zones for HA (SLAs vary by service; for example, Compute Engine offers 99.99% for instances spread across multiple zones); deploy across regions for DR (RPO/RTO requirements); global services like Cloud Spanner (multi-region configurations) and Cloud Storage multi-region provide cross-region redundancy automatically |
| DR Patterns | Cold standby (lowest cost, highest RTO: backup data to another region, rebuild infrastructure when needed); Warm standby (moderate cost, moderate RTO: reduced-capacity infrastructure running in DR region, scale up during failover); Hot standby (highest cost, lowest RTO/RPO: full-capacity infrastructure in DR region with continuous replication, immediate failover); choose based on RPO/RTO requirements and budget; document and test DR runbooks regularly |
| Cloud SQL HA | Regional HA configuration creates a primary instance and standby in a different zone within the same region; synchronous replication ensures zero data loss; automatic failover in ~60 seconds; cross-region read replicas for DR (promote replica to primary during regional failure); automated backups with configurable retention (up to 365 days) and point-in-time recovery |
| GKE High Availability | Regional clusters distribute control plane and nodes across multiple zones; multi-zone node pools; Pod Disruption Budgets ensure minimum available replicas during maintenance; anti-affinity rules spread pods across nodes and zones; horizontal pod autoscaler scales based on CPU/memory/custom metrics; cluster autoscaler adds/removes nodes; maintenance windows control when updates occur |
Exam Tip: Know the SLA implications: a single-zone deployment provides no protection against zone failure. Regional deployments protect against zone failures. Multi-region deployments protect against region failures but add complexity and cost. The exam often asks you to design for a specific RTO/RPO target; map the requirement to the appropriate DR pattern (cold/warm/hot) and GCP services that support it.
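You can estimate the benefit of spreading across zones with basic availability math. This sketch assumes failures are independent. Zones are designed to approximate independence, but correlated events (bad deployments, shared dependencies, regional incidents) violate it, so treat the results as optimistic bounds. The availability figures in the example are illustrative, not GCP SLAs.

```python
def parallel_availability(*availabilities: float) -> float:
    """At least one replica is up (redundant zones), assuming independent failures."""
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= (1.0 - a)
    return 1.0 - p_all_down


def serial_availability(*availabilities: float) -> float:
    """Every dependency in a request path must be up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def downtime_minutes_per_month(availability: float, days: int = 30) -> float:
    """Expected unavailable minutes in a month of the given length."""
    return (1.0 - availability) * days * 24 * 60
```

Two zones at 99.5% each give about 99.9975% if either can serve alone. Chaining that front end to a 99.9% database still caps the whole path below 99.9%, so the weakest serial dependency often dominates.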
SRE Principles on GCP
| Concept | Description |
|---|---|
| SLIs (Service Level Indicators) | Quantitative measures of service behavior: availability (percentage of successful requests), latency (percentage of requests served within a threshold), throughput (requests per second), error rate (percentage of failed requests); measured from the user's perspective; choose SLIs that reflect what users actually care about; implemented with Cloud Monitoring metrics |
| SLOs (Service Level Objectives) | Target values for SLIs (e.g., 99.9% of requests complete in under 200ms); set based on user expectations and business requirements; not 100% because that is unrealistic and prevents innovation; Cloud Monitoring SLO monitoring tracks actual performance against targets; burn rate alerts notify when error budget is being consumed too quickly |
| Error Budgets | The acceptable amount of unreliability derived from SLOs (e.g., 99.9% SLO = 0.1% error budget = ~43 minutes of downtime per month); when error budget is remaining, teams can push features and take risks; when error budget is exhausted, focus shifts to reliability improvements; creates a data-driven balance between velocity and stability; error budget policies define actions when budget is depleted (freeze deployments, prioritize reliability work) |
| Toil Reduction | Toil is manual, repetitive, automatable operational work that scales linearly with service size and provides no lasting value; SRE goal is to keep toil below 50% of team time; automate with Cloud Build, Deployment Manager/Terraform, Cloud Functions for event-driven remediation, and Policy-as-Code; examples: automate certificate rotation, capacity provisioning, incident response, and alert triage |
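The error-budget numbers above follow from a few lines of arithmetic. The Python sketch below assumes a 30-day window. `alert_burn_rate` reproduces the commonly cited fast-burn threshold from Google's SRE Workbook: alert when 2% of a 30-day budget is spent in one hour, which is a burn rate of 14.4.

```python
SLO_PERIOD_DAYS = 30  # assumed rolling SLO window


def _check(slo: float, total: int) -> None:
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    if total <= 0:
        raise ValueError("total must be positive")


def error_budget_minutes(slo: float, period_days: int = SLO_PERIOD_DAYS) -> float:
    """Allowed full-downtime minutes in the window for an availability SLO."""
    return (1.0 - slo) * period_days * 24 * 60


def budget_remaining(slo: float, good: int, total: int) -> float:
    """Fraction of the error budget left (negative once the SLO is breached)."""
    _check(slo, total)
    allowed_bad = (1.0 - slo) * total
    return 1.0 - (total - good) / allowed_bad


def burn_rate(slo: float, good: int, total: int) -> float:
    """Observed error rate divided by the rate the SLO allows; 1.0 spends
    the budget exactly over the full window."""
    _check(slo, total)
    return ((total - good) / total) / (1.0 - slo)


def alert_burn_rate(budget_fraction: float, window_hours: float,
                    period_hours: float = SLO_PERIOD_DAYS * 24) -> float:
    """Burn rate that consumes budget_fraction of the budget within window_hours."""
    return budget_fraction * period_hours / window_hours
```

For a 99.9% SLO, 2 failures in 1,000 requests is a burn rate of 2. At that rate the 30-day budget would be gone in about 15 days.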
GCP Services Quick Reference
| Scenario | GCP Solution |
|---|---|
| Global relational database with strong consistency | Cloud Spanner (multi-region configuration) |
| Serverless stateless HTTP microservice | Cloud Run |
| Managed Kubernetes with multi-cloud portability | GKE + Anthos |
| Petabyte-scale low-latency time-series data | Cloud Bigtable |
| Serverless analytics and data warehouse | BigQuery |
| Real-time event streaming and messaging | Pub/Sub + Dataflow |
| Managed batch processing with Spark/Hadoop | Dataproc (ephemeral clusters on Cloud Storage) |
| Serverless stream and batch ETL pipelines | Dataflow (Apache Beam) |
| Prevent data exfiltration from GCP services | VPC Service Controls |
| Customer-managed encryption keys | Cloud KMS (CMEK) |
| Detect sensitive PII data in storage | Sensitive Data Protection (Cloud DLP API) |
| Secure pods-to-GCP-service authentication in GKE | Workload Identity |
| Centralized network for multiple projects | Shared VPC (host project + service projects) |
| Private, high-bandwidth hybrid connectivity | Dedicated Interconnect (10/100 Gbps) |
| DDoS protection and WAF for web apps | Cloud Armor + HTTP(S) Load Balancing |
| Mobile/web app with real-time sync | Firestore (Native mode) |
| Workflow orchestration for data pipelines | Cloud Composer (managed Apache Airflow) |
| Enforce governance policies across org | Organization Policy constraints + folders |
| Centralized security posture management | Security Command Center (Premium) |
| Lift-and-shift VM migration to GCP | Migrate to Virtual Machines |
Key Architectural Principles for the Exam
- Prefer Managed Services: Always choose the most managed option that meets the requirements; Cloud Run over GKE when Kubernetes features are not needed; Cloud SQL over self-managed MySQL on Compute Engine; BigQuery over self-managed data warehouse; managed services reduce operational toil and let Google handle scaling, patching, and availability
- Design for Failure: Assume any component can fail; use regional managed instance groups, multi-zone GKE clusters, and regional Cloud SQL HA; implement circuit breakers and retry logic with exponential backoff; use dead letter queues for Pub/Sub; store state externally (Cloud Storage, databases) not on local VM disks
- Separate Concerns: Use microservices architecture where appropriate; separate compute from storage (stateless instances + Cloud Storage/databases); separate data processing from serving (Dataflow/Dataproc for ETL, Cloud Run/GKE for serving); use Pub/Sub to decouple producers from consumers for asynchronous processing
- Automate Everything: Infrastructure as Code (Terraform or Deployment Manager); CI/CD pipelines (Cloud Build + Cloud Deploy); automated testing; automated scaling (managed instance group autoscaler, GKE cluster autoscaler, Cloud Run automatic scaling); automated security (Binary Authorization, Organization Policies, Security Command Center automated remediation)
- Security by Default: Encrypt data at rest and in transit (default in GCP); use IAM with least privilege (no basic roles in production); enable VPC Service Controls for sensitive data; use private connectivity (Private Google Access, Private Service Connect) instead of public endpoints; manage secrets with Secret Manager, not environment variables; enable Cloud Audit Logs for all critical services
- Optimize Costs Continuously: Right-size VMs using Recommender; use Spot/preemptible VMs for fault-tolerant workloads; commit to CUDs for stable workloads; use appropriate storage classes; partition and cluster BigQuery tables; shut down non-production resources outside business hours; label all resources for cost attribution; export billing to BigQuery for detailed analysis
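The "retry logic with exponential backoff" principle above can be sketched as a small wrapper. It uses full jitter: a random delay between zero and the capped exponential delay. Jitter keeps many clients recovering from the same failure, such as a Cloud SQL failover, from retrying in lockstep. The function and its defaults are hypothetical. Google Cloud client libraries ship their own retry settings, which are usually preferable. The total retry budget should cover the expected recovery time (roughly 60 seconds for a Cloud SQL failover), so raise `max_attempts` or `max_delay` accordingly.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(operation: Callable[[], T],
                       retryable: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
                       max_attempts: int = 5,
                       base_delay: float = 0.1,
                       max_delay: float = 10.0,
                       sleep: Callable[[float], None] = time.sleep,
                       rng: Optional[random.Random] = None) -> T:
    """Call operation, retrying only retryable errors with capped, jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: sleep a random time up to the capped exponential delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng.uniform(0, cap))
    raise AssertionError("unreachable")
```

Non-retryable errors, such as a bad request, propagate immediately. Retrying them would only add latency and load.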
Exam Tip: When in doubt on the exam, choose the answer that uses the most managed service, applies least privilege, designs for high availability across zones or regions, and follows Google Cloud best practices over custom-built solutions. Case study questions reward answers that address business requirements (cost, timeline, compliance) alongside technical requirements (performance, scalability, reliability). Always consider the trade-offs: cost vs availability, consistency vs latency, control vs operational overhead.