GKE supports scaling at multiple levels: individual pods, node pools, and the cluster itself. Understanding when and how to scale each layer is key to balancing performance and cost.
Scaling Levels
flowchart TB
    subgraph "Layer 1: Pod Scaling"
        HPA["Horizontal Pod Autoscaler\n(more/fewer pods)"]
        VPA["Vertical Pod Autoscaler\n(bigger/smaller pods)"]
    end
    subgraph "Layer 2: Node Pool Scaling"
        CA["Cluster Autoscaler\n(more/fewer nodes in a pool)"]
    end
    subgraph "Layer 3: Manual"
        M1["kubectl scale"]
        M2["gcloud resize"]
    end
    HPA -->|"not enough resources"| CA
    VPA -->|"pod needs more CPU/memory"| M1
| Scaling Type | Scope | Trigger | Automation |
|---|---|---|---|
| Manual | Pod replicas or node count | Human decision | No |
| HPA | Pod count (horizontal) | CPU, memory, custom metrics | Yes |
| VPA | Pod resource requests (vertical) | Historical usage patterns | Yes |
| Cluster Autoscaler | Node count in a pool | Pending pods / idle nodes | Yes |
Manual Scaling
Scaling Pods
# Scale a deployment to 5 replicas
kubectl scale deployment my-app --replicas=5
# Check current scale
kubectl get deployment my-app
Scaling Node Pools
# Resize a node pool
gcloud container clusters resize my-cluster \
--node-pool=default-pool \
--zone=us-central1-a \
--num-nodes=5
# Autopilot clusters: not applicable (nodes scale automatically)
Note: Manual scaling is fine for predictable load patterns. For variable workloads, use autoscaling.
Horizontal Pod Autoscaler (HPA)
HPA automatically scales the number of pod replicas based on observed metrics.
How HPA Works
flowchart LR
    Metrics["Metrics Server\n(CPU, Memory, Custom)"] --> HPA["HPA Controller"]
    HPA -->|"current > target"| ScaleUp["Scale Up\n(+ replicas)"]
    HPA -->|"current < target"| ScaleDown["Scale Down\n(- replicas)"]
    HPA -->|"current ≈ target"| NoChange["No Change"]
    ScaleUp --> Deploy["Deployment"]
    ScaleDown --> Deploy
HPA YAML
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # Scale up when average CPU exceeds 70% of requests
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80 # Scale up when average memory exceeds 80% of requests
HPA with Custom Metrics
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa-custom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100" # 100 requests/sec per pod
HPA Commands
# Create HPA imperatively (CPU-based)
kubectl autoscale deployment my-app --cpu-percent=70 --min=2 --max=10
# Apply HPA from YAML
kubectl apply -f hpa.yaml
# Check HPA status
kubectl get hpa
# Detailed HPA info
kubectl describe hpa my-app-hpa
HPA Scaling Formula
desiredReplicas = ceil[currentReplicas × (currentMetricValue / desiredMetricValue)]
Example: Current = 3 replicas, CPU utilization = 90%, target = 70%
desiredReplicas = ceil[3 × (90 / 70)] = ceil[3.86] = 4 replicas
Key Insight: HPA requires the Metrics Server to be running. GKE clusters have it enabled by default. Verify with kubectl top pods.
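The formula and its worked example can be checked with a few lines of shell. This is a minimal sketch rather than the HPA controller's actual logic: the function name desired_replicas is hypothetical, it only accepts whole-number metric values, and it ignores the tolerance band and behavior policies the real controller also applies. It does add the clamp to minReplicas and maxReplicas, which the formula alone leaves out.

```shell
#!/bin/sh
# desired_replicas CURRENT_REPLICAS CURRENT_METRIC TARGET_METRIC MIN MAX
# Hypothetical helper illustrating:
#   desiredReplicas = ceil[currentReplicas * (currentMetric / targetMetric)]
desired_replicas() {
  dr_product=$(( $1 * $2 ))
  # Integer ceiling: ceil(a / b) == (a + b - 1) / b for positive integers
  dr_desired=$(( (dr_product + $3 - 1) / $3 ))
  # Clamp the result to the HPA's minReplicas/maxReplicas bounds
  if [ "$dr_desired" -lt "$4" ]; then dr_desired=$4; fi
  if [ "$dr_desired" -gt "$5" ]; then dr_desired=$5; fi
  echo "$dr_desired"
}

desired_replicas 3 90 70 2 10    # the example above: prints 4
desired_replicas 3 30 70 2 10    # light load: ceil(1.29) = 2
desired_replicas 8 140 70 2 10   # heavy load: 16, clamped to maxReplicas = 10
```

The last call shows why maxReplicas matters: the raw formula asks for 16 replicas, but the HPA never exceeds the configured ceiling.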
HPA Behavior Settings
Control how fast HPA scales up and down:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  minReplicas: 2
  maxReplicas: 10
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60 # Act on the lowest recommendation from the last 60s
      policies:
        - type: Percent
          value: 100 # Add at most 100% of current replicas (double) per period
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300 # Act on the highest recommendation from the last 5 min
      policies:
        - type: Pods
          value: 1 # Remove at most 1 pod per period
          periodSeconds: 120
Vertical Pod Autoscaler (VPA)
VPA adjusts pod CPU and memory requests based on historical and current usage. Unlike HPA (which adds/removes pods), VPA makes pods bigger or smaller.
Warning: VPA in Auto mode evicts and recreates pods to apply new resource settings. This causes temporary disruption. Use updateMode: "Off" to get recommendations without enforcement.
VPA YAML
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto" # Options: Off, Initial, Recreate, Auto
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: "100m"
          memory: "128Mi"
        maxAllowed:
          cpu: "2"
          memory: "4Gi"
VPA Update Modes
| Mode | Behavior | Use Case |
|---|---|---|
| Off | Only provides recommendations, no changes | Planning and analysis |
| Initial | Sets resources on pod creation only | Gradual adoption |
| Recreate | Evicts and recreates pods with new settings | Stateless workloads |
| Auto | Same as Recreate (currently) | Most workloads |
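For a first rollout, the Off mode in the table is the low-risk starting point. The sketch below writes a recommendation-only variant of the VPA manifest to a file; the file name my-app-vpa-off.yaml is arbitrary, and the resource names simply reuse the my-app examples from this page.

```shell
# Write a recommendation-only VPA manifest (file name is an arbitrary choice).
cat > my-app-vpa-off.yaml <<'EOF'
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # recommend only; never evict or resize pods
EOF

# Next steps against a real cluster (not run by this snippet):
#   kubectl apply -f my-app-vpa-off.yaml
#   kubectl describe vpa my-app-vpa
```

Once the recommendations look stable, the same manifest can be switched to Initial or Auto by changing the single updateMode line.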
Viewing VPA Recommendations
# Enable VPA in GKE
gcloud container clusters update my-cluster \
--zone=us-central1-a \
--enable-vertical-pod-autoscaling
# Check VPA recommendations
kubectl describe vpa my-app-vpa
Output includes:
Recommendation:
  Container Recommendations:
    Container Name: app
    Lower Bound:
      Cpu: 100m
      Memory: 128Mi
    Target: # Recommended values
      Cpu: 250m
      Memory: 256Mi
    Uncapped Target:
      Cpu: 230m
      Memory: 245Mi
    Upper Bound:
      Cpu: 1
      Memory: 1Gi
Cluster Autoscaler
Cluster Autoscaler adjusts the number of nodes in a node pool based on pod scheduling needs:
- Scale up: When pods are pending due to insufficient resources
- Scale down: When nodes are underutilized for a period
sequenceDiagram
    participant Deploy as Deployment
    participant Sched as Scheduler
    participant CA as Cluster Autoscaler
    participant NP as Node Pool
    Deploy->>Sched: Create 5 new pods
    Sched->>Sched: Not enough node capacity
    Note over Sched: Pods in Pending state
    Sched->>CA: Report pending pods
    CA->>NP: Add 2 nodes
    NP-->>Sched: New nodes ready
    Sched->>Deploy: Schedule pending pods
Enabling Cluster Autoscaler
# Enable on a Standard cluster (per node pool)
gcloud container clusters update my-cluster \
--zone=us-central1-a \
--enable-autoscaling \
--node-pool=default-pool \
--min-nodes=1 \
--max-nodes=10
# Or during cluster creation
gcloud container clusters create my-cluster \
--zone=us-central1-a \
--enable-autoscaling \
--min-nodes=1 \
--max-nodes=10 \
--num-nodes=3
Note: Autopilot clusters have built-in autoscaling. You don’t need to configure Cluster Autoscaler for Autopilot.
Cluster Autoscaler Configuration
| Parameter | Purpose | Recommended |
|---|---|---|
| --min-nodes | Minimum nodes per zone | At least 1 for production |
| --max-nodes | Maximum nodes per zone | Set a budget-appropriate ceiling |
| --total-nodes (Autopilot) | Total node limit across all zones | Default: 70 (soft), can request increase |
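Because --min-nodes and --max-nodes apply per zone, the effective range of a node pool depends on how many zones it spans. The arithmetic is sketched below; pool_node_range is a hypothetical helper name, and the three-zone figure is only an illustrative assumption about a regional node pool.

```shell
#!/bin/sh
# pool_node_range MIN_PER_ZONE MAX_PER_ZONE ZONES
# Hypothetical helper: prints the pool-wide node range as "min-max".
pool_node_range() {
  echo "$(( $1 * $3 ))-$(( $2 * $3 ))"
}

pool_node_range 1 10 1   # zonal pool (one zone): prints 1-10
pool_node_range 1 10 3   # pool spanning 3 zones: prints 3-30
```

When budgeting, multiply the per-zone maximum by the zone count: the same --max-nodes=10 setting allows three times as many nodes in a three-zone pool.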
Scale-Down Behavior
Cluster Autoscaler won’t scale down a node if:
- A pod on the node has a PodDisruptionBudget that would be violated
- A pod is not managed by a controller (standalone pod)
- A pod has a local emptyDir volume
- The node has the annotation "cluster-autoscaler.kubernetes.io/scale-down-disabled": "true"
HPA vs VPA vs Cluster Autoscaler
| Aspect | HPA | VPA | Cluster Autoscaler |
|---|---|---|---|
| What scales | Pod count | Pod size (CPU/memory) | Node count |
| Direction | Horizontal (more/fewer pods) | Vertical (bigger/smaller pods) | Infrastructure |
| Trigger | Current metrics vs target | Historical usage analysis | Pending pods / idle nodes |
| Disruption | None (add/remove pods) | Pod eviction (in Auto mode) | Pod eviction (node removal) |
| Best combined with | Cluster Autoscaler | HPA (not on same metric) | HPA |
Warning: Do not use HPA and VPA on the same metric (e.g., both on CPU). They can conflict — HPA scales out while VPA scales up, causing instability.
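One way to keep the two autoscalers apart is to let HPA own CPU and restrict VPA to memory. The sketch below assumes an HPA targeting CPU (such as my-app-hpa above) already exists, and uses the VPA controlledResources field to limit what VPA may change; the file name is arbitrary.

```shell
# Write a VPA manifest that only manages memory requests (file name is arbitrary).
cat > my-app-vpa-memory.yaml <<'EOF'
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        controlledResources: ["memory"]   # VPA leaves CPU requests untouched
EOF

# Apply against a real cluster (not run by this snippet):
#   kubectl apply -f my-app-vpa-memory.yaml
```

With CPU requests left alone, the utilization percentage that HPA computes is not shifted by VPA resizing the pods.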
Recommended Combinations
| Workload Type | Scaling Strategy |
|---|---|
| Web applications | HPA (CPU) + Cluster Autoscaler |
| Batch processing | HPA (queue depth) + Spot node pools |
| Databases (StatefulSets) | VPA (right-sizing) + manual node scaling |
| Memory-heavy apps | HPA (memory metric) + Cluster Autoscaler |
| Unpredictable traffic | HPA (CPU) + VPA (Off mode for recommendations) |
Useful Commands
| Command | Purpose |
|---|---|
| kubectl top pods | View pod CPU/memory usage |
| kubectl top nodes | View node CPU/memory usage |
| kubectl get hpa | List HPA resources |
| kubectl describe hpa NAME | HPA details and scaling events |
| kubectl get vpa | List VPA resources |
| kubectl describe vpa NAME | VPA recommendations |
| kubectl get events --field-selector reason=FailedScheduling | Check pending pods |
| gcloud container clusters describe NAME --zone ZONE | Check autoscaler config |
Common Pitfalls
| Pitfall | Consequence | Fix |
|---|---|---|
| HPA without resource requests | HPA cannot calculate utilization | Always set resources.requests in pod specs |
| Missing Metrics Server | HPA shows <unknown> metrics | GKE includes it by default; verify with kubectl top pods |
| HPA + VPA on same metric | Conflicting scaling decisions | Use HPA for CPU, VPA for memory (or use VPA in Off mode) |
| No PodDisruptionBudgets | Cluster Autoscaler evicts too many pods | Define PDBs for critical workloads |
| Tight max-nodes limit | Pods stay Pending when limit is hit | Set max-nodes based on budget and peak demand |
| VPA auto mode on stateful apps | Database pods evicted mid-operation | Use updateMode: "Off" for stateful workloads |
| Scale-down too aggressive | Nodes removed during temporary dips | Increase --scale-down-unneeded-time (default 10 min) |
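Two of the fixes in the table, setting resource requests and defining a PodDisruptionBudget, can be sketched as manifests. Everything here is illustrative: the file names are arbitrary, the app: my-app label and the nginx:stable image are placeholder assumptions, and the request values should come from your own measurements (or from VPA recommendations).

```shell
# Deployment with explicit requests, so HPA can compute utilization.
cat > my-app-deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app            # assumed label
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: nginx:stable   # placeholder image; swap in your own
          resources:
            requests:
              cpu: 250m         # HPA utilization = usage / this request
              memory: 256Mi
EOF

# PodDisruptionBudget that keeps at least 2 pods up during voluntary evictions.
cat > my-app-pdb.yaml <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app            # must match the Deployment's pod labels
EOF

# Apply against a real cluster (not run by this snippet):
#   kubectl apply -f my-app-deployment.yaml -f my-app-pdb.yaml
```

With a 250m CPU request, a 70% utilization target corresponds to roughly 175m of average CPU usage per pod. The PDB lets Cluster Autoscaler drain a node only while at least two my-app pods remain available.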
TL;DR
- Manual scaling: kubectl scale for pods, gcloud container clusters resize for nodes (good for predictable workloads)
- HPA: Automatic pod count based on CPU/memory/custom metrics (most common autoscaler)
- VPA: Adjusts pod resource requests based on usage (use Off mode for recommendations first)
- Cluster Autoscaler: Adds/removes nodes based on pod demand (built into Autopilot)
- Always set resource requests: HPA and VPA depend on them
- Don’t run HPA and VPA on the same metric
- Use PodDisruptionBudgets to protect workloads during scale-down