How App Engine scales your application — automatic, basic, and manual scaling modes with configuration examples and when to use each.


Three Scaling Types

App Engine offers three scaling modes, each designed for different workload patterns. You configure the scaling type in your app.yaml.

| Scaling Type | Trigger | Best For |
| --- | --- | --- |
| Automatic | Metrics (CPU, requests, latency) | Web apps, APIs, variable traffic |
| Basic | Incoming requests | Intermittent or low-traffic workloads |
| Manual | Fixed instance count | Stateful apps, long-running tasks |

Automatic Scaling

Automatic scaling is the default for new deployments. App Engine continuously monitors metrics and creates or destroys instances to match demand.

How it works

  • Creates instances when CPU utilization, request count, or pending latency exceed thresholds
  • Destroys idle instances after ~15 minutes of inactivity
  • Supports warmup requests to pre-load instances before traffic arrives
  • Can scale to zero in the Standard environment (when min_instances: 0)
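
To make the warmup bullet concrete, here is a minimal sketch of a warmup handler as a plain WSGI app using only the standard library. App Engine delivers warmup requests to /_ah/warmup once warmup is enabled under inbound_services in app.yaml. The names CACHE and load_reference_data are hypothetical placeholders for whatever your service needs to pre-load, and a real app would more likely use a framework such as Flask.

```python
# Hypothetical in-memory cache that the warmup request fills before live traffic arrives.
CACHE = {}


def load_reference_data():
    """Placeholder for expensive startup work (connections, config, models)."""
    return {"ready": True}


def app(environ, start_response):
    """WSGI entry point: handles the warmup path and a catch-all route."""
    path = environ.get("PATH_INFO", "/")
    if path == "/_ah/warmup":
        # Do the slow initialization here so the first user request is not delayed.
        CACHE.update(load_reference_data())
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"warmed up"]

    # Regular requests report whether the instance was warmed up first.
    start_response("200 OK", [("Content-Type", "text/plain")])
    body = "cache ready" if CACHE.get("ready") else "cold"
    return [body.encode()]
```

Warmup requests are a best-effort optimization, so request handlers should still work (more slowly) if the cache has not been filled yet.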

Configuration

runtime: python312
 
automatic_scaling:
  min_instances: 1        # Minimum instances always running
  max_instances: 10       # Cap to control costs
  target_cpu_utilization: 0.6    # Scale up at 60% CPU
  target_throughput_utilization: 0.6  # Scale up at 60% concurrent requests
  max_concurrent_requests: 100   # Requests per instance before scaling
  min_pending_latency: 30ms      # Wait before creating new instances
  max_pending_latency: automatic # Max wait before forced scaling
  min_idle_instances: 1          # Keep at least 1 idle instance
  max_idle_instances: 5          # Don't keep more than 5 idle

Key settings explained

| Setting | What it does | Default |
| --- | --- | --- |
| min_instances | Minimum instances always running (0 = scale to zero) | 0 (Standard) |
| max_instances | Maximum instances allowed | 20 for new projects created after March 2025; unset for older apps unless configured |
| target_cpu_utilization | CPU utilization that triggers scaling up (0.5 = 50%) | 0.6 |
| target_throughput_utilization | Concurrent request utilization threshold | 0.6 |
| max_concurrent_requests | Max concurrent requests per instance before new instances are created | 10 |
| min_pending_latency | Minimum time to wait before creating new instances | 30ms |
| max_pending_latency | Maximum pending latency before forcing new instances | Automatic |
| min_idle_instances | Minimum idle instances to keep running | 0 |
| max_idle_instances | Maximum idle instances (excess are shut down) | Automatic |
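
The throughput settings interact: the scheduler tries to add an instance once concurrent requests on an instance reach roughly max_concurrent_requests × target_throughput_utilization. The sketch below uses the values from the sample configuration (100 and 0.6) to estimate an instance count. It is a deliberately simplified model, not the real scheduler, which also weighs CPU and pending latency; the function names are illustrative.

```python
import math


def scale_out_threshold(max_concurrent_requests, target_throughput_utilization):
    """Concurrent requests per instance at which a new instance is requested."""
    return max_concurrent_requests * target_throughput_utilization


def estimate_instances(concurrent_requests,
                       max_concurrent_requests=100,
                       target_throughput_utilization=0.6,
                       min_instances=1,
                       max_instances=10):
    """Rough instance estimate from throughput alone (ignores CPU and latency)."""
    threshold = scale_out_threshold(max_concurrent_requests,
                                    target_throughput_utilization)
    needed = math.ceil(concurrent_requests / threshold)
    # Clamp to the configured floor and ceiling.
    return max(min_instances, min(max_instances, needed))
```

With these values, each instance is treated as "full" at about 60 concurrent requests, so 250 concurrent requests would call for roughly five instances under this model.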

Tip: Set min_instances: 1 for latency-sensitive services to avoid cold starts. Set min_instances: 0 for cost savings when traffic is intermittent.

Note: Set max_instances explicitly for production services so scaling and cost behavior are predictable.


Basic Scaling

Basic scaling creates instances when requests arrive and shuts them down when idle. It doesn’t continuously monitor metrics — instances spin up on demand.

How it works

  • Instances are created when requests arrive (up to max_instances)
  • Instance shut down after idle_timeout expires
  • No continuous load monitoring
  • Request timeout: 24 hours (vs. 10 minutes for automatic)
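
The idle-timeout rule can be pictured as a simple time comparison. The sketch below is only an illustration of that rule, assuming a 10-minute idle_timeout; App Engine makes this decision internally, and should_shut_down is a hypothetical helper, not a platform API.

```python
from datetime import datetime, timedelta


def should_shut_down(last_request_at, now, idle_timeout=timedelta(minutes=10)):
    """Return True once the instance has been idle for at least idle_timeout."""
    return now - last_request_at >= idle_timeout


# Example: the last request finished at 12:00.
last_request = datetime(2026, 5, 1, 12, 0)
still_warm = should_shut_down(last_request, datetime(2026, 5, 1, 12, 5))
expired = should_shut_down(last_request, datetime(2026, 5, 1, 12, 10))
```

A shorter idle_timeout lowers cost for bursty workloads, at the price of more frequent cold starts.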

Configuration

runtime: python312
instance_class: B2
 
basic_scaling:
  max_instances: 5
  idle_timeout: 10m

When to use basic scaling

  • Intermittent or user-activity-driven workloads
  • Batch processing jobs that run occasionally
  • Admin tools or internal dashboards with low traffic
  • Workloads that need longer request timeouts (up to 24 hours) but don’t need continuous instances

Note: Basic scaling uses B-class instance types (B1, B2, B4, B8), not F-class.


Manual Scaling

Manual scaling keeps a fixed number of instances running regardless of load. No automatic scaling happens — you control the count.

How it works

  • Fixed instance count, always running
  • State is preserved across requests (in-memory data persists)
  • Instances are individually addressable
  • Request timeout: 24 hours
  • Each instance receives a /_ah/start request on startup
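
The sketch below shows how an instance might respond to the /_ah/start request and keep state in memory between requests. It is a minimal standard-library WSGI app; STATE and its keys are hypothetical, and the state lives only in one instance's memory, so it is not shared between instances and is lost when the instance restarts.

```python
# Hypothetical per-instance state that survives across requests on this instance.
STATE = {"started": False, "counter": 0}


def app(environ, start_response):
    """WSGI entry point: handles the start request and counts normal requests."""
    path = environ.get("PATH_INFO", "/")
    if path == "/_ah/start":
        # Sent by App Engine when the instance starts; a 2xx response marks it as started.
        STATE["started"] = True
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"started"]

    # In-memory data persists because the same instance keeps running.
    STATE["counter"] += 1
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [f"request {STATE['counter']}".encode()]
```

Anything that must outlive an instance restart still belongs in an external store such as a database or cache service.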

Configuration

runtime: python312
instance_class: B4
 
manual_scaling:
  instances: 3

When to use manual scaling

  • Applications needing persistent in-memory state (caches, session data)
  • Long-running tasks that exceed automatic scaling’s 10-minute timeout
  • WebSocket servers that need stable connections (Flexible only)
  • Debugging — fixed instances make it easier to reproduce issues
  • Workloads where you want predictable, fixed costs

Warning: Manual scaling means you pay for all instances continuously. Three B4 instances at $0.20/hr running 24/7 cost about $432/month (30-day month). Only use manual scaling when you need the specific capabilities it provides.


Comparison

| Feature | Automatic | Basic | Manual |
| --- | --- | --- | --- |
| Scaling trigger | Metrics-based (CPU, latency) | On-request | Fixed count |
| Instance classes | F1, F2, F4, F4_1G | B1, B2, B4, B8 | B1, B2, B4, B8 |
| Scale to zero | Yes (Standard, min_instances: 0) | Yes (idle timeout) | No |
| Request timeout | 10 minutes | 24 hours | 24 hours |
| Instance addressability | No | Yes | Yes |
| Warmup requests | Yes | No | No |
| Background threads | No (Java) | Yes | Yes |
| State persistence | No | Limited | Yes |
| Cost predictability | Variable | Low (only when active) | High (fixed) |

Decision Guide

flowchart TD
    Start["Choosing a Scaling Type"] --> Q1{"Do you need persistent<br/>in-memory state?"}
    Q1 -->|Yes| Manual["Manual Scaling"]
    Q1 -->|No| Q2{"Do requests take<br/>longer than 10 minutes?"}
    Q2 -->|Yes| Q3{"Is the workload<br/>continuous or intermittent?"}
    Q3 -->|Continuous| Manual
    Q3 -->|Intermittent| Basic["Basic Scaling"]
    Q2 -->|No| Q4{"Standard web traffic<br/>or API workload?"}
    Q4 -->|Yes| Auto["Automatic Scaling"]
    Q4 -->|No| Q5{"Low, intermittent traffic<br/>with long requests?"}
    Q5 -->|Yes| Basic
    Q5 -->|No| Auto
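
The same decision flow can be written as a small function, which is handy for sanity-checking a choice against the flowchart. This is a direct transcription of the branches above; the function and parameter names are illustrative.

```python
def choose_scaling(needs_in_memory_state,
                   requests_exceed_10_min=False,
                   workload_continuous=False,
                   standard_web_or_api=True,
                   low_intermittent_long_requests=False):
    """Walk the decision guide and return 'automatic', 'basic', or 'manual'."""
    if needs_in_memory_state:                      # Q1
        return "manual"
    if requests_exceed_10_min:                     # Q2
        # Q3: continuous work keeps instances busy; intermittent work can idle out.
        return "manual" if workload_continuous else "basic"
    if standard_web_or_api:                        # Q4
        return "automatic"
    # Q5: fall back to automatic unless traffic is low and intermittent.
    return "basic" if low_intermittent_long_requests else "automatic"


# Example: an occasional batch job with 30-minute requests and no in-memory state.
batch_choice = choose_scaling(False, requests_exceed_10_min=True,
                              workload_continuous=False)
```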

Quick recommendations

| Workload | Scaling Type | Why |
| --- | --- | --- |
| Web frontend | Automatic | Handles traffic spikes, scales to zero when quiet |
| REST API | Automatic | Variable traffic, needs low latency |
| Mobile backend | Automatic | Unpredictable traffic patterns |
| Batch processing | Basic | Runs occasionally, needs long timeouts |
| Admin panel | Basic | Low, intermittent traffic |
| WebSocket server | Manual (Flex) | Needs persistent connections |
| In-memory cache | Manual | Needs persistent state across requests |
| Long-running task worker | Manual | Requests exceed 10-minute timeout |

Pricing Implications

As of May 2026, these examples use USD App Engine Standard instance-hour pricing from Google Cloud’s pricing page.

Instance-hour billing

  • Instance hours are billed in 15-minute increments
  • An instance accrues billable time until 15 minutes after it processes its last request
  • Excess idle instances beyond configured limits do not accumulate billable hours

Cost examples

Automatic scaling (F1, $0.05/hr):

  • 2 instances running 8 hours/day = $24/month
  • Scales to zero at night = $0 during off-hours
  • Free tier covers 28 instance-hours/day

Basic scaling (B2, $0.10/hr):

  • 5 instances, 2 hours/day average = $30/month
  • Free tier covers 9 instance-hours/day

Manual scaling (B4, $0.20/hr):

  • 3 instances, 24/7 = $432/month
  • No free tier coverage for sustained manual instances
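
The three monthly figures above (24, 30, and 432 USD) follow from instances × hours per day × hourly rate × 30 days. The sketch below reproduces that arithmetic using the hourly rates quoted in this section. It assumes a 30-day month and ignores the 15-minute billing tail and regional price differences; free_hours_per_day is a hypothetical parameter for seeing the effect of a daily free allowance.

```python
from decimal import Decimal


def monthly_cost(instances, hours_per_day, hourly_rate, days=30, free_hours_per_day=0):
    """Estimate monthly cost in USD; pass hourly_rate as a string such as '0.05'."""
    daily_instance_hours = Decimal(instances) * Decimal(str(hours_per_day))
    # Subtract any free daily allowance, never going below zero.
    billable = max(Decimal(0), daily_instance_hours - Decimal(str(free_hours_per_day)))
    return billable * Decimal(hourly_rate) * days


automatic_f1 = monthly_cost(2, 8, "0.05")    # 2 F1 instances, 8 hours/day
basic_b2 = monthly_cost(5, 2, "0.10")        # 5 B2 instances, 2 hours/day
manual_b4 = monthly_cost(3, 24, "0.20")      # 3 B4 instances, 24/7
```

Passing free_hours_per_day=28 to the F1 example brings the estimate to zero, which shows that the figures above are before any free-tier allowance is applied.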

Warning: Manual scaling with large instances can get expensive fast. B8 instances running 24/7 can reach about $864/month. Monitor your usage.


TL;DR

  • Automatic scaling: default, metrics-based, best for web apps and APIs. Uses F-class instances.
  • Basic scaling: on-demand, spins up for requests, shuts down when idle. Best for intermittent workloads with long requests. Uses B-class instances.
  • Manual scaling: fixed count, always running, stateful. Best for persistent state and long-running tasks. Uses B-class instances.
  • Set min_instances: 0 to save costs with automatic scaling in the Standard environment.
  • Instance-hour billing continues for 15 minutes after an instance handles its last request.

Resources

  • How Instances are Managed - Instance lifecycle, scaling behavior, and instance classes.
  • app.yaml Reference: Automatic Scaling - All automatic scaling configuration options.
  • App Engine Pricing - Instance pricing and free tier details.
  • Core Components - Instances, services, versions, and the App Engine architecture.
  • Standard vs Flexible - Environment comparison including scaling differences.