How App Engine scales your application — automatic, basic, and manual scaling modes with configuration examples and when to use each.
Three Scaling Types
App Engine offers three scaling modes, each designed for different workload patterns. You configure the scaling type in your app.yaml.
| Scaling Type | Trigger | Best For |
|---|---|---|
| Automatic | Metrics (CPU, requests, latency) | Web apps, APIs, variable traffic |
| Basic | Incoming requests | Intermittent or low-traffic workloads |
| Manual | Fixed instance count | Stateful apps, long-running tasks |
Automatic Scaling
Automatic scaling is the default for new deployments. App Engine continuously monitors metrics and creates or destroys instances to match demand.
How it works
- Creates instances when CPU utilization, request count, or pending latency exceed thresholds
- Destroys idle instances after ~15 minutes of inactivity
- Supports warmup requests to pre-load instances before traffic arrives
- Can scale to zero in the Standard environment (when min_instances: 0)
Configuration
runtime: python312
automatic_scaling:
  min_instances: 1                     # Minimum instances always running
  max_instances: 10                    # Cap to control costs
  target_cpu_utilization: 0.6          # Scale up at 60% CPU
  target_throughput_utilization: 0.6   # Scale up at 60% concurrent requests
  max_concurrent_requests: 100         # Requests per instance before scaling
  min_pending_latency: 30ms            # Wait before creating new instances
  max_pending_latency: automatic       # Max wait before forced scaling
  min_idle_instances: 1                # Keep at least 1 idle instance
  max_idle_instances: 5                # Don't keep more than 5 idle
Key settings explained
| Setting | What it does | Default |
|---|---|---|
| min_instances | Minimum instances always running (0 = scale to zero) | 0 (Standard) |
| max_instances | Maximum instances allowed | 20 for new projects created after March 2025; unset for older apps unless configured |
| target_cpu_utilization | CPU % that triggers scaling up (0.5 = 50%) | 0.6 |
| target_throughput_utilization | Concurrent request utilization threshold | 0.6 |
| max_concurrent_requests | Max requests per instance before new instances are created | 100 |
| min_pending_latency | Minimum time to wait before creating new instances | 30ms |
| max_pending_latency | Maximum pending latency before forcing new instances | Automatic |
| min_idle_instances | Minimum idle instances to keep running | 0 |
| max_idle_instances | Maximum idle instances (excess are shut down) | Automatic |
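To get a feel for how the throughput settings interact, here is a rough back-of-the-envelope estimate. It is a simplified model, not the scheduler's actual algorithm: it assumes a new instance is needed once concurrent requests per instance exceed `max_concurrent_requests * target_throughput_utilization`, and it ignores CPU and pending latency. The function name `estimate_instances` is made up for illustration; the defaults are taken from the table above.

```python
import math


def estimate_instances(concurrent_requests,
                       max_concurrent_requests=100,
                       target_throughput_utilization=0.6,
                       min_instances=0,
                       max_instances=20):
    """Rough instance count needed to stay under the throughput target.

    Simplified model for illustration only; the real scheduler also
    weighs CPU utilization and pending latency.
    """
    # Concurrent requests one instance handles before scaling kicks in.
    per_instance = max_concurrent_requests * target_throughput_utilization
    needed = math.ceil(concurrent_requests / per_instance)
    # Clamp to the configured floor and ceiling.
    return max(min_instances, min(needed, max_instances))
```

With the defaults, `estimate_instances(250)` returns 5, because each instance is treated as full at 60 concurrent requests.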
Tip: Set min_instances: 1 for latency-sensitive services to avoid cold starts. Set min_instances: 0 for cost savings when traffic is intermittent.
Note: Set max_instances explicitly for production services so scaling and cost behavior are predictable.
Basic Scaling
Basic scaling creates instances when requests arrive and shuts them down when idle. It doesn’t continuously monitor metrics — instances spin up on demand.
How it works
- Instances are created on demand as requests arrive (up to max_instances)
- Each instance shuts down once it has been idle for idle_timeout
- No continuous load monitoring
- Request timeout: 24 hours (vs. 10 minutes for automatic)
Configuration
runtime: python312
instance_class: B2
basic_scaling:
  max_instances: 5
  idle_timeout: 10m
When to use basic scaling
- Intermittent or user-activity-driven workloads
- Batch processing jobs that run occasionally
- Admin tools or internal dashboards with low traffic
- Workloads that need longer request timeouts (up to 24 hours) but don’t need continuous instances
Note: Basic scaling uses B-class instance types (B1, B2, B4, B8), not F-class.
Manual Scaling
Manual scaling keeps a fixed number of instances running regardless of load. No automatic scaling happens — you control the count.
How it works
- Fixed instance count, always running
- State is preserved across requests (in-memory data persists)
- Instances are individually addressable
- Request timeout: 24 hours
- Each instance receives a /_ah/start request on startup
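The points above can be sketched as a small WSGI app that initializes in-memory state when `/_ah/start` arrives and then reuses it across requests. The names `_cache` and `warm_cache` are hypothetical, and the state is per instance: it is lost whenever that instance restarts.

```python
# Sketch of a WSGI app for a manually scaled service.
# `_cache` and `warm_cache` are hypothetical names used for illustration.
_cache = {}


def warm_cache():
    """Initialize per-instance state when the instance starts."""
    _cache["hits"] = 0


def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path == "/_ah/start":
        warm_cache()
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"started"]
    # State lives in process memory, so it survives across requests
    # handled by this same instance.
    _cache["hits"] = _cache.get("hits", 0) + 1
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [str(_cache["hits"]).encode()]
```

Because each of the fixed instances keeps its own copy, anything that must be shared between instances still belongs in an external store.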
Configuration
runtime: python312
instance_class: B4
manual_scaling:
  instances: 3
When to use manual scaling
- Applications needing persistent in-memory state (caches, session data)
- Long-running tasks that exceed automatic scaling’s 10-minute timeout
- WebSocket servers that need stable connections (Flexible only)
- Debugging — fixed instances make it easier to reproduce issues
- Workloads where you want predictable, fixed costs
Warning: Manual scaling means you pay for all instances continuously. Three B4 instances at $0.20/hr running 24/7 cost about $432/month (30-day month). Only use manual scaling when you need the specific capabilities it provides.
Comparison
| Feature | Automatic | Basic | Manual |
|---|---|---|---|
| Scaling trigger | Metrics-based (CPU, latency) | On-request | Fixed count |
| Instance classes | F1, F2, F4, F4_1G | B1, B2, B4, B8 | B1, B2, B4, B8 |
| Scale to zero | Yes (Standard, min_instances: 0) | Yes (idle timeout) | No |
| Request timeout | 10 minutes | 24 hours | 24 hours |
| Instance addressability | No | Yes | Yes |
| Warmup requests | Yes | No | No |
| Background threads | No (Java) | Yes | Yes |
| State persistence | No | Limited | Yes |
| Cost predictability | Variable | Low (only when active) | High (fixed) |
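One row of this table is easy to get wrong in practice: pairing a scaling type with the wrong instance class. A small pre-deploy check along these lines could catch that. It is only a sketch: the class lists are copied from the table above and may not be exhaustive, and `check_instance_class` is a hypothetical helper, not part of any App Engine tooling.

```python
# Instance classes per scaling type, copied from the comparison table.
# The lists may not be exhaustive.
ALLOWED_CLASSES = {
    "automatic": {"F1", "F2", "F4", "F4_1G"},
    "basic": {"B1", "B2", "B4", "B8"},
    "manual": {"B1", "B2", "B4", "B8"},
}


def check_instance_class(scaling_type, instance_class):
    """Return True if the instance class matches the scaling type."""
    try:
        return instance_class in ALLOWED_CLASSES[scaling_type]
    except KeyError:
        raise ValueError(f"unknown scaling type: {scaling_type!r}") from None
```

For example, `check_instance_class("automatic", "B2")` returns `False`, since B-class instances belong to basic and manual scaling.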
Decision Guide
flowchart TD
    Start["Choosing a Scaling Type"] --> Q1{"Do you need persistent<br/>in-memory state?"}
    Q1 -->|Yes| Manual["Manual Scaling"]
    Q1 -->|No| Q2{"Do requests take<br/>longer than 10 minutes?"}
    Q2 -->|Yes| Q3{"Is the workload<br/>continuous or intermittent?"}
    Q3 -->|Continuous| Manual
    Q3 -->|Intermittent| Basic["Basic Scaling"]
    Q2 -->|No| Q4{"Standard web traffic<br/>or API workload?"}
    Q4 -->|Yes| Auto["Automatic Scaling"]
    Q4 -->|No| Q5{"Low, intermittent traffic<br/>with long requests?"}
    Q5 -->|Yes| Basic
    Q5 -->|No| Auto
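The same decision flow can be written as a small function, which is handy for reviewing a list of services in one pass. This is a direct transcription of the flowchart; the function name and parameter names are invented for illustration.

```python
def choose_scaling(*, needs_in_memory_state, long_requests,
                   continuous=False, web_or_api=True,
                   low_intermittent_long=False):
    """Walk the decision flowchart and return a scaling type.

    `long_requests` means requests that take longer than 10 minutes.
    """
    if needs_in_memory_state:
        return "manual"
    if long_requests:
        # Long requests rule out automatic scaling; pick by duty cycle.
        return "manual" if continuous else "basic"
    if web_or_api:
        return "automatic"
    return "basic" if low_intermittent_long else "automatic"
```

For instance, `choose_scaling(needs_in_memory_state=False, long_requests=True)` returns `"basic"`, matching the intermittent branch of the flowchart.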
Quick recommendations
| Workload | Scaling Type | Why |
|---|---|---|
| Web frontend | Automatic | Handles traffic spikes, scales to zero when quiet |
| REST API | Automatic | Variable traffic, needs low latency |
| Mobile backend | Automatic | Unpredictable traffic patterns |
| Batch processing | Basic | Runs occasionally, needs long timeouts |
| Admin panel | Basic | Low, intermittent traffic |
| WebSocket server | Manual (Flex) | Needs persistent connections |
| In-memory cache | Manual | Needs persistent state across requests |
| Long-running task worker | Manual | Requests exceed 10-minute timeout |
Pricing Implications
As of May 2026, these examples use USD App Engine Standard instance-hour pricing from Google Cloud’s pricing page.
Instance-hour billing
- Instance hours are billed in 15-minute increments
- An instance accrues billable time until 15 minutes after it processes its last request
- Excess idle instances beyond configured limits do not accumulate billable hours
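As a rough illustration of the first two rules, the sketch below computes billable minutes for a single instance. It takes one literal reading of those bullets (a 15-minute tail after the last request, then rounding up to a 15-minute increment) and ignores startup time; the function and constant names are assumptions, and actual billing may differ in detail.

```python
IDLE_TAIL_MINUTES = 15   # billing continues this long after the last request
INCREMENT_MINUTES = 15   # assumed billing granularity


def billable_minutes(first_request_min, last_request_min):
    """Billable minutes for one instance serving between two requests.

    Both arguments are minutes on the same clock.
    """
    if last_request_min < first_request_min:
        raise ValueError("last request precedes first request")
    raw = (last_request_min - first_request_min) + IDLE_TAIL_MINUTES
    increments = -(-raw // INCREMENT_MINUTES)  # ceiling division
    return increments * INCREMENT_MINUTES
```

Under this reading, an instance whose requests span 50 minutes is billed for 75 minutes: 50 active, plus the 15-minute tail, rounded up to the next increment.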
Cost examples
Automatic scaling (F1, $0.05/hr):
- 2 instances running 8 hours/day = $24/month
- Scales to zero at night = $0 during off-hours
- Free tier covers 28 instance-hours/day
Basic scaling (B2, $0.10/hr):
- 5 instances, 2 hours/day average = $30/month
- Free tier covers 9 instance-hours/day
Manual scaling (B4, $0.20/hr):
- 3 instances, 24/7 = $432/month
- No free tier coverage for sustained manual instances
Warning: Manual scaling with large instances can get expensive fast. A B8 deployment running 24/7 can reach about $864/month. Monitor your usage.
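The arithmetic behind the cost examples is instances × hours per day × days × hourly rate. A minimal sketch, assuming a 30-day month and ignoring the free tier (`monthly_cost` is a name made up for this example; the rates are the ones quoted above):

```python
def monthly_cost(instances, hours_per_day, hourly_rate, days=30):
    """Estimated monthly cost in USD, before any free-tier credit."""
    return instances * hours_per_day * days * hourly_rate


# Rates from the cost examples above.
automatic = monthly_cost(2, 8, 0.05)    # F1, 2 instances, 8 hours/day
basic = monthly_cost(5, 2, 0.10)        # B2, 5 instances, 2 hours/day
manual = monthly_cost(3, 24, 0.20)      # B4, 3 instances, 24/7

print(f"automatic: ${automatic:.2f}, basic: ${basic:.2f}, manual: ${manual:.2f}")
```

Treat the results as upper bounds for light workloads, since free-tier instance hours reduce the bill.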
TL;DR
- Automatic scaling: default, metrics-based, best for web apps and APIs. Uses F-class instances.
- Basic scaling: on-demand, spins up for requests, shuts down when idle. Best for intermittent workloads with long requests. Uses B-class instances.
- Manual scaling: fixed count, always running, stateful. Best for persistent state and long-running tasks. Uses B-class instances.
- Set min_instances: 0 to save costs with automatic scaling in the Standard environment.
- Instance-hour billing has a 15-minute grace period after the last request.
Resources
- How Instances are Managed. Instance lifecycle, scaling behavior, and instance classes.
- app.yaml Reference: Automatic Scaling. All automatic scaling configuration options.
- App Engine Pricing. Instance pricing and free tier details.
- Core Components. Instances, services, versions, and the App Engine architecture.
- Standard vs Flexible. Environment comparison including scaling differences.