App Engine Scaling Options

How App Engine scales your application — automatic, basic, and manual scaling modes with configuration examples and when to use each.

Three Scaling Types

App Engine offers three scaling modes, each designed for different workload patterns. You configure the scaling type in your app.yaml.

Scaling Type	Trigger	Best For
Automatic	Metrics (CPU, requests, latency)	Web apps, APIs, variable traffic
Basic	Incoming requests	Intermittent or low-traffic workloads
Manual	Fixed instance count	Stateful apps, long-running tasks

Automatic Scaling

Automatic scaling is the default when app.yaml does not specify a scaling type. App Engine continuously monitors metrics and creates or destroys instances to match demand.

How it works

Creates instances when CPU utilization, request count, or pending latency exceed thresholds
Destroys idle instances after ~15 minutes of inactivity
Supports warmup requests to pre-load instances before traffic arrives
Can scale to zero in the Standard environment (when min_instances: 0)

Configuration

runtime: python312
 
automatic_scaling:
  min_instances: 1        # Minimum instances always running
  max_instances: 10       # Cap to control costs
  target_cpu_utilization: 0.6    # Scale up at 60% CPU
  target_throughput_utilization: 0.6  # Scale up at 60% concurrent requests
  max_concurrent_requests: 100   # Requests per instance before scaling
  min_pending_latency: 30ms      # Wait before creating new instances
  max_pending_latency: automatic # Max wait before forced scaling
  min_idle_instances: 1          # Keep at least 1 idle instance
  max_idle_instances: 5          # Don't keep more than 5 idle
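Before deploying, it can help to sanity-check these values. The sketch below takes the `automatic_scaling` mapping as a Python dict, such as the result of loading app.yaml with PyYAML's `yaml.safe_load`, and flags obvious mistakes. It enforces 0.5 to 0.95 for both utilization targets and 1 to 1000 for `max_concurrent_requests`. Those ranges are assumptions based on the app.yaml reference and are worth re-checking there. `check_automatic_scaling` is a hypothetical helper, not part of any Google tool.

```python
def check_automatic_scaling(cfg: dict) -> list[str]:
    """Return problems found in an automatic_scaling mapping (empty list means none found)."""
    problems = []
    for key in ("target_cpu_utilization", "target_throughput_utilization"):
        value = cfg.get(key)
        # Assumed valid range from the app.yaml reference: 0.5 to 0.95.
        if value is not None and not 0.5 <= value <= 0.95:
            problems.append(f"{key}={value} is outside 0.5-0.95")
    mcr = cfg.get("max_concurrent_requests")
    # Assumed bounds: at least 1 and at most 1000 concurrent requests per instance.
    if mcr is not None and not 1 <= mcr <= 1000:
        problems.append(f"max_concurrent_requests={mcr} is outside 1-1000")
    for low, high in (("min_instances", "max_instances"),
                      ("min_idle_instances", "max_idle_instances")):
        lo_val, hi_val = cfg.get(low), cfg.get(high)
        # Only compare numbers; a value such as "automatic" is skipped.
        if isinstance(lo_val, int) and isinstance(hi_val, int) and lo_val > hi_val:
            problems.append(f"{low}={lo_val} exceeds {high}={hi_val}")
    return problems


# The configuration from the block above, as a dict.
example = {
    "min_instances": 1, "max_instances": 10,
    "target_cpu_utilization": 0.6, "target_throughput_utilization": 0.6,
    "max_concurrent_requests": 100,
    "min_idle_instances": 1, "max_idle_instances": 5,
}
print(check_automatic_scaling(example))  # []
```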

Key settings explained

Setting	What it does	Default
`min_instances`	Minimum instances always running (0 = scale to zero)	0 (Standard)
`max_instances`	Maximum instances allowed	20 for new projects created after March 2025; unset for older apps unless configured
`target_cpu_utilization`	CPU % that triggers scaling up (0.5 = 50%)	0.6
`target_throughput_utilization`	Concurrent request utilization threshold	0.6
`max_concurrent_requests`	Max concurrent requests per instance before new instances are created	10
`min_pending_latency`	Minimum time to wait before creating new instances	30ms
`max_pending_latency`	Maximum pending latency before forcing new instances	Automatic
`min_idle_instances`	Minimum idle instances to keep running	0
`max_idle_instances`	Maximum idle instances (excess are shut down)	Automatic
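A simplified proportional model can help build intuition for how `target_cpu_utilization`, `target_throughput_utilization`, and `max_concurrent_requests` interact. It is a hypothetical illustration, not App Engine's scheduler, which also weighs pending latency and idle instances. The model picks enough instances to bring both average CPU and request concurrency back down to their targets, then clamps the result to `min_instances` and `max_instances`.

```python
import math


def desired_instances(current: int, avg_cpu: float, concurrent_requests: int, *,
                      max_concurrent: int, min_instances: int, max_instances: int,
                      target_cpu: float = 0.6, target_throughput: float = 0.6) -> int:
    """Return a target instance count under a simplified proportional model."""
    def ceil_safe(x: float) -> int:
        # Round first so float noise such as 6.000000000000001 does not bump the result to 7.
        return math.ceil(round(x, 9))

    # Total CPU demand in "instances' worth" of CPU, spread so each runs at target_cpu.
    cpu_needed = ceil_safe(current * avg_cpu / target_cpu)
    # Each instance should carry at most target_throughput * max_concurrent requests.
    throughput_needed = ceil_safe(concurrent_requests / (max_concurrent * target_throughput))
    wanted = max(cpu_needed, throughput_needed)
    return max(min_instances, min(max_instances, wanted))


# 4 instances at 90% CPU serving 10 concurrent requests, 10 allowed per instance:
print(desired_instances(4, 0.9, 10, max_concurrent=10, min_instances=1, max_instances=10))  # 6
```

In this example CPU is the binding constraint. Spreading 3.6 instances' worth of CPU at 60% each takes 6 instances, while the request load alone would need only 2.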

Tip: Set min_instances: 1 for latency-sensitive services to avoid cold starts. Set min_instances: 0 for cost savings when traffic is intermittent.

Note: Set max_instances explicitly for production services so scaling and cost behavior are predictable.

Basic Scaling

Basic scaling creates instances when requests arrive and shuts them down when idle. It doesn’t continuously monitor metrics — instances spin up on demand.

How it works

Instances are started as requests arrive (up to max_instances), not ahead of demand
Instance shut down after idle_timeout expires
No continuous load monitoring
Request timeout: 24 hours (vs. 10 minutes for automatic)

Configuration

runtime: python312
instance_class: B2
 
basic_scaling:
  max_instances: 5
  idle_timeout: 10m
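The `idle_timeout` setting trades cold starts against uptime. The hypothetical `simulate_basic_instance` helper below models a single basic-scaling instance under simplifying assumptions: startup is instant, requests take zero time, and times are in minutes. The instance starts on a request and shuts down once `idle_timeout` passes without another request. It then starts again on the next request.

```python
def simulate_basic_instance(request_minutes: list[float], idle_timeout: float) -> tuple[int, float]:
    """Return (number of instance starts, total minutes the instance was running).

    Simplified single-instance model: a request arriving exactly idle_timeout
    minutes after the previous one still finds the instance running.
    """
    starts = 0
    uptime = 0.0
    start = last = None
    for t in sorted(request_minutes):
        if last is None or t - last > idle_timeout:
            # The previous run (if any) ended idle_timeout minutes after its last request.
            if last is not None:
                uptime += (last + idle_timeout) - start
            starts += 1  # cold start
            start = t
        last = t
    if last is not None:
        uptime += (last + idle_timeout) - start
    return starts, uptime


requests = [0, 5, 30, 32]  # request arrival times in minutes
print(simulate_basic_instance(requests, idle_timeout=10))  # (2, 27.0)
print(simulate_basic_instance(requests, idle_timeout=30))  # (1, 62.0)
```

The longer timeout avoids the second cold start, but it keeps the instance running for 62 minutes instead of 27.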

When to use basic scaling

Intermittent or user-activity-driven workloads
Batch processing jobs that run occasionally
Admin tools or internal dashboards with low traffic
Workloads that need longer request timeouts (up to 24 hours) but don’t need continuous instances

Note: Basic scaling uses B-class instance types (B1, B2, B4, B8), not F-class.

Manual Scaling

Manual scaling keeps a fixed number of instances running regardless of load. No automatic scaling happens — you control the count.

How it works

Fixed instance count, always running
In-memory state persists across requests on the same instance until that instance restarts
Instances are individually addressable
Request timeout: 24 hours
Each instance receives a /_ah/start request on startup

Configuration

runtime: python312
instance_class: B4
 
manual_scaling:
  instances: 3

When to use manual scaling

Applications needing persistent in-memory state (caches, session data)
Long-running tasks that exceed automatic scaling’s 10-minute timeout
WebSocket servers that need stable connections (Flexible only)
Debugging — fixed instances make it easier to reproduce issues
Workloads where you want predictable, fixed costs
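When debugging, you can send requests to a single instance by hostname. The helper below assumes the `-dot-` URL pattern from App Engine's request-routing docs: `https://INSTANCE_ID-dot-VERSION_ID-dot-SERVICE_ID-dot-PROJECT_ID.REGION_ID.r.appspot.com`. `instance_url` is hypothetical, and every argument value in the example is a placeholder. Real instance IDs are listed by `gcloud app instances list`.

```python
def instance_url(instance_id: str, version: str, service: str, project: str, region_id: str) -> str:
    """Build a URL that targets one specific instance (assumed -dot- routing format)."""
    host = "-dot-".join([instance_id, version, service, project])
    return f"https://{host}.{region_id}.r.appspot.com"


# All values are placeholders.
print(instance_url("abc123", "v1", "worker", "my-project", "uc"))
```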

Warning: Manual scaling means you pay for all instances continuously. Three B4 instances at $0.20/hour cost $14.40/day, or about $432/month. Only use manual scaling when you need the specific capabilities it provides.

Comparison

Feature	Automatic	Basic	Manual
Scaling trigger	Metrics-based (CPU, latency)	On-request	Fixed count
Instance classes	F1, F2, F4, F4_1G	B1, B2, B4, B8	B1, B2, B4, B8
Scale to zero	Yes (Standard, min_instances: 0)	Yes (idle timeout)	No
Request timeout	10 minutes	24 hours	24 hours
Instance addressability	No	Yes	Yes
Warmup requests	Yes	No	No
Background threads	No (Java)	Yes	Yes
State persistence	No	Limited	Yes
Cost profile	Variable (tracks traffic)	Low (billed only while active)	Fixed (always running)

Decision Guide

flowchart TD
    Start["Choosing a Scaling Type"] --> Q1{"Do you need persistent<br/>in-memory state?"}
    Q1 -->|Yes| Manual["Manual Scaling"]
    Q1 -->|No| Q2{"Do requests take<br/>longer than 10 minutes?"}
    Q2 -->|Yes| Q3{"Is the workload<br/>continuous or intermittent?"}
    Q3 -->|Continuous| Manual
    Q3 -->|Intermittent| Basic["Basic Scaling"]
    Q2 -->|No| Q4{"Standard web traffic<br/>or API workload?"}
    Q4 -->|Yes| Auto["Automatic Scaling"]
    Q4 -->|No| Q5{"Low, intermittent<br/>traffic?"}
    Q5 -->|Yes| Basic
    Q5 -->|No| Auto
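The same decision guide can be written as a small function. This is handy for recording the choice next to a service's config. `choose_scaling` and its flag names are hypothetical. Each flag corresponds to one question in the flowchart, and the questions are evaluated in the same order.

```python
def choose_scaling(*, needs_in_memory_state: bool = False,
                   requests_over_10_min: bool = False,
                   continuous_workload: bool = False,
                   standard_web_or_api: bool = False,
                   low_intermittent_traffic: bool = False) -> str:
    """Mirror the decision flowchart and return 'manual', 'basic', or 'automatic'."""
    if needs_in_memory_state:
        return "manual"
    if requests_over_10_min:
        # Long requests: continuous work needs fixed instances, intermittent work does not.
        return "manual" if continuous_workload else "basic"
    if standard_web_or_api:
        return "automatic"
    return "basic" if low_intermittent_traffic else "automatic"


print(choose_scaling(requests_over_10_min=True))  # basic
```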

Quick recommendations

Workload	Scaling Type	Why
Web frontend	Automatic	Handles traffic spikes, scales to zero when quiet
REST API	Automatic	Variable traffic, needs low latency
Mobile backend	Automatic	Unpredictable traffic patterns
Batch processing	Basic	Runs occasionally, needs long timeouts
Admin panel	Basic	Low, intermittent traffic
WebSocket server	Manual (Flex)	Needs persistent connections
In-memory cache	Manual	Needs persistent state across requests
Long-running task worker	Manual	Requests exceed 10-minute timeout

Pricing Implications

As of May 2026, these examples use App Engine Standard instance-hour prices in USD from Google Cloud’s pricing page.

Instance-hour billing

Instance hours are billed in 15-minute increments
An instance accrues billable time until 15 minutes after it processes its last request
Excess idle instances beyond configured limits do not accumulate billable hours
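Taken literally, the two billing rules above can be combined into a rough estimate per instance lifetime. This is a simplified model that assumes how the rules compose; it is not Google's billing logic. Billable time runs from instance start until 15 minutes after the last request and is then rounded up to the next 15-minute increment.

```python
import math

BILLING_TAIL_MIN = 15       # billing continues 15 minutes after the last request
BILLING_INCREMENT_MIN = 15  # billed in 15-minute increments (per the rule above)


def billable_minutes(start_min: float, last_request_min: float) -> int:
    """Estimate billable minutes for one instance lifetime (simplified model)."""
    raw = (last_request_min + BILLING_TAIL_MIN) - start_min
    increments = math.ceil(raw / BILLING_INCREMENT_MIN)
    return increments * BILLING_INCREMENT_MIN


print(billable_minutes(0, 2))  # 30: 17 minutes rounds up to two 15-minute increments
```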

Cost examples

Automatic scaling (F1, $0.05/hr):

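2 instances running 8 hours/day = $0.80/day = $24/month at list price; those 16 instance-hours/day fit inside the free tier below, so the actual charge may be $0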
Scales to zero at night = $0 during off-hours
Free tier covers 28 instance-hours/day

Basic scaling (B2, $0.10/hr):

5 instances, 2 hours/day average = $1.00/day = $30/month before free-tier credits
Free tier covers 9 instance-hours/day

Manual scaling (B4, $0.20/hr):

3 instances, 24/7 = $14.40/day = $432/month
The 9 free B-class instance-hours/day cover only a small fraction of the 72 instance-hours/day used here

Warning: Manual scaling with large instances can get expensive fast. At the B8 rate ($0.40/hr), running 3 instances 24/7 costs $28.80/day, or about $864/month. Monitor your usage.
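The cost examples above all follow the same arithmetic: instances × hours per day × hourly rate × 30 days. A small helper makes it easy to rerun the numbers with your own values. `monthly_cost` is hypothetical, uses a 30-day month like the examples, and ignores free-tier credits.

```python
def monthly_cost(instances: int, hours_per_day: float, hourly_rate: float, days: int = 30) -> float:
    """Instance-hour cost in USD before free-tier credits, rounded to cents."""
    return round(instances * hours_per_day * hourly_rate * days, 2)


print(monthly_cost(2, 8, 0.05))   # 24.0  (automatic, F1)
print(monthly_cost(3, 24, 0.40))  # 864.0 (manual, B8)
```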

TL;DR

Automatic scaling: default, metrics-based, best for web apps and APIs. Uses F-class instances.
Basic scaling: on-demand, spins up for requests, shuts down when idle. Best for intermittent workloads with long requests. Uses B-class instances.
Manual scaling: fixed count, always running, stateful. Best for persistent state and long-running tasks. Uses B-class instances.
Set min_instances: 0 to save costs with automatic scaling in the Standard environment.
Billing for an instance continues until 15 minutes after its last request, so idle tails still cost money.

Resources

How Instances are Managed Instance lifecycle, scaling behavior, and instance classes.

app.yaml Reference: Automatic Scaling All automatic scaling configuration options.

App Engine Pricing Instance pricing and free tier details.

Core Components Instances, services, versions, and the App Engine architecture.

Standard vs Flexible Environment comparison including scaling differences.

Lalit's Cloud & DevOps notes
