How to use managed and unmanaged instance groups on Google Compute Engine for scalable, self-healing VM deployments.
What Are Instance Groups?
An instance group is a collection of VMs that you manage as a single entity. Google Cloud offers two types:
- Managed Instance Groups (MIGs) — Identical VMs created from an Instance Template, with built-in autoscaling, autohealing, and automated updates.
- Unmanaged Instance Groups — A loose collection of heterogeneous VMs, used primarily as a load balancer backend when VMs have different configurations.
| Aspect | Managed Instance Group | Unmanaged Instance Group |
|---|---|---|
| VMs | Identical (from template) | Heterogeneous |
| Autoscaling | Yes | No |
| Autohealing | Yes | No |
| Rolling updates | Yes | No |
| Multi-zone | Yes (regional MIGs) | No |
| Use case | Scalable production workloads | Load balancing existing VMs |
In practice: MIGs are the standard way to run production VM fleets on GCE. You declare the desired state (template, target size), and the MIG keeps the actual state converged automatically.
Managed Instance Groups (MIGs)
A MIG creates and maintains a group of identical VMs from a single instance template. You set the target size, and the MIG handles creation, monitoring, healing, and scaling.
Core capabilities:
- Autoscaling — Dynamically adds or removes VMs based on load (CPU, LB capacity, custom metrics, schedules, Pub/Sub queue depth)
- Autohealing — Recreates VMs that fail health checks, including application-level checks (crashes, freezes, OOM)
- Regional deployment — Distributes VMs across multiple zones to survive zonal failures
- Automated updates — Rolling updates and canary deployments with controlled disruption
- Stateful support — Optional per-instance state preservation (disks, IPs, metadata)
flowchart TD
  T["Instance Template"] --> MIG["Managed Instance Group<br/>(target size: N)"]
  MIG --> VM1["VM 1"]
  MIG --> VM2["VM 2"]
  MIG --> VMN["VM N"]
  LB["Load Balancer"] --> MIG
  HC["Health Check"] -->|signals unhealthy| MIG
  MIG -->|recreates| VMN
  AS["Autoscaler"] -->|scale out/in| MIG
Key Insight: MIGs are intent-based. You declare the desired state (which template, how many VMs), and the MIG continuously converges to that state. If a VM crashes, is preempted, or fails a health check, the MIG replaces it automatically.
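The convergence idea can be illustrated with a toy reconciliation step. This is a minimal sketch, not how the MIG is implemented: reconcile_action is a hypothetical helper that compares a target size with the number of healthy VMs and prints the action needed to close the gap.

```shell
# Toy model of intent-based reconciliation (hypothetical helper).
# Args: target size, number of healthy VMs currently in the group.
reconcile_action() (
  target="$1"; healthy="$2"
  if [ "$healthy" -lt "$target" ]; then
    echo "create $(( target - healthy ))"
  elif [ "$healthy" -gt "$target" ]; then
    echo "delete $(( healthy - target ))"
  else
    echo "none"
  fi
)

reconcile_action 5 3   # two VMs crashed or were preempted: prints "create 2"
reconcile_action 5 5   # actual state matches desired state: prints "none"
```

The point of the sketch is that you only ever change the target; the group works out the create and delete operations itself.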
Unmanaged Instance Groups
An unmanaged instance group is a collection of VMs with different configurations that you can use as a load balancer backend. You add and remove VMs manually.
What they do: Let you load balance across a fleet of individually managed, non-identical VMs.
What they do not provide: Autoscaling, autohealing, rolling updates, multi-zone deployment, instance templates, or any automated management. Maximum 2,000 VMs per group.
| Scenario | Why Unmanaged |
|---|---|
| Existing heterogeneous VMs | VMs with different configs that need a single LB backend |
| Migration phase | Temporarily grouping VMs while migrating to MIGs |
| One-off load balancing | Simple case where MIG overhead is unnecessary |
Warning: Do not use unmanaged instance groups for new production workloads. They lack autoscaling, autohealing, and automated updates. Use MIGs instead. Unmanaged groups exist primarily for load balancing legacy or heterogeneous VM fleets.
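For the migration and legacy scenarios above, grouping existing VMs by hand might look like the following sketch. The function name group_existing_vms and the example group and VM names are hypothetical; all VMs are assumed to be in the same zone as the group.

```shell
# Create an unmanaged group and add existing VMs to it (hypothetical wrapper).
# Args: group name, zone, comma-separated list of existing VM names.
group_existing_vms() (
  group="$1"; zone="$2"; instances="$3"
  gcloud compute instance-groups unmanaged create "$group" --zone="$zone" &&
  gcloud compute instance-groups unmanaged add-instances "$group" \
    --instances="$instances" --zone="$zone"
)

# Example call (names are placeholders):
#   group_existing_vms legacy-group us-central1-a vm-a,vm-b
```

Membership stays manual after this: removing a VM from the group is a separate remove-instances call, and nothing replaces a VM that fails.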
MIG vs Unmanaged Comparison
| Feature | Managed (MIG) | Unmanaged |
|---|---|---|
| VM homogeneity | Identical (template-based) | Heterogeneous |
| Autoscaling | Yes (CPU, LB, metrics, schedule, Pub/Sub) | No |
| Autohealing | Yes (health-check driven recreation) | No |
| Rolling updates | Yes (with canary support) | No |
| Regional (multi-zone) | Yes | No |
| Instance templates | Required | Not used |
| Stateful support | Yes | No |
| Max VMs | 1,000 zonal / 2,000 regional (expandable to 4,000) | 2,000 |
| Load balancing | Backend service or target pool | Backend service or target pool |
| Pricing | No separate instance group charge | No separate instance group charge |
Zonal vs Regional MIGs
| Property | Zonal MIG | Regional MIG |
|---|---|---|
| Zones | Single zone | Multiple zones (default 3) |
| Max VMs | 1,000 (expandable) | 2,000 (expandable) |
| Zonal failure tolerance | None | Yes (traffic shifts to remaining zones) |
| Creation | --zone=ZONE | --region=REGION |
| Default maxSurge | 1 | Number of zones (default 3) |
| Default maxUnavailable | 1 | Number of zones (default 3) |
| Pub/Sub autoscaling | Yes | Yes |
A zonal MIG is simpler but vulnerable to a single-zone outage. A regional MIG spreads instances across multiple zones within a region and can redistribute them after a zone recovers.
Tip: Use regional MIGs for production workloads. There is no separate charge for choosing a regional MIG, but you still pay for the VMs, disks, load balancers, and other resources the group uses.
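To get a feel for what a zonal outage costs a regional MIG, the sketch below assumes VMs are spread as evenly as possible across zones, so per-zone counts differ by at most one. The helper names zone_distribution and remaining_after_zone_loss are hypothetical.

```shell
# Per-zone VM counts for an even spread (hypothetical helper).
# Args: group size, number of zones.
zone_distribution() (
  size="$1"; zones="$2"
  base=$(( size / zones )); extra=$(( size % zones ))
  out=""; i=0
  while [ "$i" -lt "$zones" ]; do
    n="$base"
    if [ "$i" -lt "$extra" ]; then n=$(( base + 1 )); fi
    out="$out${out:+ }$n"
    i=$(( i + 1 ))
  done
  echo "$out"
)

# VMs still serving if the fullest zone goes down.
remaining_after_zone_loss() ( echo $(( $1 - ($1 + $2 - 1) / $2 )) )

zone_distribution 7 3           # prints "3 2 2"
remaining_after_zone_loss 6 3   # prints 4
```

With three zones, losing one zone removes roughly a third of capacity, so size the group so that the remaining VMs can carry peak load.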
Autoscaling
MIG autoscaling adds or removes VMs based on load signals. You can combine multiple signals in a single policy; the autoscaler uses the largest recommended size across all of them.
| Policy | Signal | Best For |
|---|---|---|
| CPU utilization | Average CPU across group | General web serving, API backends |
| Load balancing capacity | HTTP load per instance | HTTP/S traffic behind a load balancer |
| Cloud Monitoring metric | Any custom or built-in metric | Application-specific signals (queue depth, latency) |
| Schedule-based | Time of day / day of week | Predictable traffic patterns |
| Predictive | ML-based forecast | Workloads with historical patterns and slow initialization |
| Pub/Sub queue | Unacknowledged messages in a subscription | Async processing, event-driven workloads |
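The "largest recommendation wins" rule can be sketched with integer arithmetic. This is a simplified model under stated assumptions, not the autoscaler's actual algorithm: each utilization signal is assumed to recommend ceil(current size × observed / target), a backlog signal recommends ceil(backlog / per-VM assignment), and all function names are hypothetical.

```shell
# Ceiling of a / b for positive integers.
ceil_div() { echo $(( ($1 + $2 - 1) / $2 )); }

# VMs needed to bring average utilization back to the target.
# Args: current size, observed utilization %, target utilization %.
size_for_utilization() { ceil_div $(( $1 * $2 )) "$3"; }

# VMs needed so each one is assigned at most per_vm backlog items.
# Args: backlog, per-VM assignment.
size_for_backlog() { ceil_div "$1" "$2"; }

# Take the largest per-signal recommendation, then clamp to [min, max].
# Args: min, max, then one recommendation per signal.
recommended_size() (
  min="$1"; max="$2"; shift 2
  best="$min"
  for r in "$@"; do
    if [ "$r" -gt "$best" ]; then best="$r"; fi
  done
  if [ "$best" -gt "$max" ]; then best="$max"; fi
  echo "$best"
)

cpu=$(size_for_utilization 4 90 70)     # 4 VMs at 90% CPU, target 70% -> 6
queue=$(size_for_backlog 1250 100)      # 1250 messages, 100 per VM -> 13
recommended_size 2 20 "$cpu" "$queue"   # prints 13
```

Because the larger recommendation wins, adding a second signal can only increase the size the group scales to, never reduce it.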
CPU-Based Autoscaling
gcloud compute instance-groups managed set-autoscaling my-mig \
  --max-num-replicas=10 \
  --min-num-replicas=2 \
  --target-cpu-utilization=0.7 \
  --zone=us-central1-a
Pub/Sub-Based Autoscaling
gcloud compute instance-groups managed set-autoscaling pubsub-mig \
  --max-num-replicas=20 \
  --min-num-replicas=1 \
  --update-stackdriver-metric=pubsub.googleapis.com/subscription/num_undelivered_messages \
  --stackdriver-metric-filter='resource.type="pubsub_subscription" AND resource.labels.subscription_id="my-sub"' \
  --stackdriver-metric-single-instance-assignment=100 \
  --zone=us-central1-a
Use --region=REGION instead of --zone=ZONE when configuring autoscaling for a regional MIG.
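If the same script has to handle both zonal and regional MIGs, a small helper can pick the right flag. This is a sketch based on a naming heuristic, not an official rule: it assumes zone names end in a hyphen plus a single letter (us-central1-a) and treats everything else as a region. The name location_flag is hypothetical.

```shell
# Print --zone=... or --region=... for a location name (heuristic).
location_flag() {
  case "$1" in
    *-[a-z]) echo "--zone=$1" ;;     # e.g. us-central1-a
    *)       echo "--region=$1" ;;   # e.g. us-central1
  esac
}

location_flag us-central1-a   # prints --zone=us-central1-a
location_flag us-central1     # prints --region=us-central1
```

The output can be passed straight to a command, for example by appending "$(location_flag us-central1)" to a set-autoscaling call.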
Note: Autoscaling is configured on the MIG, not the instance template. You set it after creating the MIG.
Scale-in controls let you limit how fast the group can shrink (e.g., “remove at most 3 VMs per 300 seconds”). Use these for workloads with long initialization times to prevent sudden capacity drops.
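The effect of a scale-in control can be modeled as a floor under the autoscaler's recommendation. The sketch below assumes the floor is the peak size observed in the trailing window minus the maximum allowed reduction; apply_scale_in_control is a hypothetical helper, not a gcloud command.

```shell
# Limit how far the group may shrink (simplified model).
# Args: recommended size, peak size in the trailing window, max VMs to remove.
apply_scale_in_control() (
  recommended="$1"; peak="$2"; max_scaled_in="$3"
  floor=$(( peak - max_scaled_in ))
  if [ "$recommended" -lt "$floor" ]; then
    echo "$floor"
  else
    echo "$recommended"
  fi
)

apply_scale_in_control 2 10 3   # load dropped sharply: prints 7, not 2
apply_scale_in_control 9 10 3   # small reduction is allowed: prints 9
```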
Initialization period (formerly cool down) tells the autoscaler how long to ignore usage data from a newly created VM while it boots and initializes. Set this to match your application’s startup time.
Tip: Set --min-num-replicas to at least 2 for production workloads. A single instance is a single point of failure.
Autohealing and Health Checks
Autohealing automatically recreates VMs that fail health checks. This catches application-level failures (crashes, freezes, out-of-memory conditions) on VMs that are still running, which a check of VM state alone would miss.
LB Health Checks vs Autohealing Health Checks
| Aspect | LB Health Check | Autohealing Health Check |
|---|---|---|
| Purpose | Stop sending traffic to unhealthy instances | Delete and recreate unhealthy instances |
| Aggressiveness | Should be aggressive (quick detection) | Should be conservative (avoid unnecessary recreation) |
| Impact | Traffic shifts; instance keeps running | Instance is deleted and recreated |
| Recommended check interval | 5–10 seconds | 30–60 seconds |
| Recommended unhealthy threshold | 2–3 consecutive failures | 5–10 consecutive failures |
Key Insight: Use separate health checks for load balancing and autohealing. LB checks should be aggressive — catch a struggling instance quickly and stop sending traffic. Autohealing checks should be conservative — recreating a VM is disruptive, so you want to be sure it’s actually broken, not just temporarily slow.
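A rough way to compare the two profiles is the time it takes to declare a VM unhealthy. The sketch below assumes detection takes about check interval × unhealthy threshold seconds and ignores timeouts and probe timing; detection_seconds is a hypothetical helper.

```shell
# Approximate seconds until a failing VM is marked unhealthy.
# Args: check interval (s), unhealthy threshold (consecutive failures).
detection_seconds() { echo $(( $1 * $2 )); }

echo "LB check:          $(detection_seconds 5 2)s"    # 10s
echo "Autohealing check: $(detection_seconds 30 5)s"   # 150s
```

With the low end of the ranges in the table, traffic stops in about 10 seconds while recreation waits about 2.5 minutes, which gives a temporarily slow VM time to recover.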
Configuring Autohealing
# Create a health check for autohealing (conservative settings)
gcloud compute health-checks create http autohealing-check \
  --port=80 \
  --check-interval=30 \
  --timeout=10 \
  --unhealthy-threshold=5 \
  --healthy-threshold=2
# Attach to the MIG
gcloud compute instance-groups managed update my-mig \
  --health-check=autohealing-check \
  --initial-delay=120 \
  --zone=us-central1-a
--initial-delay sets the grace period, in seconds, after a VM starts during which the MIG ignores failed health checks. Set this long enough for your Startup Scripts to finish and the application to initialize. If the delay is too short, autohealing will recreate VMs that are still booting.
For Spot VMs in a MIG, the group automatically recreates instances that get preempted once capacity is available again. See Spot VMs for cost-effective compute with self-healing.
Rolling Updates and Canary Deployments
The MIG Updater lets you deploy new configurations across your instances with controlled disruption.
During a rolling update, the MIG compares the current VM configuration with the target template, creates or recreates VMs in batches, waits for each new VM to become ready, and then continues until the group reaches the target version. The disruption is controlled by two budgets:
- maxSurge controls how many extra VMs can be created above the target size.
- maxUnavailable controls how many existing VMs can be offline at the same time.
For zero-downtime stateless updates, use maxUnavailable=0 and maxSurge>0 so replacement VMs become ready before old VMs are removed. This requires enough quota for the temporary extra VMs. If you must preserve instance names, use replacementMethod=RECREATE; that mode requires maxSurge=0, so it is slower and more disruptive.
flowchart TD
  START["Start Rolling Update<br/>Target: replace all VMs"] --> CHECK{Enough quota<br/>for maxSurge?}
  CHECK -->|Yes| SURGE["Create replacement VMs<br/>(up to maxSurge)"]
  CHECK -->|No| FAIL["Update waits<br/>for quota"]
  SURGE --> READY["Wait for healthy<br/>(minReadySec + health check)"]
  READY --> DELETE["Delete old VMs<br/>(maxUnavailable budget)"]
  DELETE --> REMAIN{More VMs<br/>to update?}
  REMAIN -->|Yes| SURGE
  REMAIN -->|No| DONE["Update complete<br/>version target reached<br/>status.isStable=true"]
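The two budgets translate into simple capacity arithmetic. The sketch below is an estimate under stated assumptions: at most maxSurge + maxUnavailable VMs are updated at a time, the group peaks at target + maxSurge VMs, and at least target - maxUnavailable VMs stay available. The helper update_plan is hypothetical.

```shell
# Estimate the footprint of a rolling update (simplified model).
# Args: target size, maxSurge, maxUnavailable (fixed numbers, not percentages).
update_plan() (
  target="$1"; max_surge="$2"; max_unavailable="$3"
  batch=$(( max_surge + max_unavailable ))
  if [ "$batch" -le 0 ]; then
    echo "maxSurge and maxUnavailable cannot both be 0" >&2
    exit 1
  fi
  echo "peak_vms=$(( target + max_surge ))"
  echo "min_available=$(( target - max_unavailable ))"
  echo "approx_batches=$(( (target + batch - 1) / batch ))"
)

update_plan 10 3 1
# peak_vms=13
# min_available=9
# approx_batches=3
```

The peak_vms figure is the one to check against quota before starting: a zero-downtime update with maxUnavailable=0 still needs room for the surge VMs.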
Update Parameters
| Parameter | Default (Zonal) | Default (Regional) | Purpose |
|---|---|---|---|
| maxSurge | 1 | Number of zones (3) | Extra VMs created during update |
| maxUnavailable | 1 | Number of zones (3) | VMs allowed offline at any time |
| minReadySec | 0 | 0 | Wait time before considering a VM ready |
| replacementMethod | SUBSTITUTE | SUBSTITUTE | SUBSTITUTE creates replacement VMs with new names; RECREATE preserves names but requires maxSurge=0 |
Update types:
- Proactive — The MIG automatically rolls out the update to all instances
- Opportunistic — Updates applied only when instances are recreated for other reasons (resize, repair)
To confirm completion, check both status.versionTarget.isReached and status.isStable. The version target can be reached while the group is still finishing repairs, verifications, or other actions.
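Both fields can be read in one describe call. The sketch below wraps that in a hypothetical function, mig_update_done, for a zonal MIG; it assumes the value output format prints the two booleans separated by a tab (for example True and True), so the comparison normalizes case and whitespace first.

```shell
# Succeed only if the version target is reached AND the group is stable.
# Args: MIG name, zone.
mig_update_done() (
  status=$(gcloud compute instance-groups managed describe "$1" \
    --zone="$2" \
    --format="value(status.versionTarget.isReached,status.isStable)") || exit 2
  # Normalize "True<TAB>True" to "true true" before comparing.
  normalized=$(printf '%s' "$status" | tr 'A-Z\t' 'a-z ')
  [ "$normalized" = "true true" ]
)

# Example polling loop (names are placeholders):
#   until mig_update_done my-mig us-central1-a; do sleep 15; done
```

Checking only the first field would report success while repairs or verifications are still in progress.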
Rolling Update
gcloud compute instance-groups managed rolling-action start-update my-mig \
  --version=template=web-server-v2 \
  --max-surge=3 \
  --max-unavailable=1 \
  --min-ready=2m \
  --zone=us-central1-a
Canary Update (10% of VMs)
gcloud compute instance-groups managed rolling-action start-update my-mig \
  --version=template=web-server-v1 \
  --canary-version=template=web-server-v2,target-size=10% \
  --zone=us-central1-a
A MIG supports up to two instance template versions simultaneously. After verifying the canary, roll forward:
gcloud compute instance-groups managed rolling-action start-update my-mig \
  --version=template=web-server-v2 \
  --zone=us-central1-a
Rollback
gcloud compute instance-groups managed rolling-action start-update my-mig \
  --version=template=web-server-v1 \
  --max-unavailable=100% \
  --zone=us-central1-a
See Instance Templates for the full template creation and update workflow.
Stateful vs Stateless MIGs
| Aspect | Stateless MIG | Stateful MIG |
|---|---|---|
| VM identity | Disposable; names can change | Preserved across recreation |
| Persistent disks | Ephemeral or recreated from template | Attached to specific instance, preserved |
| Metadata | Same across all VMs | Per-instance metadata preserved |
| Autoscaling | Supported | Not supported |
| Autohealing | Supported | Supported |
| Update method | SUBSTITUTE (default) | RECREATE (required) |
| Use case | Web servers, API backends, workers | Databases, legacy apps, stateful processing |
Stateful MIGs preserve instance names, persistent disks, internal IPs, and per-instance metadata across VM recreation. This makes them suitable for workloads like Cassandra, Elasticsearch, Kafka, ZooKeeper, or legacy monoliths that depend on stable instance identity.
Stateless MIGs treat all VMs as interchangeable. When a VM is recreated, it gets a fresh disk and no preserved state. This is the right choice for web frontends, REST APIs, and any horizontally scalable workload.
Note: Stateful MIGs are a specialized feature. Most workloads should use stateless MIGs. Only use stateful MIGs when your application requires stable instance identity or per-instance disk state. Consider managed services (Cloud SQL, Dataproc, Memorystore) before committing to stateful MIGs for databases or data processing.
Creating MIGs
MIGs require an Instance Template. Create one first, then create the MIG.
Zonal MIG
gcloud compute instance-groups managed create my-mig \
  --template=web-server-template \
  --size=3 \
  --zone=us-central1-a
Regional MIG
gcloud compute instance-groups managed create my-regional-mig \
  --template=web-server-template \
  --size=6 \
  --region=us-central1
MIG with Autoscaling
# Create the MIG at its initial size
gcloud compute instance-groups managed create autoscaled-mig \
  --template=web-server-template \
  --size=3 \
  --zone=us-central1-a
# Attach a CPU-based autoscaling policy to the new MIG
gcloud compute instance-groups managed set-autoscaling autoscaled-mig \
  --max-num-replicas=10 \
  --min-num-replicas=3 \
  --target-cpu-utilization=0.7 \
  --zone=us-central1-a
MIG with Spot VMs
The template specifies the Spot provisioning model, then the MIG auto-recreates preempted instances:
# Create a template that uses the Spot provisioning model
gcloud compute instance-templates create spot-template \
  --machine-type=e2-medium \
  --provisioning-model=SPOT \
  --image-family=debian-12 \
  --image-project=debian-cloud
# Create the MIG from the Spot template
gcloud compute instance-groups managed create spot-mig \
  --template=spot-template \
  --size=5 \
  --zone=us-central1-a
See Spot VMs for details on Spot provisioning behavior and cost savings.
Tip: Use Custom Images in your template instead of long startup scripts. Pre-baked images boot faster, which means faster scale-out and shorter initialization periods for autoscaling.
Useful Commands
| Task | Command |
|---|---|
| List instances in a MIG | gcloud compute instance-groups managed list-instances MIG_NAME --zone=ZONE |
| Describe a MIG | gcloud compute instance-groups managed describe MIG_NAME --zone=ZONE |
| Resize a MIG | gcloud compute instance-groups managed resize MIG_NAME --size=N --zone=ZONE |
| Delete a MIG | gcloud compute instance-groups managed delete MIG_NAME --zone=ZONE |
| Wait for update to finish | gcloud compute instance-groups managed wait-until MIG_NAME --version-target-reached --zone=ZONE |
Best Practices
| Practice | Why |
|---|---|
| Use regional MIGs for production | Multi-zone distribution protects against zonal failure |
| Use separate health checks for LB and autohealing | LB checks should be aggressive; autohealing checks should be conservative |
| Set initialDelaySec correctly | Prevent autohealing from recreating VMs that are still booting |
| Use custom images instead of long startup scripts | Faster scale-out, no dependency on package repos at boot time |
| Set maxSurge > 0 and maxUnavailable = 0 | Zero-downtime updates: new VMs are ready before old ones are removed |
| Use canary updates for risky deployments | Test on a subset before full rollout |
| Keep templates immutable | Create new templates for updates; never try to edit existing ones |
| Monitor status.versionTarget.isReached and status.isStable | Confirm the target template is reached and the MIG has no pending actions |
| Use RECREATE replacement for stateful MIGs | Required to preserve instance names and disk state |
| Set --min-num-replicas >= 2 | Avoid a single point of failure in production |
TL;DR
- Instance groups come in two types: managed (MIGs) for identical, auto-managed VMs, and unmanaged for load balancing heterogeneous VMs.
- MIGs provide autoscaling, autohealing, rolling updates, and regional deployment. Unmanaged groups provide none of these.
- Regional MIGs spread VMs across multiple zones for zonal failure protection. Default zone count is 3.
- Autoscaling supports CPU, load balancing, Cloud Monitoring metrics, schedules, predictive, and Pub/Sub signals.
- Use separate health checks for load balancing (aggressive) and autohealing (conservative). They serve different purposes.
- Rolling updates use maxSurge and maxUnavailable to control disruption. Canary updates test a new template on a subset of VMs.
- MIGs require an instance template. Create a new template for each update, because templates are immutable.
- Use custom images in templates for fast scale-out. Avoid long startup scripts that slow boot time.
- There is no separate charge for instance groups. You pay for the VMs, disks, load balancers, health checks, logging, and other resources the group creates or uses.
Resources
- Instance Groups Documentation: Official overview of managed and unmanaged instance groups.
- Create Managed Instance Groups: Step-by-step guides for zonal and regional MIG creation.
- Autoscaling Groups of Instances: Autoscaling policies, configuration, and behavior.
- Rolling Updates in MIGs: Automated updates, canary deployments, and update policy options.
- Set Up Autohealing: Health check configuration for autohealing policies.
- Stateful MIGs: Preserving per-instance state across recreation and updates.
- Instance Templates: How to create the templates that MIGs require.
- Custom Images: Build fast-booting images for MIG scale-out.
- High Availability, Live Migration, and Automatic Restart: How host maintenance events interact with MIG autohealing.
- VM Startup Scripts: Automate VM configuration; health checks detect failed startup scripts.
- Spot VMs: Use Spot VMs in MIGs for cost-effective, self-healing compute.
- Google Compute Engine: Overview of GCE features and architecture.