How Spot VMs work on Google Compute Engine, when to use them, and how to design your workloads to handle preemption.
What Are Spot VMs?
Spot VMs let you use spare Compute Engine capacity at a steep discount (60-91% off on-demand pricing). The trade-off is that Google can reclaim the capacity at any time with a short termination notice. Spot VMs are the current recommended approach, replacing the older preemptible VMs.
Key facts:
- Discount: 60% to 91% off on-demand, depending on machine type and region
- No minimum or maximum runtime (old preemptible VMs had a 24-hour limit)
- No live migration, no SLA
- Free tier credits do not apply
- Committed use discounts (CUDs) and sustained use discounts (SUDs) do not apply to Spot VMs
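To make the discount range concrete, here is a minimal sketch that turns an on-demand hourly price into the corresponding Spot price range. It assumes only the 60% and 91% bounds quoted above; the function name `spot_price_range` is illustrative, and actual Spot prices vary by machine type and region.

```shell
#!/bin/bash
# Estimate the hourly Spot price range from an on-demand hourly price,
# using the 60% to 91% discount range quoted in this article.
# Usage: spot_price_range <on_demand_hourly_usd>
spot_price_range() {
  awk -v p="$1" 'BEGIN {
    low  = p * (100 - 91) / 100   # largest discount: 91% off
    high = p * (100 - 60) / 100   # smallest discount: 60% off
    printf "%.4f %.4f\n", low, high
  }'
}
```

For example, `spot_price_range 1.00` prints `0.0900 0.4000`: a VM that costs $1.00 per hour on demand would cost between roughly $0.09 and $0.40 per hour as a Spot VM.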
Spot VMs vs Preemptible VMs
| Feature | Spot VMs | Preemptible VMs |
|---|---|---|
| Max runtime | No limit | 24 hours |
| Discount | 60-91% | Up to ~80-91% |
| Preemption notice | 0s or 120s (configurable, preview) | 30s only |
| Termination action | STOP or DELETE | Always STOP |
| Status | Current, recommended | Legacy |
Note: If you are still using preemptible VMs, migrate to Spot VMs. They offer more flexibility (no 24-hour limit, configurable termination action) and are the actively supported option.
How Preemption Works
When Google needs to reclaim Spot VM capacity:
- Metadata update: The VM's preempted metadata value is set to TRUE
- ACPI signal: An ACPI G2 Soft Off signal is sent to the VM
- Shutdown window: Your shutdown script has up to 30 seconds to run (save state, drain traffic, upload checkpoints)
- Forced termination: If the VM has not stopped after 30 seconds, an ACPI G3 signal forces termination
- Final state: The VM enters the TERMINATED state (default) or is deleted, depending on your configured termination action
Preemption notice: By default, the notice is 0 seconds (you only get the 30-second shutdown window). You can configure a 120-second notice (in Preview) to get advance warning before the ACPI signal.
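The termination action that decides the final state is chosen when the VM is created. The following is a minimal sketch, assuming the gcloud CLI is installed and authenticated; the function name `create_spot_vm`, the machine type, and the zone are illustrative placeholders.

```shell
#!/bin/bash
# Create a Spot VM with an explicit termination action.
# Usage: create_spot_vm <vm-name> [STOP|DELETE]   (defaults to STOP)
create_spot_vm() {
  name="$1"
  action="${2:-STOP}"
  # Reject anything other than the two actions described in this article.
  case "$action" in
    STOP|DELETE) ;;
    *) echo "termination action must be STOP or DELETE" >&2; return 1 ;;
  esac
  gcloud compute instances create "$name" \
    --machine-type=e2-medium \
    --zone=us-central1-a \
    --provisioning-model=SPOT \
    --instance-termination-action="$action"
}
```

Calling `create_spot_vm spot-worker DELETE` would request a VM that is deleted on preemption. With STOP, the VM and its disks remain, and the VM can be restarted later when capacity is available.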
Detecting preemption in a script:
# Check if the VM is about to be preempted
if curl -s -H "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/preempted" | grep -q "TRUE"; then
  echo "VM is being preempted. Saving state..."
  # Save checkpoints, drain traffic, etc.
fi
Good and Bad Use Cases
Ideal for Spot VMs:
| Workload | Why It Works |
|---|---|
| Batch processing | Jobs can be checkpointed and resumed |
| CI/CD builds | Failed builds can be retried |
| Distributed data processing (Spark, Hadoop) | Frameworks handle worker failure |
| Image/video rendering | Frame-by-frame processing, retry on failure |
| Stateless web serving (with MIG) | MIG auto-replaces preempted instances |
| Development and testing | Non-critical, interruption is acceptable |
Bad fit for Spot VMs:
| Workload | Why It Fails |
|---|---|
| Single-instance databases | Data loss risk, no failover |
| Long-running monolithic apps | Cannot checkpoint or resume |
| Real-time interactive workloads | Latency spikes on preemption |
| Workloads that cannot tolerate any interruption | Preemption can happen at any time, and there is no SLA |
Designing for Failure
To use Spot VMs effectively, your workload must handle interruption gracefully.
Managed Instance Groups (MIGs): Use a MIG with Spot VMs. When instances are preempted, the MIG automatically recreates them when capacity is available. This gives you self-healing without manual intervention.
gcloud compute instance-groups managed create spot-mig \
    --template=spot-template \
    --size=10 \
    --zone=us-central1-a
Shutdown scripts: Write a shutdown script that saves state before the 30-second window expires. For example, upload partial results to Cloud Storage, drain connections from load balancers, or send a notification.
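Because the whole script must finish within the 30-second window, it can help to cap each step so that one slow upload does not starve the rest. The following is a minimal sketch, assuming the coreutils `timeout` command is available on the VM image; `run_step`, `on_shutdown`, and the 20-second and 5-second budgets are illustrative choices, and the bucket and coordinator URL are the same placeholders used elsewhere in this article.

```shell
#!/bin/bash
# Run one cleanup command under a hard time limit (in seconds).
# Usage: run_step <limit_seconds> <command> [args...]
run_step() {
  limit="$1"; shift
  if timeout "$limit" "$@"; then
    echo "ok: $1"
  else
    echo "failed or timed out: $1" >&2
    return 1
  fi
}

# Cleanup sequence; the budgets add up to 25 seconds, leaving some slack.
on_shutdown() {
  run_step 20 gsutil cp /tmp/checkpoint.json gs://my-bucket/checkpoints/
  run_step 5 curl -s -X POST https://coordinator.example.com/worker-down
}
```

A shutdown script built this way would end with a call to `on_shutdown`. The second step still runs if the first one fails or times out, so the coordinator is notified even when the upload does not complete.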
gcloud compute instances create spot-worker \
    --machine-type=e2-medium \
    --provisioning-model=SPOT \
    --metadata=shutdown-script='#!/bin/bash
# Upload checkpoint to Cloud Storage
gsutil cp /tmp/checkpoint.json gs://my-bucket/checkpoints/
# Notify job coordinator
curl -X POST https://coordinator.example.com/worker-down'
Checkpoints for batch jobs: Design batch jobs to save progress periodically. If the VM is preempted, the job resumes from the last checkpoint on the next run.
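The checkpoint-and-resume pattern can be sketched as a loop that records the last completed item and skips past it on the next run. This is a minimal illustration, not a full job framework: `CHECKPOINT_FILE`, `process_item`, and `run_batch` are hypothetical names, and the work items are assumed to be numbered 1 to N.

```shell
#!/bin/bash
# Location of the progress marker (placeholder path).
CHECKPOINT_FILE="${CHECKPOINT_FILE:-/tmp/checkpoint.txt}"

# Placeholder for the real per-item work.
process_item() {
  echo "processing item $1"
}

# Process items 1..total, resuming after the last recorded item.
# Usage: run_batch <total>
run_batch() {
  total="$1"
  last_done=0
  # Resume only if a non-empty checkpoint exists.
  if [ -s "$CHECKPOINT_FILE" ]; then
    last_done=$(cat "$CHECKPOINT_FILE")
  fi
  i=$((last_done + 1))
  while [ "$i" -le "$total" ]; do
    process_item "$i" || return 1
    # Write to a temp file and rename, so a preemption mid-write
    # cannot leave a truncated checkpoint behind.
    echo "$i" > "$CHECKPOINT_FILE.tmp"
    mv "$CHECKPOINT_FILE.tmp" "$CHECKPOINT_FILE"
    i=$((i + 1))
  done
}
```

If a run is preempted after item 3 of 5, the next `run_batch 5` processes only items 4 and 5. A checkpoint on local disk is lost when the VM is deleted, so the file would normally also be copied to durable storage such as the Cloud Storage bucket used in the shutdown script.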
Metadata polling: For the 120-second notice (Preview), poll the metadata server to detect preemption early and begin graceful shutdown before the 30-second window.
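A polling loop for this might look like the sketch below. It reuses the preempted metadata key from the detection script earlier in this article; whether the Preview notice is surfaced through that same key is an assumption here, and `fetch_preempted` and `watch_preemption` are illustrative names.

```shell
#!/bin/bash
METADATA_URL="http://metadata.google.internal/computeMetadata/v1/instance/preempted"

# Print the current value of the preempted key (TRUE or FALSE).
fetch_preempted() {
  curl -s --max-time 5 -H "Metadata-Flavor: Google" "$METADATA_URL"
}

# Poll until the key reads TRUE, then run the given command.
# Usage: watch_preemption <interval_seconds> <command> [args...]
watch_preemption() {
  interval="$1"; shift
  while [ "$(fetch_preempted)" != "TRUE" ]; do
    sleep "$interval"
  done
  "$@"
}
```

A worker could start `watch_preemption 1 /usr/local/bin/drain.sh &` in the background at boot, where the drain script path is a placeholder for whatever graceful-shutdown logic the workload needs.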
Retries and queues: Use Cloud Pub/Sub or Cloud Tasks to queue work items. If a Spot VM is preempted mid-task, the message returns to the queue and another worker picks it up.
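The property that makes this work is that a task is acknowledged only after it has finished. The sketch below shows that ordering in a queue-agnostic way: `pull_task`, `do_work`, and `ack_task` are hypothetical placeholders that would wrap a real queue client (for example, Pub/Sub pull and acknowledge calls) and the actual job logic.

```shell
#!/bin/bash
# Placeholder implementations; replace with real queue and job logic.
pull_task() { echo "task-1"; }
do_work()   { echo "working on $1"; }
ack_task()  { echo "acked $1"; }

# Handle a single task, acknowledging it only after the work succeeds.
# Returns 1 if no task was available, 2 if the work failed.
work_once() {
  task=$(pull_task) || return 1
  if do_work "$task"; then
    ack_task "$task"
  else
    # No acknowledgement: the queue is expected to redeliver the task.
    return 2
  fi
}
```

If the VM is preempted while `do_work` is running, `ack_task` is never reached, so the queue redelivers the task to another worker once its acknowledgement deadline passes. This also means a task can run more than once, so the work should be safe to repeat.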
Limitations
- No live migration (VMs are terminated during host maintenance)
- Not all machine types are supported (e.g., A4X and bare metal are excluded)
- No automatic restart on host events
- Not covered by any SLA
- Cannot change an existing VM to Spot or vice versa (must recreate)
- Console does not show preemption probability
TL;DR
- Spot VMs offer 60-91% off on-demand pricing for using spare capacity.
- No runtime limit (unlike old preemptible VMs with a 24-hour max). Preferred over preemptible.
- Google can preempt at any time. You get a 30-second shutdown window to save state.
- Good for: batch jobs, CI/CD, distributed processing, stateless web serving with MIGs.
- Bad for: databases, monolithic apps, real-time workloads, anything that cannot tolerate interruption.
- Design for failure: use MIGs for auto-recreation, shutdown scripts for state saving, checkpoints for batch jobs, and queues for retry logic.
- Spot VMs do not receive CUDs or SUDs. They are a separate pricing category.
Resources
- Spot VMs Documentation: Official documentation for Spot VM pricing, preemption, and best practices.
- Preemptible VM Instances: Legacy preemptible VMs documentation (use Spot VMs for new workloads).
- Committed-Use Discounts: For steady workloads where interruption is not acceptable.
- Cost Optimization: Overview of all cost levers on Google Cloud.