Model development, JumpStart foundation models, customization, and inference deployment in Amazon SageMaker AI.
Overview
SageMaker AI supports the complete model lifecycle:
| Stage | Tools |
|---|---|
| Discover | JumpStart (600+ FMs and algorithms) |
| Customize | Fine-tuning, RLHF, serverless customization |
| Train | HyperPod, distributed training |
| Deploy | Real-time, batch, serverless, async inference |
| Monitor | Model Monitor, drift detection |
Amazon SageMaker JumpStart
One-click access to foundation models and solutions:
Foundation Models (600+)
| Provider | Models Available |
|---|---|
| Meta | Llama 4, Llama 3.3 70B, Llama 3.x |
| Amazon | Amazon Nova |
| Mistral AI | Mistral Large, Mixtral |
| AI21 Labs | Jurassic models |
| Cohere | Command, Embed |
| Stability AI | Stable Diffusion XL |
| Hugging Face | Thousands of open models |
| OpenAI (open weights) | gpt-oss open-weight models |
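A minimal sketch of deploying one of these models with the SageMaker Python SDK's `JumpStartModel` class; the `model_id` and prompt payload below are only examples, and gated models such as Llama require accepting the EULA:

```python
# Minimal sketch: deploy a JumpStart foundation model with the SageMaker Python SDK.
# The model_id is an example; browse JumpStart in Studio or the SDK docs for the
# exact identifier of the model you want.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="meta-textgeneration-llama-3-8b")  # example model_id
predictor = model.deploy(accept_eula=True)  # provisions a real-time endpoint

response = predictor.predict({"inputs": "Explain Amazon SageMaker JumpStart in one sentence."})
print(response)

predictor.delete_endpoint()  # clean up to stop incurring charges
```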
Built-in Algorithms (Hundreds)
| Task Type | Examples |
|---|---|
| Classification | XGBoost, LightGBM, image/text classification |
| Regression | Linear learner, neural networks |
| NLP | Text classification, sentiment analysis |
| Computer Vision | Object detection, segmentation |
| Time Series | DeepAR, forecasting |
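A minimal sketch of training with one of the built-in algorithms (XGBoost) via the SageMaker Python SDK; the bucket paths are placeholders and the training data is assumed to be CSV with the label in the first column:

```python
# Minimal sketch: train with the built-in XGBoost algorithm.
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # assumes this runs inside SageMaker

# Resolve the region-specific container image for the built-in algorithm
image_uri = image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1")

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/xgboost/output",  # placeholder path
)
estimator.set_hyperparameters(objective="binary:logistic", num_round=100)

estimator.fit({"train": TrainingInput("s3://my-bucket/xgboost/train", content_type="text/csv")})
```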
Pre-built Solutions
One-click, end-to-end solutions for:
- Demand forecasting
- Credit rating prediction
- Fraud detection
- Computer vision applications
JumpStart is available in AWS GovCloud (US-West and US-East regions).
Model Customization
Fine-tune FMs with your data:
| Method | Details |
|---|---|
| Fine-tuning | Adapt model to specific tasks |
| Reinforcement Learning (RLHF) | Human feedback for alignment |
| Serverless Customization | No infrastructure management (new) |
| Distillation | Create smaller, faster models |
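A minimal sketch of fine-tuning a JumpStart model with `JumpStartEstimator`; the `model_id`, hyperparameters, and S3 path are illustrative placeholders, and each model's card lists the hyperparameters it actually supports:

```python
# Minimal sketch: fine-tune a JumpStart foundation model on your own data.
from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id="meta-textgeneration-llama-3-8b",  # example model_id
    environment={"accept_eula": "true"},        # required for gated models
    instance_type="ml.g5.12xlarge",
)
# Hyperparameter names are model-specific; these are illustrative
estimator.set_hyperparameters(instruction_tuned="True", epoch="3")

estimator.fit({"training": "s3://my-bucket/fine-tuning-data/"})  # placeholder path

# Deploy the fine-tuned model to a real-time endpoint
predictor = estimator.deploy()
```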
Supported Models for Customization
- Amazon Nova
- Llama (4, 3.x)
- Qwen
- DeepSeek
- gpt-oss (OpenAI open-weight) models
New: Serverless Model Customization
- AI agent-guided workflow for reinforcement learning
- Customize popular models in days (not weeks)
- No cluster management required
Training at Scale
Amazon SageMaker HyperPod
Distributed training across massive GPU clusters:
| Feature | Details |
|---|---|
| Scale | Hundreds to thousands of GPUs |
| Resilience | Automatic failure recovery |
| Optimization | Training efficiency tools |
| Governance | Centralized access control |
| Frameworks | PyTorch, TensorFlow, JAX |
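A minimal sketch of creating a HyperPod cluster with the `CreateCluster` API via boto3; the cluster name, role ARN, S3 path, and lifecycle script name are placeholders:

```python
# Minimal sketch: create a SageMaker HyperPod cluster with boto3.
# The lifecycle script (on_create.sh) bootstraps each instance when it
# joins the cluster; all names, ARNs, and paths below are placeholders.
import boto3

sm = boto3.client("sagemaker")

response = sm.create_cluster(
    ClusterName="my-hyperpod-cluster",
    InstanceGroups=[
        {
            "InstanceGroupName": "worker-group-1",
            "InstanceType": "ml.p5.48xlarge",
            "InstanceCount": 16,
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://my-bucket/lifecycle-scripts/",
                "OnCreate": "on_create.sh",
            },
            "ExecutionRole": "arn:aws:iam::123456789012:role/HyperPodExecutionRole",
        }
    ],
)
print(response["ClusterArn"])
```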
Training Features
| Feature | Benefit |
|---|---|
| Elastic Training | Auto-scale clusters up/down |
| Checkpointless Training | Faster failure recovery |
| Managed Spot Training | Up to 90% cost savings |
| Distributed Data Parallel | Scale across multiple GPUs |
| Model Parallelism | Train models too large for single GPU |
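A minimal sketch combining Managed Spot Training with the SageMaker distributed data parallel library in a PyTorch training job; the entry point script, role ARN, and S3 paths are placeholders, and supported framework/instance combinations vary (check the SMDDP documentation):

```python
# Minimal sketch: Managed Spot Training (up to ~90% savings) plus
# SageMaker distributed data parallel in a PyTorch training job.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",  # your training script (placeholder)
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder
    framework_version="2.2",
    py_version="py310",
    instance_count=2,
    instance_type="ml.p4d.24xlarge",
    # Managed Spot Training: wait up to 2h for Spot capacity, run at most 1h
    use_spot_instances=True,
    max_run=3600,
    max_wait=7200,
    checkpoint_s3_uri="s3://my-bucket/checkpoints/",  # resume after interruptions
    # SageMaker distributed data parallel library
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)

estimator.fit({"training": "s3://my-bucket/training-data/"})
```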
Inference (Deployment)
70+ instance types with multiple deployment options:
Deployment Options
| Option | Use Case | Latency | Cost |
|---|---|---|---|
| Real-time | Low-latency predictions | Milliseconds | Always-on |
| Serverless | Variable traffic, pay-per-request | Seconds | Auto-scales |
| Batch | Large-scale offline predictions | Minutes | Cost-effective |
| Async | Long-running tasks, queue-based | Variable | Queue-managed |
Important Point: Choose inference type by payload size and latency needs (see the sketch after this list):
- Real-time → Immediate response needed, small payloads, always-on
- Serverless → Variable/unpredictable traffic, auto-scales to zero
- Batch → Large datasets (GBs+), scheduled/offline processing
- Async → Payloads up to 1 GB, delay acceptable, queue-based
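A minimal sketch of the four options above, showing how one `sagemaker.model.Model` can be deployed for real-time, serverless, asynchronous, or batch inference; the image URI, model artifact, role ARN, and S3 paths are placeholders, and in practice you would pick one option rather than run all four:

```python
# Minimal sketch: four ways to serve the same model. Each option below is an
# alternative; pick the one that matches your latency and payload needs.
from sagemaker.model import Model
from sagemaker.serverless import ServerlessInferenceConfig
from sagemaker.async_inference import AsyncInferenceConfig

# Placeholder model: container image + model artifact in S3 + execution role
model = Model(
    image_uri="<inference-container-image-uri>",
    model_data="s3://my-bucket/model/model.tar.gz",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
)

# 1. Real-time: always-on instances, millisecond latency
predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")

# 2. Serverless: scales with traffic (down to zero), pay per request
predictor = model.deploy(
    serverless_inference_config=ServerlessInferenceConfig(
        memory_size_in_mb=4096, max_concurrency=10
    )
)

# 3. Asynchronous: queue-based, large payloads, results written to S3
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    async_inference_config=AsyncInferenceConfig(
        output_path="s3://my-bucket/async-results/"
    ),
)

# 4. Batch transform: large-scale offline scoring of files in S3
transformer = model.transformer(
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/batch-output/",
)
transformer.transform(data="s3://my-bucket/batch-input/", content_type="text/csv")
```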
Instance Types
| Category | Use Case |
|---|---|
| CPU | Cost-effective for simple models |
| GPU | Deep learning, LLMs |
| Inferentia | Cost-optimized inference (AWS chips) |
| Trainium | Training-optimized (AWS chips) |
Inference Features
- Multi-model endpoints: Host multiple models on one endpoint
- Auto-scaling: Scale based on traffic patterns (see the sketch after this list)
- A/B testing: Compare model versions
- Shadow testing: Test new models against production
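A minimal sketch of endpoint auto-scaling using the Application Auto Scaling API; the endpoint and variant names are placeholders and the target value is illustrative:

```python
# Minimal sketch: target-tracking auto-scaling for a real-time endpoint variant.
import boto3

autoscaling = boto3.client("application-autoscaling")
resource_id = "endpoint/my-endpoint/variant/AllTraffic"  # placeholder names

autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

autoscaling.put_scaling_policy(
    PolicyName="InvocationsScalingPolicy",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,  # target invocations per instance per minute
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)
```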
Human-in-the-Loop
Amazon SageMaker Ground Truth
Data labeling at scale:
| Feature | Details |
|---|---|
| Labeling workflows | Images, text, video, 3D point clouds |
| Workforce options | Amazon Mechanical Turk, private workforce, third-party vendors |
| Active learning | Automated labeling cuts labeling costs by up to 70% |
| Quality control | Consensus, audit tools |
Ground Truth Plus
Fully managed labeling service:
- AWS-managed expert workforce
- Custom labeling workflows
- Enterprise-grade SLAs
Amazon A2I (Augmented AI)
Human review for ML predictions:
| Use Case | Details |
|---|---|
| Low-confidence | Route uncertain predictions to humans |
| Sensitive content | Human review for moderation |
| Compliance | Audit trail for decisions |
| Custom workflows | Define review criteria |
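A minimal sketch of routing a low-confidence prediction to an A2I human review loop; the flow definition ARN, confidence threshold, and prediction payload are placeholders, and the flow definition (review UI plus workforce) is created separately:

```python
# Minimal sketch: send an uncertain prediction to human reviewers via A2I.
import json
import uuid
import boto3

a2i = boto3.client("sagemaker-a2i-runtime")

prediction = {"label": "fraud", "confidence": 0.62}  # example model output
CONFIDENCE_THRESHOLD = 0.80  # illustrative cutoff

if prediction["confidence"] < CONFIDENCE_THRESHOLD:
    a2i.start_human_loop(
        HumanLoopName=f"review-{uuid.uuid4()}",
        FlowDefinitionArn="arn:aws:sagemaker:us-east-1:123456789012:flow-definition/my-review-flow",
        HumanLoopInput={"InputContent": json.dumps(prediction)},
    )
```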
TL;DR
- JumpStart = 600+ FMs + hundreds of algorithms, one-click deployment
- Customization = Fine-tuning, RLHF, serverless customization for Llama/Nova/DeepSeek
- HyperPod = Distributed training across thousands of GPUs with auto-recovery
- Inference = 4 deployment options (real-time, serverless, batch, async), 70+ instance types
- Human-in-loop = Ground Truth for labeling, A2I for prediction review
- Cost savings = Managed Spot Training (up to 90%), Inferentia chips
Resources
SageMaker JumpStart 🔴
Foundation models and ML solutions hub.
SageMaker HyperPod 🔴
Distributed training at scale.
SageMaker Inference 🔴
Model deployment options.
SageMaker Ground Truth 🔴
Data labeling service.