Model development, JumpStart foundation models, customization, and inference deployment in Amazon SageMaker AI.


Overview

SageMaker provides the complete model lifecycle:

Stage | Tools
--- | ---
Discover | JumpStart (600+ FMs and algorithms)
Customize | Fine-tuning, RLHF, serverless customization
Train | HyperPod, distributed training
Deploy | Real-time, batch, serverless, async inference
Monitor | Model Monitor, drift detection

Amazon SageMaker JumpStart

One-click access to foundation models and solutions:

Foundation Models (600+)

Provider | Models Available
--- | ---
Meta | Llama 4, Llama 3.3 70B, Llama 3.x
Amazon | Amazon Nova
Mistral AI | Mistral Large, Mixtral
AI21 Labs | Jurassic models
Cohere | Command, Embed
Stability AI | Stable Diffusion XL
Hugging Face | Thousands of open models
OpenAI (open weights) | Compatible models
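
Trying one of these models is typically a few lines with the SageMaker Python SDK's JumpStart classes. A minimal sketch; the model ID, instance type, and prompt below are illustrative, so check the JumpStart catalog for the exact model ID before running:

```python
# Minimal sketch: deploy a JumpStart foundation model and invoke it.
# model_id, instance type, and the prompt are illustrative placeholders.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="meta-textgeneration-llama-3-8b")  # example catalog ID
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",   # GPU instance suitable for a small LLM
    accept_eula=True,                # many gated FMs require EULA acceptance
)

print(predictor.predict({"inputs": "What does SageMaker JumpStart provide?"}))

predictor.delete_endpoint()          # tear down to stop charges
```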

Built-in Algorithms (Hundreds)

Task Type | Examples
--- | ---
Classification | XGBoost, LightGBM, image/text classification
Regression | Linear learner, neural networks
NLP | Text classification, sentiment analysis
Computer Vision | Object detection, segmentation
Time Series | DeepAR, forecasting
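
Built-in algorithms ship as prebuilt containers, so training is mostly configuration. A hedged sketch using XGBoost; the IAM role and S3 paths are placeholders for your own resources:

```python
# Minimal sketch: train the built-in XGBoost algorithm on CSV data in S3.
# The role ARN and S3 paths are placeholders.
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

image = image_uris.retrieve(framework="xgboost", region=session.boto_region_name, version="1.7-1")

estimator = Estimator(
    image_uri=image,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/xgb-output/",                    # placeholder bucket
    hyperparameters={"objective": "binary:logistic", "num_round": 100},
)

estimator.fit({"train": TrainingInput("s3://my-bucket/train.csv", content_type="text/csv")})
```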

Pre-built Solutions

One-click, end-to-end solutions for:

  • Demand forecasting
  • Credit rating prediction
  • Fraud detection
  • Computer vision applications

JumpStart is available in AWS GovCloud (US-West and US-East regions).


Model Customization

Fine-tune FMs with your data:

Method | Details
--- | ---
Fine-tuning | Adapt model to specific tasks
Reinforcement Learning (RLHF) | Human feedback for alignment
Serverless Customization | No infrastructure management (new)
Distillation | Create smaller, faster models
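
Fine-tuning a JumpStart model follows the same estimator pattern as regular training. A minimal sketch; the model ID, hyperparameter names, and S3 path are illustrative and vary by model, so consult the model card for the channels and hyperparameters each model actually expects:

```python
# Minimal sketch: fine-tune a JumpStart foundation model on data in S3.
# model_id, hyperparameters, and paths are illustrative placeholders.
from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id="meta-textgeneration-llama-3-8b",     # example catalog ID
    instance_type="ml.g5.12xlarge",
    environment={"accept_eula": "true"},           # gated models require EULA acceptance
)
estimator.set_hyperparameters(epoch="3", instruction_tuned="True")  # names differ per model

estimator.fit({"training": "s3://my-bucket/fine-tuning-data/"})     # placeholder path

predictor = estimator.deploy()   # host the fine-tuned model on an endpoint
```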

Supported Models for Customization

  • Amazon Nova
  • Llama (4, 3.x)
  • Qwen
  • DeepSeek
  • GPT-OSS compatible models

New: Serverless Model Customization

  • AI agent-guided workflow for reinforcement learning
  • Customize popular models in days (not weeks)
  • No cluster management required

Training at Scale

Amazon SageMaker HyperPod

Distributed training across massive GPU clusters:

Feature | Details
--- | ---
Scale | Hundreds to thousands of GPUs
Resilience | Automatic failure recovery
Optimization | Training efficiency tools
Governance | Centralized access control
Frameworks | PyTorch, TensorFlow, JAX
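
HyperPod clusters are created through the SageMaker control plane (CreateCluster) rather than as individual training jobs. A hedged sketch with boto3; the cluster name, instance type, role, and lifecycle-script location are placeholders:

```python
# Minimal sketch: create a small HyperPod cluster with boto3.
# All names, ARNs, counts, and S3 paths below are placeholders.
import boto3

sm = boto3.client("sagemaker")

sm.create_cluster(
    ClusterName="demo-hyperpod",
    InstanceGroups=[
        {
            "InstanceGroupName": "gpu-workers",
            "InstanceType": "ml.g5.8xlarge",        # size to your workload
            "InstanceCount": 4,
            "ExecutionRole": "arn:aws:iam::123456789012:role/HyperPodExecutionRole",
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://my-bucket/lifecycle-scripts/",
                "OnCreate": "on_create.sh",         # bootstrap script run on each node
            },
        }
    ],
    NodeRecovery="Automatic",                       # replace failed nodes automatically
)

print(sm.describe_cluster(ClusterName="demo-hyperpod")["ClusterStatus"])
```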

Training Features

Feature | Benefit
--- | ---
Elastic Training | Auto-scale clusters up/down
Checkpointless Training | Faster failure recovery
Managed Spot Training | Up to 90% cost savings
Distributed Data Parallel | Scale across multiple GPUs
Model Parallelism | Train models too large for a single GPU
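
Managed Spot Training is essentially a flag on the estimator plus a checkpoint location so interrupted jobs can resume. A sketch; the script name, framework versions, bucket, and role are placeholders:

```python
# Minimal sketch: Managed Spot Training with checkpointing.
# entry_point, role, and S3 paths are placeholders; the framework/python
# versions must match an available SageMaker container.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    framework_version="2.3",
    py_version="py311",
    instance_count=2,
    instance_type="ml.g5.2xlarge",
    use_spot_instances=True,                          # request Spot capacity (up to ~90% cheaper)
    max_run=3600,                                     # max seconds of actual training
    max_wait=7200,                                    # max total seconds incl. Spot interruptions
    checkpoint_s3_uri="s3://my-bucket/checkpoints/",  # resume point after an interruption
)

estimator.fit("s3://my-bucket/training-data/")
```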

Inference (Deployment)

70+ instance types with multiple deployment options:

Deployment Options

Option | Use Case | Latency | Cost
--- | --- | --- | ---
Real-time | Low-latency predictions | Milliseconds | Always-on
Serverless | Variable traffic, pay-per-request | Seconds | Auto-scales
Batch | Large-scale offline predictions | Minutes | Cost-effective
Async | Long-running tasks, queue-based | Variable | Queue-managed

Important Point: Choose inference type by payload size and latency needs (a configuration sketch follows this list):

  • Real-time → Immediate response needed, small payloads, always-on
  • Serverless → Variable/unpredictable traffic, auto-scales to zero
  • Batch → Large datasets (GBs+), scheduled/offline processing
  • Async → Payloads up to 1 GB, delay acceptable, queue-based
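
In the SageMaker Python SDK, the choice largely comes down to which config you pass to deploy(). A hedged sketch of the serverless and async variants; `model` is assumed to be a Model object you have already built, and the memory size, concurrency, and bucket are illustrative:

```python
# Minimal sketch: serverless vs. async deployment of an existing SageMaker Model.
# `model` is assumed to be a sagemaker.model.Model built elsewhere;
# numbers and the S3 path are illustrative placeholders.
from sagemaker.serverless import ServerlessInferenceConfig
from sagemaker.async_inference import AsyncInferenceConfig

# Serverless: scales to zero, billed per request duration.
serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=4096,        # 1024-6144 MB
    max_concurrency=10,
)
# predictor = model.deploy(serverless_inference_config=serverless_config)

# Async: queue-based, tolerates large payloads and long-running requests.
async_config = AsyncInferenceConfig(
    output_path="s3://my-bucket/async-results/",      # placeholder bucket
    max_concurrent_invocations_per_instance=4,
)
# predictor = model.deploy(
#     initial_instance_count=1,
#     instance_type="ml.m5.xlarge",
#     async_inference_config=async_config,
# )
```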

Instance Types

Category | Use Case
--- | ---
CPU | Cost-effective for simple models
GPU | Deep learning, LLMs
Inferentia | Cost-optimized inference (AWS chips)
Trainium | Training-optimized (AWS chips)

Inference Features

  • Multi-model endpoints: Host multiple models on one endpoint
  • Auto-scaling: Scale based on traffic patterns
  • A/B testing: Compare model versions (see the sketch after this list)
  • Shadow testing: Test new models against production
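
The A/B testing item maps to production variants: one endpoint, several models, weighted traffic. A hedged sketch with boto3; the model names, endpoint names, and traffic split are placeholders:

```python
# Minimal sketch: A/B test two model versions by splitting traffic between
# production variants on a single endpoint. Names and weights are placeholders.
import boto3

sm = boto3.client("sagemaker")

sm.create_endpoint_config(
    EndpointConfigName="ab-test-config",
    ProductionVariants=[
        {
            "VariantName": "model-a",
            "ModelName": "my-model-v1",              # existing SageMaker Model
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.9,             # 90% of traffic
        },
        {
            "VariantName": "model-b",
            "ModelName": "my-model-v2",
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.1,             # 10% canary traffic
        },
    ],
)
sm.create_endpoint(EndpointName="ab-test-endpoint", EndpointConfigName="ab-test-config")
```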

Human-in-the-Loop

Amazon SageMaker Ground Truth

Data labeling at scale:

Feature | Details
--- | ---
Labeling workflows | Images, text, video, 3D
Workforce options | Mechanical Turk, private teams, third-party vendors
Active learning | Cuts labeling costs by up to 70%
Quality control | Consensus, audit tools

Ground Truth Plus

Fully managed labeling service:

  • AWS-managed expert workforce
  • Custom labeling workflows
  • Enterprise-grade SLAs

Amazon A2I (Augmented AI)

Human review for ML predictions:

Use Case | Details
--- | ---
Low-confidence | Route uncertain predictions to humans
Sensitive content | Human review for moderation
Compliance | Audit trail for decisions
Custom workflows | Define review criteria
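
For the low-confidence use case, the application checks its own threshold and starts a human loop against a pre-created flow definition (task template + workforce). A hedged sketch; the ARN, threshold, and payload are placeholders:

```python
# Minimal sketch: route a low-confidence prediction to human reviewers with A2I.
# The flow definition ARN, loop name, threshold, and payload are placeholders;
# the flow definition must already exist.
import json
import boto3

a2i = boto3.client("sagemaker-a2i-runtime")

prediction = {"label": "fraud", "confidence": 0.62}

if prediction["confidence"] < 0.80:                  # confidence threshold (assumption)
    a2i.start_human_loop(
        HumanLoopName="review-txn-0001",
        FlowDefinitionArn="arn:aws:sagemaker:us-east-1:123456789012:flow-definition/fraud-review",
        HumanLoopInput={"InputContent": json.dumps(prediction)},
    )
```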

TL;DR

  • JumpStart = 600+ FMs + hundreds of algorithms, one-click deployment
  • Customization = Fine-tuning, RLHF, serverless customization for Llama/Nova/DeepSeek
  • HyperPod = Distributed training across thousands of GPUs with auto-recovery
  • Inference = 4 deployment options (real-time, serverless, batch, async), 70+ instance types
  • Human-in-the-loop = Ground Truth for labeling, A2I for prediction review
  • Cost savings = Managed Spot Training (up to 90%), Inferentia chips

Resources

SageMaker JumpStart 🔴
Foundation models and ML solutions hub.

SageMaker HyperPod 🔴
Distributed training at scale.

SageMaker Inference 🔴
Model deployment options.

SageMaker Ground Truth 🔴
Data labeling service.