Model development, JumpStart foundation models, customization, and inference deployment in Amazon SageMaker AI.


Overview

SageMaker provides the complete model lifecycle:

Stage | Tools
--- | ---
Discover | JumpStart (600+ FMs and algorithms)
Customize | Fine-tuning, RLHF, serverless customization
Train | HyperPod, distributed training
Deploy | Real-time, batch, serverless, async inference
Monitor | Model Monitor, drift detection

Amazon SageMaker JumpStart

One-click access to foundation models and solutions:

Foundation Models (600+)

Provider | Models Available
--- | ---
Meta | Llama 4, Llama 3.3 70B, Llama 3.x
Amazon | Amazon Nova
Mistral AI | Mistral Large, Mixtral
AI21 Labs | Jurassic models
Cohere | Command, Embed
Stability AI | Stable Diffusion XL
Hugging Face | Thousands of open models
OpenAI (open weights) | Compatible models
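
Trying one of these models is typically a few lines with the SageMaker Python SDK's JumpStart classes. A minimal sketch; the model ID, instance type, and prompt below are illustrative, so check the JumpStart catalog for the exact model ID before running:

```python
# Minimal sketch: deploy a JumpStart foundation model and invoke it.
# model_id, instance type, and the prompt are illustrative placeholders.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="meta-textgeneration-llama-3-8b")  # example catalog ID
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",   # GPU instance suitable for a small LLM
    accept_eula=True,                # many gated FMs require EULA acceptance
)

print(predictor.predict({"inputs": "What does SageMaker JumpStart provide?"}))

predictor.delete_endpoint()          # tear down to stop charges
```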

Built-in Algorithms (Hundreds)

Task Type | Examples
--- | ---
Classification | XGBoost, LightGBM, image/text classification
Regression | Linear learner, neural networks
NLP | Text classification, sentiment analysis
Computer Vision | Object detection, segmentation
Time Series | DeepAR, forecasting
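
Built-in algorithms ship as prebuilt containers, so training is mostly configuration. A hedged sketch using XGBoost; the IAM role and S3 paths are placeholders for your own resources:

```python
# Minimal sketch: train the built-in XGBoost algorithm on CSV data in S3.
# The role ARN and S3 paths are placeholders.
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

image = image_uris.retrieve(framework="xgboost", region=session.boto_region_name, version="1.7-1")

estimator = Estimator(
    image_uri=image,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/xgb-output/",                    # placeholder bucket
    hyperparameters={"objective": "binary:logistic", "num_round": 100},
)

estimator.fit({"train": TrainingInput("s3://my-bucket/train.csv", content_type="text/csv")})
```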

Pre-built Solutions

One-click, end-to-end solutions for:

  • Demand forecasting
  • Credit rating prediction
  • Fraud detection
  • Computer vision applications

JumpStart is available in AWS GovCloud (US-West and US-East regions).


Model Customization

Fine-tune FMs with your data:

Method | Details
--- | ---
Fine-tuning | Adapt model to specific tasks
Reinforcement Learning (RLHF) | Human feedback for alignment
Serverless Customization | No infrastructure management (new)
Distillation | Create smaller, faster models
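
Fine-tuning a JumpStart model follows the same estimator pattern as regular training. A minimal sketch; the model ID, hyperparameter names, and S3 path are illustrative and vary by model, so consult the model card for the channels and hyperparameters each model actually expects:

```python
# Minimal sketch: fine-tune a JumpStart foundation model on data in S3.
# model_id, hyperparameters, and paths are illustrative placeholders.
from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id="meta-textgeneration-llama-3-8b",     # example catalog ID
    instance_type="ml.g5.12xlarge",
    environment={"accept_eula": "true"},           # gated models require EULA acceptance
)
estimator.set_hyperparameters(epoch="3", instruction_tuned="True")  # names differ per model

estimator.fit({"training": "s3://my-bucket/fine-tuning-data/"})     # placeholder path

predictor = estimator.deploy()   # host the fine-tuned model on an endpoint
```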

Supported Models for Customization

  • Amazon Nova
  • Llama (4, 3.x)
  • Qwen
  • DeepSeek
  • GPT-OSS compatible models

New: Serverless Model Customization

  • AI agent-guided workflow for reinforcement learning
  • Customize popular models in days (not weeks)
  • No cluster management required

Training at Scale

Amazon SageMaker HyperPod

Distributed training across massive GPU clusters:

Feature | Details
--- | ---
Scale | Hundreds to thousands of GPUs
Resilience | Automatic failure recovery
Optimization | Training efficiency tools
Governance | Centralized access control
Frameworks | PyTorch, TensorFlow, JAX
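
HyperPod clusters are created through the SageMaker control plane (CreateCluster) rather than as individual training jobs. A hedged sketch with boto3; the cluster name, instance type, role, and lifecycle-script location are placeholders:

```python
# Minimal sketch: create a small HyperPod cluster with boto3.
# All names, ARNs, counts, and S3 paths below are placeholders.
import boto3

sm = boto3.client("sagemaker")

sm.create_cluster(
    ClusterName="demo-hyperpod",
    InstanceGroups=[
        {
            "InstanceGroupName": "gpu-workers",
            "InstanceType": "ml.g5.8xlarge",        # size to your workload
            "InstanceCount": 4,
            "ExecutionRole": "arn:aws:iam::123456789012:role/HyperPodExecutionRole",
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://my-bucket/lifecycle-scripts/",
                "OnCreate": "on_create.sh",         # bootstrap script run on each node
            },
        }
    ],
    NodeRecovery="Automatic",                       # replace failed nodes automatically
)

print(sm.describe_cluster(ClusterName="demo-hyperpod")["ClusterStatus"])
```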

Training Features

Feature | Benefit
--- | ---
Elastic Training | Auto-scale clusters up/down
Checkpointless Training | Faster failure recovery
Managed Spot Training | Up to 90% cost savings
Distributed Data Parallel | Scale across multiple GPUs
Model Parallelism | Train models too large for a single GPU
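
Managed Spot Training is essentially a flag on the estimator plus a checkpoint location so interrupted jobs can resume. A sketch; the script name, framework versions, bucket, and role are placeholders:

```python
# Minimal sketch: Managed Spot Training with checkpointing.
# entry_point, role, and S3 paths are placeholders; the framework/python
# versions must match an available SageMaker container.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    framework_version="2.3",
    py_version="py311",
    instance_count=2,
    instance_type="ml.g5.2xlarge",
    use_spot_instances=True,                          # request Spot capacity (up to ~90% cheaper)
    max_run=3600,                                     # max seconds of actual training
    max_wait=7200,                                    # max total seconds incl. Spot interruptions
    checkpoint_s3_uri="s3://my-bucket/checkpoints/",  # resume point after an interruption
)

estimator.fit("s3://my-bucket/training-data/")
```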

Inference (Deployment)

70+ instance types with multiple deployment options:

Deployment Options

Option | Use Case | Latency | Cost
--- | --- | --- | ---
Real-time | Low-latency predictions | Milliseconds | Always-on
Serverless | Variable traffic, pay-per-request | Seconds | Auto-scales
Batch | Large-scale offline predictions | Minutes | Cost-effective
Async | Long-running tasks, queue-based | Variable | Queue-managed

Important Point: Choose inference type by payload size and latency needs (a configuration sketch follows this list):

  • Real-time → Immediate response needed, small payloads, always-on
  • Serverless → Variable/unpredictable traffic, auto-scales to zero
  • Batch → Large datasets (GBs+), scheduled/offline processing
  • Async → Payloads up to 1 GB, delay acceptable, queue-based
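
In the SageMaker Python SDK, the choice largely comes down to which config you pass to deploy(). A hedged sketch of the serverless and async variants; `model` is assumed to be a Model object you have already built, and the memory size, concurrency, and bucket are illustrative:

```python
# Minimal sketch: serverless vs. async deployment of an existing SageMaker Model.
# `model` is assumed to be a sagemaker.model.Model built elsewhere;
# numbers and the S3 path are illustrative placeholders.
from sagemaker.serverless import ServerlessInferenceConfig
from sagemaker.async_inference import AsyncInferenceConfig

# Serverless: scales to zero, billed per request duration.
serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=4096,        # 1024-6144 MB
    max_concurrency=10,
)
# predictor = model.deploy(serverless_inference_config=serverless_config)

# Async: queue-based, tolerates large payloads and long-running requests.
async_config = AsyncInferenceConfig(
    output_path="s3://my-bucket/async-results/",      # placeholder bucket
    max_concurrent_invocations_per_instance=4,
)
# predictor = model.deploy(
#     initial_instance_count=1,
#     instance_type="ml.m5.xlarge",
#     async_inference_config=async_config,
# )
```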

Instance Types

Category | Use Case
--- | ---
CPU | Cost-effective for simple models
GPU | Deep learning, LLMs
Inferentia | Cost-optimized inference (AWS chips)
Trainium | Training-optimized (AWS chips)

Inference Features

  • Multi-model endpoints: Host multiple models on one endpoint
  • Auto-scaling: Scale based on traffic patterns
  • A/B testing: Compare model versions (see the sketch after this list)
  • Shadow testing: Test new models against production
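
The A/B testing item maps to production variants: one endpoint, several models, weighted traffic. A hedged sketch with boto3; the model names, endpoint names, and traffic split are placeholders:

```python
# Minimal sketch: A/B test two model versions by splitting traffic between
# production variants on a single endpoint. Names and weights are placeholders.
import boto3

sm = boto3.client("sagemaker")

sm.create_endpoint_config(
    EndpointConfigName="ab-test-config",
    ProductionVariants=[
        {
            "VariantName": "model-a",
            "ModelName": "my-model-v1",              # existing SageMaker Model
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.9,             # 90% of traffic
        },
        {
            "VariantName": "model-b",
            "ModelName": "my-model-v2",
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.1,             # 10% canary traffic
        },
    ],
)
sm.create_endpoint(EndpointName="ab-test-endpoint", EndpointConfigName="ab-test-config")
```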

Human-in-the-Loop

Amazon SageMaker Ground Truth

Data labeling at scale:

Feature | Details
--- | ---
Labeling workflows | Images, text, video, 3D
Workforce options | Mechanical Turk, private teams, third-party vendors
Active learning | Cuts labeling costs by up to 70%
Quality control | Consensus, audit tools

Ground Truth Plus

Fully managed labeling service:

  • AWS-managed expert workforce
  • Custom labeling workflows
  • Enterprise-grade SLAs

Amazon A2I (Augmented AI)

Human review for ML predictions:

Use Case | Details
--- | ---
Low-confidence | Route uncertain predictions to humans
Sensitive content | Human review for moderation
Compliance | Audit trail for decisions
Custom workflows | Define review criteria
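
For the low-confidence use case, the application checks its own threshold and starts a human loop against a pre-created flow definition (task template + workforce). A hedged sketch; the ARN, threshold, and payload are placeholders:

```python
# Minimal sketch: route a low-confidence prediction to human reviewers with A2I.
# The flow definition ARN, loop name, threshold, and payload are placeholders;
# the flow definition must already exist.
import json
import boto3

a2i = boto3.client("sagemaker-a2i-runtime")

prediction = {"label": "fraud", "confidence": 0.62}

if prediction["confidence"] < 0.80:                  # confidence threshold (assumption)
    a2i.start_human_loop(
        HumanLoopName="review-txn-0001",
        FlowDefinitionArn="arn:aws:sagemaker:us-east-1:123456789012:flow-definition/fraud-review",
        HumanLoopInput={"InputContent": json.dumps(prediction)},
    )
```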

TL;DR

  • JumpStart = 600+ FMs + hundreds of algorithms, one-click deployment
  • Customization = Fine-tuning, RLHF, serverless customization for Llama/Nova/DeepSeek
  • HyperPod = Distributed training across thousands of GPUs with auto-recovery
  • Inference = 4 deployment options (real-time, serverless, batch, async), 70+ instance types
  • Human-in-the-loop = Ground Truth for labeling, A2I for prediction review
  • Cost savings = Managed Spot Training (up to 90%), Inferentia chips

Resources

SageMaker JumpStart 🔴
Foundation models and ML solutions hub.

SageMaker HyperPod 🔴
Distributed training at scale.

SageMaker Inference 🔴
Model deployment options.

SageMaker Ground Truth 🔴
Data labeling service.