Deep dive into fine-tuning foundation models in Amazon Bedrock — why, what’s possible, limitations, and pricing.


What is Fine-Tuning?

Fine-tuning adapts a pre-trained foundation model to your specific use case by training it on your own data. The result is a custom model version that performs better on your tasks while retaining the base model’s general capabilities.


Why Fine-Tune?

| Scenario | Solution |
|---|---|
| Model doesn’t know your domain terminology | Fine-tune on domain documents |
| Inconsistent output format/style | Fine-tune on examples with desired format |
| Task-specific performance needed | Fine-tune on labeled examples |
| Prompt engineering isn’t enough | Fine-tuning provides deeper customization |

Fine-Tuning vs Alternatives

| Approach | Effort | Customization | When to Use |
|---|---|---|---|
| Prompt Engineering | Low | Surface-level | Try first — often sufficient |
| RAG (Knowledge Bases) | Medium | Adds knowledge | Model needs access to your data |
| Fine-Tuning | High | Deep behavior change | Need consistent style/format/domain expertise |
| Continued Pre-Training | Highest | Domain adaptation | Model needs to “speak” your industry language |

Types of Customization in Bedrock

1. Continued Pre-Training

  • Train on unlabeled domain-specific data
  • Model learns domain vocabulary and patterns
  • Example: Training on medical literature so model understands clinical terms

2. Fine-Tuning (Supervised)

  • Train on labeled prompt-completion pairs
  • Model learns specific task behavior
  • Example: Training on customer support tickets to generate consistent responses

Supported Models for Fine-Tuning

⚠️ Not all Bedrock models support fine-tuning. Always check the Bedrock Model Support page.

| Provider | Model | Fine-Tuning Support |
|---|---|---|
| Amazon | Titan Text | ✅ Supported |
| Amazon | Titan Image Generator G1 | ✅ Supported (style/brand adaptation) |
| Amazon | Titan Embeddings | ❌ Not supported |
| Meta | Llama 2 | ✅ Supported |
| Meta | Llama 3.1 (8B, 70B) | ✅ Supported (128K context) |
| Meta | Llama 3.2 (1B, 3B, 11B, 90B) | ✅ Supported (multimodal for 11B/90B) |
| Anthropic | Claude | ❌ Not supported via Bedrock |
| Mistral | Mistral models | ✅ Some supported |
| Cohere | Command | ✅ Supported |

Multimodal Fine-Tuning

Llama 3.2 11B and 90B are multimodal — fine-tune for:

  • Visual question answering
  • Image captioning
  • Document analysis with images

Fine-Tuning Process

1. Prepare Data → 2. Upload to S3 → 3. Create Job → 4. Training → 5. Deploy → 6. Inference

Step-by-Step

| Step | Details |
|---|---|
| 1. Prepare training data | JSONL format with prompt-completion pairs |
| 2. Upload to S3 | Training data in your S3 bucket |
| 3. Configure job | Select base model, hyperparameters, output location |
| 4. Training runs | AWS manages compute, typically hours to complete |
| 5. Custom model created | Stored in your account |
| 6. Deploy and use | Invoke like any Bedrock model |
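The “configure job” step above can be sketched as a parameter dictionary in boto3 style. The bucket names, role ARN, model identifier, and hyperparameter values below are illustrative placeholders, not real resources — substitute your own before submitting anything to AWS:

```python
import json

# Sketch of a Bedrock model-customization job configuration.
# All ARNs, bucket names, and hyperparameter values are placeholders.
job_params = {
    "jobName": "support-ticket-summarizer-v1",
    "customModelName": "support-ticket-summarizer",
    "roleArn": "arn:aws:iam::123456789012:role/BedrockCustomizationRole",  # placeholder
    "baseModelIdentifier": "meta.llama3-1-8b-instruct-v1:0",  # a fine-tunable base model
    "trainingDataConfig": {"s3Uri": "s3://my-bucket/train.jsonl"},
    "outputDataConfig": {"s3Uri": "s3://my-bucket/output/"},
    "hyperParameters": {  # Bedrock expects hyperparameter values as strings
        "epochCount": "2",
        "batchSize": "1",
        "learningRate": "0.00001",
    },
}

print(json.dumps(job_params, indent=2))

# To actually submit the job (requires AWS credentials and the boto3 package):
# import boto3
# bedrock = boto3.client("bedrock")
# bedrock.create_model_customization_job(**job_params)
```

The job runs asynchronously; you poll its status (or watch the console) until the custom model appears in your account.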

Important Point: Fine-tuned/custom models require Provisioned Throughput for deployment. You cannot use On-Demand mode with custom models — you must purchase reserved capacity to test and deploy them.
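A minimal sketch of the Provisioned Throughput purchase that deployment requires — again with placeholder names and a placeholder custom-model ARN:

```python
# Sketch: parameters for purchasing Provisioned Throughput for a custom model.
# The model ARN and names are placeholders; modelUnits determines capacity
# (and cost), with a minimum purchase of 1 unit.
pt_params = {
    "provisionedModelName": "support-summarizer-pt",
    "modelId": "arn:aws:bedrock:us-east-1:123456789012:custom-model/my-model",  # placeholder
    "modelUnits": 1,
    # "commitmentDuration": "OneMonth",  # omit for no-commitment (hourly) pricing
}
print(pt_params)

# With boto3 (requires credentials):
# boto3.client("bedrock").create_provisioned_model_throughput(**pt_params)
```

Because you are billed for the provisioned capacity whether or not you invoke the model, factor this into the cost comparison before committing to fine-tuning.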

Training Data Format

```jsonl
{"prompt": "Summarize this ticket:", "completion": "Customer requests refund for..."}
{"prompt": "Summarize this ticket:", "completion": "User reports login issue..."}
```
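A malformed line in the JSONL file will fail the training job, so it’s worth validating before uploading to S3. A minimal sketch (the file name and helper are illustrative, not part of any AWS tooling):

```python
import json

# Check that a training file is valid JSONL where every record has the
# prompt/completion keys used by Bedrock text fine-tuning.
def validate_jsonl(path):
    errors = []
    with open(path) as f:
        for i, line in enumerate(f, 1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                errors.append(f"line {i}: not valid JSON")
                continue
            if not {"prompt", "completion"} <= record.keys():
                errors.append(f"line {i}: missing prompt/completion")
    return errors

# Example: write two records, then validate.
with open("train.jsonl", "w") as f:
    f.write('{"prompt": "Summarize this ticket:", "completion": "Customer requests refund for..."}\n')
    f.write('{"prompt": "Summarize this ticket:", "completion": "User reports login issue..."}\n')

print(validate_jsonl("train.jsonl"))  # → [] (no errors)
```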

Limitations & Constraints

| Limitation | Details |
|---|---|
| Model availability | Only specific models support fine-tuning |
| Minimum data | Typically need hundreds to thousands of examples |
| Training time | Hours to complete (varies by data size, model) |
| Region availability | Fine-tuning may not be available in all regions |
| No real-time updates | Can’t update model incrementally — retrain fully |
| Storage costs | Custom models incur storage fees |
| Higher inference cost | Fine-tuned models cost more per token than base models |

Data Requirements

| Consideration | Recommendation |
|---|---|
| Quality over quantity | Clean, consistent examples matter more than volume |
| Format consistency | Use consistent prompt/completion structure |
| Diversity | Cover edge cases and variations |
| Validation set | Hold out data for evaluation |
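The hold-out recommendation above can be done with a simple shuffled split. The 10% ratio, seed, and record contents here are illustrative:

```python
import random

# Hold out a fraction of examples for evaluation before training.
def split_dataset(records, holdout_frac=0.1, seed=42):
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for reproducibility
    n_holdout = max(1, int(len(shuffled) * holdout_frac))
    return shuffled[n_holdout:], shuffled[:n_holdout]

# Example with 50 synthetic records.
records = [{"prompt": f"Summarize ticket {i}:", "completion": f"Summary {i}"} for i in range(50)]
train, validation = split_dataset(records)
print(len(train), len(validation))  # → 45 5
```

Evaluate the base model and the fine-tuned model on the held-out set to confirm the training actually improved your task, rather than judging by a few ad-hoc prompts.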

Pricing

Fine-tuning has three cost components:

| Component | Pricing Basis |
|---|---|
| Training | Per token processed during training |
| Storage | Per GB-month for custom model storage |
| Inference | Per token (higher than base model) |

Cost Considerations

| Factor | Impact |
|---|---|
| Training data size | More data = higher training cost |
| Number of epochs | More passes = higher cost, potentially better results |
| Model size | Larger models cost more to train |
| Inference volume | Consider if cost premium is worth the improvement |
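Since training is billed per token processed, data size and epoch count multiply together. A back-of-envelope estimate, using a made-up per-token rate (check the Bedrock pricing page for real numbers):

```python
# Rough training-cost estimate: tokens processed = examples x tokens x epochs.
# The price used below is a hypothetical placeholder, not a real Bedrock rate.
def training_cost(num_examples, avg_tokens_per_example, epochs, price_per_1k_tokens):
    tokens_processed = num_examples * avg_tokens_per_example * epochs
    return tokens_processed / 1000 * price_per_1k_tokens

# e.g. 5,000 examples x 400 tokens x 2 epochs at a hypothetical $0.005 per 1K tokens
cost = training_cost(5_000, 400, 2, 0.005)
print(f"${cost:.2f}")  # → $20.00
```

Doubling the epoch count doubles this number, which is why epoch count appears in the cost factors above.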

Cost Tip: Start with prompt engineering and RAG. Only fine-tune when those approaches aren’t sufficient — fine-tuning is the most expensive customization option.


When to Fine-Tune (and When Not To)

✅ Good Use Cases

  • Consistent output format across all responses
  • Domain-specific terminology (legal, medical, technical)
  • Brand voice and style consistency
  • Task-specific optimization (classification, extraction)
  • Multimodal tasks with your image data

❌ When to Avoid

  • Just need to add knowledge → Use RAG instead
  • Simple formatting needs → Use system prompts
  • Small dataset (<100 examples) → Likely won’t help
  • Rapidly changing information → RAG is more flexible
  • Budget constraints → Explore cheaper options first

TL;DR

  • Fine-tuning = train a base model on your data for better task-specific performance
  • Supported models: Titan, Llama 2, Llama 3.1 (8B/70B), Llama 3.2 (1B/3B/11B/90B), Cohere Command, some Mistral
  • Not supported: Claude (via Bedrock), Llama 4 (not yet confirmed)
  • Process: Prepare JSONL data → Upload to S3 → Create training job → Deploy custom model
  • Costs: Training (per token) + Storage (per GB) + Inference (premium over base)
  • Try first: Prompt engineering → RAG → Fine-tuning (in that order)

Resources

Bedrock Model Customization
Official documentation for fine-tuning and continued pre-training.

Supported Models for Fine-Tuning
Check which models support customization.