Deep dive into fine-tuning foundation models in Amazon Bedrock — why, what’s possible, limitations, and pricing.


What is Fine-Tuning?

Fine-tuning adapts a pre-trained foundation model to your specific use case by training it on your own data. The result is a custom model version that performs better on your tasks while retaining the base model’s general capabilities.


Why Fine-Tune?

| Scenario | Solution |
|---|---|
| Model doesn’t know your domain terminology | Fine-tune on domain documents |
| Inconsistent output format/style | Fine-tune on examples with desired format |
| Task-specific performance needed | Fine-tune on labeled examples |
| Prompt engineering isn’t enough | Fine-tuning provides deeper customization |

Fine-Tuning vs Alternatives

| Approach | Effort | Customization | When to Use |
|---|---|---|---|
| Prompt Engineering | Low | Surface-level | Try first — often sufficient |
| RAG (Knowledge Bases) | Medium | Adds knowledge | Model needs access to your data |
| Fine-Tuning | High | Deep behavior change | Need consistent style/format/domain expertise |
| Continued Pre-Training | Highest | Domain adaptation | Model needs to “speak” your industry language |

Types of Customization in Bedrock

1. Continued Pre-Training

  • Train on unlabeled domain-specific data
  • Model learns domain vocabulary and patterns
  • Example: Training on medical literature so model understands clinical terms

2. Fine-Tuning (Supervised)

  • Train on labeled prompt-completion pairs
  • Model learns specific task behavior
  • Example: Training on customer support tickets to generate consistent responses

Supported Models for Fine-Tuning

⚠️ Not all Bedrock models support fine-tuning. Always check the Bedrock Model Support page.

| Provider | Model | Fine-Tuning Support |
|---|---|---|
| Amazon | Titan Text | ✅ Supported |
| Amazon | Titan Image Generator G1 | ✅ Supported (style/brand adaptation) |
| Amazon | Titan Embeddings | ❌ Not supported |
| Meta | Llama 2 | ✅ Supported |
| Meta | Llama 3.1 (8B, 70B) | ✅ Supported (128K context) |
| Meta | Llama 3.2 (1B, 3B, 11B, 90B) | ✅ Supported (multimodal for 11B/90B) |
| Anthropic | Claude | ❌ Not supported via Bedrock |
| Mistral | Mistral models | ✅ Some supported |
| Cohere | Command | ✅ Supported |

Multimodal Fine-Tuning

Llama 3.2 11B and 90B are multimodal — fine-tune for:

  • Visual question answering
  • Image captioning
  • Document analysis with images

Fine-Tuning Process

1. Prepare Data → 2. Upload to S3 → 3. Create Job → 4. Training → 5. Deploy → 6. Inference

Step-by-Step

| Step | Details |
|---|---|
| 1. Prepare training data | JSONL format with prompt-completion pairs |
| 2. Upload to S3 | Training data in your S3 bucket |
| 3. Configure job | Select base model, hyperparameters, output location |
| 4. Training runs | AWS manages compute, typically hours to complete |
| 5. Custom model created | Stored in your account |
| 6. Deploy and use | Invoke like any Bedrock model |
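The “configure job” step above can be sketched as a parameter dictionary in boto3 style. The bucket names, role ARN, model identifier, and hyperparameter values below are illustrative placeholders, not real resources — substitute your own before submitting anything to AWS:

```python
import json

# Sketch of a Bedrock model-customization job configuration.
# All ARNs, bucket names, and hyperparameter values are placeholders.
job_params = {
    "jobName": "support-ticket-summarizer-v1",
    "customModelName": "support-ticket-summarizer",
    "roleArn": "arn:aws:iam::123456789012:role/BedrockCustomizationRole",  # placeholder
    "baseModelIdentifier": "meta.llama3-1-8b-instruct-v1:0",  # a fine-tunable base model
    "trainingDataConfig": {"s3Uri": "s3://my-bucket/train.jsonl"},
    "outputDataConfig": {"s3Uri": "s3://my-bucket/output/"},
    "hyperParameters": {  # Bedrock expects hyperparameter values as strings
        "epochCount": "2",
        "batchSize": "1",
        "learningRate": "0.00001",
    },
}

print(json.dumps(job_params, indent=2))

# To actually submit the job (requires AWS credentials and the boto3 package):
# import boto3
# bedrock = boto3.client("bedrock")
# bedrock.create_model_customization_job(**job_params)
```

The job runs asynchronously; you poll its status (or watch the console) until the custom model appears in your account.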

Important Point: Fine-tuned/custom models require Provisioned Throughput for deployment. You cannot use On-Demand mode with custom models — you must purchase reserved capacity to test and deploy them.
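A minimal sketch of the Provisioned Throughput purchase that deployment requires — again with placeholder names and a placeholder custom-model ARN:

```python
# Sketch: parameters for purchasing Provisioned Throughput for a custom model.
# The model ARN and names are placeholders; modelUnits determines capacity
# (and cost), with a minimum purchase of 1 unit.
pt_params = {
    "provisionedModelName": "support-summarizer-pt",
    "modelId": "arn:aws:bedrock:us-east-1:123456789012:custom-model/my-model",  # placeholder
    "modelUnits": 1,
    # "commitmentDuration": "OneMonth",  # omit for no-commitment (hourly) pricing
}
print(pt_params)

# With boto3 (requires credentials):
# boto3.client("bedrock").create_provisioned_model_throughput(**pt_params)
```

Because you are billed for the provisioned capacity whether or not you invoke the model, factor this into the cost comparison before committing to fine-tuning.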

Training Data Format

```jsonl
{"prompt": "Summarize this ticket:", "completion": "Customer requests refund for..."}
{"prompt": "Summarize this ticket:", "completion": "User reports login issue..."}
```
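A malformed line in the JSONL file will fail the training job, so it’s worth validating before uploading to S3. A minimal sketch (the file name and helper are illustrative, not part of any AWS tooling):

```python
import json

# Check that a training file is valid JSONL where every record has the
# prompt/completion keys used by Bedrock text fine-tuning.
def validate_jsonl(path):
    errors = []
    with open(path) as f:
        for i, line in enumerate(f, 1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                errors.append(f"line {i}: not valid JSON")
                continue
            if not {"prompt", "completion"} <= record.keys():
                errors.append(f"line {i}: missing prompt/completion")
    return errors

# Example: write two records, then validate.
with open("train.jsonl", "w") as f:
    f.write('{"prompt": "Summarize this ticket:", "completion": "Customer requests refund for..."}\n')
    f.write('{"prompt": "Summarize this ticket:", "completion": "User reports login issue..."}\n')

print(validate_jsonl("train.jsonl"))  # → [] (no errors)
```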

Limitations & Constraints

| Limitation | Details |
|---|---|
| Model availability | Only specific models support fine-tuning |
| Minimum data | Typically need hundreds to thousands of examples |
| Training time | Hours to complete (varies by data size, model) |
| Region availability | Fine-tuning may not be available in all regions |
| No real-time updates | Can’t update model incrementally — retrain fully |
| Storage costs | Custom models incur storage fees |
| Higher inference cost | Fine-tuned models cost more per token than base models |

Data Requirements

| Consideration | Recommendation |
|---|---|
| Quality over quantity | Clean, consistent examples matter more than volume |
| Format consistency | Use consistent prompt/completion structure |
| Diversity | Cover edge cases and variations |
| Validation set | Hold out data for evaluation |
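The hold-out recommendation above can be done with a simple shuffled split. The 10% ratio, seed, and record contents here are illustrative:

```python
import random

# Hold out a fraction of examples for evaluation before training.
def split_dataset(records, holdout_frac=0.1, seed=42):
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for reproducibility
    n_holdout = max(1, int(len(shuffled) * holdout_frac))
    return shuffled[n_holdout:], shuffled[:n_holdout]

# Example with 50 synthetic records.
records = [{"prompt": f"Summarize ticket {i}:", "completion": f"Summary {i}"} for i in range(50)]
train, validation = split_dataset(records)
print(len(train), len(validation))  # → 45 5
```

Evaluate the base model and the fine-tuned model on the held-out set to confirm the training actually improved your task, rather than judging by a few ad-hoc prompts.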

Pricing

Fine-tuning has three cost components:

| Component | Pricing Basis |
|---|---|
| Training | Per token processed during training |
| Storage | Per GB-month for custom model storage |
| Inference | Per token (higher than base model) |

Cost Considerations

| Factor | Impact |
|---|---|
| Training data size | More data = higher training cost |
| Number of epochs | More passes = higher cost, potentially better results |
| Model size | Larger models cost more to train |
| Inference volume | Consider if cost premium is worth the improvement |
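Since training is billed per token processed, data size and epoch count multiply together. A back-of-envelope estimate, using a made-up per-token rate (check the Bedrock pricing page for real numbers):

```python
# Rough training-cost estimate: tokens processed = examples x tokens x epochs.
# The price used below is a hypothetical placeholder, not a real Bedrock rate.
def training_cost(num_examples, avg_tokens_per_example, epochs, price_per_1k_tokens):
    tokens_processed = num_examples * avg_tokens_per_example * epochs
    return tokens_processed / 1000 * price_per_1k_tokens

# e.g. 5,000 examples x 400 tokens x 2 epochs at a hypothetical $0.005 per 1K tokens
cost = training_cost(5_000, 400, 2, 0.005)
print(f"${cost:.2f}")  # → $20.00
```

Doubling the epoch count doubles this number, which is why epoch count appears in the cost factors above.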

Cost Tip: Start with prompt engineering and RAG. Only fine-tune when those approaches aren’t sufficient — fine-tuning is the most expensive customization option.


When to Fine-Tune (and When Not To)

✅ Good Use Cases

  • Consistent output format across all responses
  • Domain-specific terminology (legal, medical, technical)
  • Brand voice and style consistency
  • Task-specific optimization (classification, extraction)
  • Multimodal tasks with your image data

❌ When to Avoid

  • Just need to add knowledge → Use RAG instead
  • Simple formatting needs → Use system prompts
  • Small dataset (<100 examples) → Likely won’t help
  • Rapidly changing information → RAG is more flexible
  • Budget constraints → Explore cheaper options first

TL;DR

  • Fine-tuning = train a base model on your data for better task-specific performance
  • Supported models: Titan, Llama 2, Llama 3.1 (8B/70B), Llama 3.2 (1B/3B/11B/90B), Cohere Command, some Mistral
  • Not supported: Claude (via Bedrock), Llama 4 (not yet confirmed)
  • Process: Prepare JSONL data → Upload to S3 → Create training job → Deploy custom model
  • Costs: Training (per token) + Storage (per GB) + Inference (premium over base)
  • Try first: Prompt engineering → RAG → Fine-tuning (in that order)

Resources

Bedrock Model Customization
Official documentation for fine-tuning and continued pre-training.

Supported Models for Fine-Tuning
Check which models support customization.