Understanding Amazon Bedrock pricing — on-demand, provisioned throughput, and additional feature costs.
## Pricing Model Overview
Bedrock pricing varies by model and usage pattern:
| Pricing Type | How It Works | Best For |
|---|---|---|
| On-demand | Pay per input/output token | Variable workloads, experimentation |
| Provisioned Throughput | Hourly rate for reserved capacity | High-volume, consistent workloads |
| Batch Inference | Discounted offline processing | Bulk processing jobs |
## On-Demand Pricing
Pay only for what you use — charged per 1,000 tokens (input and output separately).
### Price Varies by Model
| Model Tier | Relative Cost | Examples |
|---|---|---|
| Budget | $ | Titan Lite, Mistral Small |
| Mid-tier | $$ | Titan Express, Llama models |
| Premium | $$$ | Claude 4.x, Mistral Large |
| Specialized | $$$$ | Image generation (per image) |
**Key point:** Smaller, faster models are cheaper. Choose a model based on what the task actually requires, not just on maximum capability.
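The tier table above can be made concrete with a quick back-of-envelope comparison. All per-1K-token rates below are invented tier placeholders, not published Bedrock prices:

```python
# Same workload priced across model tiers. Rates are illustrative
# placeholders -- look up real per-model prices on the Bedrock pricing page.

WORKLOAD_TOKENS = 10_000_000  # 10M tokens/month, input+output blended

tier_price_per_1k = {"budget": 0.0003, "mid": 0.002, "premium": 0.012}

monthly = {tier: WORKLOAD_TOKENS / 1000 * price
           for tier, price in tier_price_per_1k.items()}
# At these sample rates the premium tier costs 40x the budget tier
# for the identical token volume.
```

The absolute numbers are made up, but the ratio is the point: tier choice often moves cost by an order of magnitude or more.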
### What Counts as a Token?
- ~4 characters = 1 token (English text)
- Input tokens: Your prompt
- Output tokens: Model’s response
- System prompts count as input tokens
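The rules above can be sketched as a rough cost estimator. The 4-characters-per-token ratio is only a heuristic (real tokenizers vary by model), and the sample rates are placeholders:

```python
# Rough on-demand cost estimate: input and output tokens are billed
# separately, per 1,000 tokens. Rates here are placeholders.

def estimate_tokens(text: str) -> int:
    """Approximate token count: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def estimate_cost(prompt: str, response: str,
                  input_price_per_1k: float,
                  output_price_per_1k: float) -> float:
    """On-demand cost = input tokens at input rate + output tokens at output rate."""
    input_tokens = estimate_tokens(prompt)
    output_tokens = estimate_tokens(response)
    return (input_tokens / 1000) * input_price_per_1k \
         + (output_tokens / 1000) * output_price_per_1k

# 4,000-char prompt (~1,000 tokens) and 8,000-char response (~2,000 tokens)
cost = estimate_cost("a" * 4000, "b" * 8000,
                     input_price_per_1k=0.003, output_price_per_1k=0.015)
```

Note that output tokens usually cost several times more than input tokens, so verbose responses dominate the bill.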
## Provisioned Throughput
Reserve capacity for consistent performance and predictable costs.
| Aspect | Details |
|---|---|
| Commitment | 1-month or 6-month terms (longer terms are discounted) |
| Unit | Model Units (throughput capacity) |
| Billing | Hourly rate while provisioned |
| Benefit | Guaranteed throughput, no throttling |
### When to Use Provisioned Throughput
| Scenario | Recommendation |
|---|---|
| Unpredictable, low volume | On-demand |
| Consistent high volume | Provisioned (cost savings) |
| Latency-sensitive production | Provisioned (guaranteed capacity) |
| Experimentation/dev | On-demand |
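One way to apply the table above is a break-even calculation: compare a month of on-demand token charges against a model unit's hourly rate. Every rate below is illustrative, not a real Bedrock price:

```python
# Break-even sketch: at what monthly volume does reserved capacity
# beat on-demand? All rates are placeholders.

HOURS_PER_MONTH = 730  # average hours in a month

def monthly_on_demand(tokens_per_month: int, price_per_1k: float) -> float:
    """On-demand: pay per token, scales linearly with volume."""
    return tokens_per_month / 1000 * price_per_1k

def monthly_provisioned(model_units: int, hourly_rate: float) -> float:
    """Provisioned: flat hourly rate per model unit, regardless of volume."""
    return model_units * hourly_rate * HOURS_PER_MONTH

# Example: 2B tokens/month at a blended $0.008/1K vs one $20/hour model unit.
od = monthly_on_demand(2_000_000_000, 0.008)  # linear in usage
pt = monthly_provisioned(1, 20.0)             # flat while provisioned
```

At these sample rates provisioned wins; halve the volume and on-demand wins. The crossover point is workload-specific, which is why low or spiky volume belongs on on-demand.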
## Batch Inference
Process large datasets at reduced cost:
| Aspect | Details |
|---|---|
| Discount | ~50% cheaper than on-demand |
| Latency | Not real-time (hours to complete) |
| Input | S3 bucket with prompts |
| Output | S3 bucket with responses |
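A batch job reads prompts from one S3 location and writes responses to another. Below is a sketch of the request shape for the Bedrock `CreateModelInvocationJob` control-plane API; the ARNs, bucket names, and model ID are placeholders:

```python
# Sketch of a Bedrock batch inference job request. Input prompts live in
# S3; results land in S3. All identifiers below are placeholders.

batch_job_request = {
    "jobName": "bulk-summarization",
    "modelId": "anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model
    "roleArn": "arn:aws:iam::123456789012:role/BedrockBatchRole",
    "inputDataConfig": {
        "s3InputDataConfig": {"s3Uri": "s3://my-bucket/batch-input/"}
    },
    "outputDataConfig": {
        "s3OutputDataConfig": {"s3Uri": "s3://my-bucket/batch-output/"}
    },
}

# With boto3 this would be submitted roughly as:
#   bedrock = boto3.client("bedrock")
#   bedrock.create_model_invocation_job(**batch_job_request)
```

The job runs asynchronously; you poll its status or watch for the output objects, which is why batch only fits workloads that can wait hours.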
Use case: Summarizing thousands of documents, bulk classification, dataset enrichment.
## Knowledge Bases Pricing
| Component | Pricing Basis |
|---|---|
| Storage | Vector store costs (OpenSearch, Aurora, etc.) |
| Embeddings | Token cost for embedding generation |
| Retrieval | Per query cost |
| Model inference | Standard model pricing for responses |
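These components add up roughly as follows. Every rate in this sketch is a placeholder, since actual costs depend on the vector store and models you pick:

```python
# Illustrative monthly cost roll-up for a Knowledge Base.
# All rates are placeholders, not published prices.

def knowledge_base_monthly_cost(
    vector_store_monthly: float,   # e.g. OpenSearch Serverless capacity
    embedded_tokens: int,          # tokens run through the embedding model
    embed_price_per_1k: float,
    queries: int,
    retrieval_price_per_query: float,
    response_tokens: int,
    gen_price_per_1k: float,
) -> float:
    embedding = embedded_tokens / 1000 * embed_price_per_1k
    retrieval = queries * retrieval_price_per_query
    generation = response_tokens / 1000 * gen_price_per_1k  # standard model pricing
    return vector_store_monthly + embedding + retrieval + generation

total = knowledge_base_monthly_cost(
    700.0, 50_000_000, 0.0001, 100_000, 0.0001, 20_000_000, 0.015)
```

Note that at realistic volumes the vector store and response generation usually dominate; embedding is often a small, mostly one-time cost.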
## Guardrails Pricing
| Policy Type | Pricing |
|---|---|
| Content filters | Per 1,000 text units analyzed |
| Denied topics | Per 1,000 text units |
| PII detection | Per 1,000 text units |
| Multimodal | Per image analyzed |
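Guardrails charges scale with text units rather than tokens. Below is a sketch of the accounting, assuming a text unit covers 1,000 characters with partial units rounded up; verify the current text-unit definition and rates on the pricing page:

```python
import math

# Guardrails billing sketch: text is metered in text units, and each
# enabled policy type is charged per 1,000 text units analyzed.
# The 1,000-characters-per-unit figure is an assumption to verify.

CHARS_PER_TEXT_UNIT = 1000

def text_units(text: str) -> int:
    """Partial units round up: 2,500 chars is 3 units, not 2.5."""
    return math.ceil(len(text) / CHARS_PER_TEXT_UNIT)

def guardrails_cost(text: str, price_per_1k_units: float) -> float:
    return text_units(text) / 1000 * price_per_1k_units

units = text_units("x" * 2500)  # rounds up to 3 units
```

Because every enabled policy type is metered separately, enabling content filters, denied topics, and PII detection on the same request multiplies the per-request Guardrails charge.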
Note: 85% price reduction for content filters and denied topics effective December 2024.
## Fine-Tuning Pricing
| Component | Pricing |
|---|---|
| Training | Per token processed during training |
| Storage | Per GB-month for custom model storage |
| Inference | Higher per-token cost than base model |
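Training cost scales with tokens processed, which is roughly dataset tokens times epochs; storage is billed per GB-month. A placeholder-rate sketch:

```python
# Fine-tuning cost sketch. Every rate here is a placeholder; inference
# on the resulting custom model is additionally priced above base-model
# rates and is not included below.

def fine_tuning_cost(dataset_tokens: int, epochs: int,
                     train_price_per_1k: float,
                     storage_gb: float,
                     storage_price_per_gb_month: float,
                     months_stored: int) -> float:
    # Tokens processed during training = dataset tokens x epochs
    training = dataset_tokens * epochs / 1000 * train_price_per_1k
    storage = storage_gb * storage_price_per_gb_month * months_stored
    return training + storage

# 10M-token dataset, 3 epochs, 5 GB custom model stored for 6 months
cost = fine_tuning_cost(10_000_000, 3, 0.008, 5.0, 1.95, 6)
```

Epoch count matters: doubling epochs doubles the training bill, so start small and add epochs only if evaluation shows the model needs them.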
## Cost Optimization Tips
| Tip | Impact |
|---|---|
| Choose smallest model that works | Major savings |
| Use batch for non-real-time workloads | ~50% savings |
| Cache common responses | Reduce redundant calls |
| Optimize prompts (fewer tokens) | Lower input costs |
| Consider provisioned at scale | Predictable, often cheaper |
| Monitor with CloudWatch | Identify waste |
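The caching tip can be as simple as keying responses by a hash of the prompt. `call_model` below is a stand-in for a real Bedrock invocation (e.g. via the boto3 `bedrock-runtime` client), not an actual API call:

```python
import hashlib

# Response-caching sketch: identical prompts should not trigger
# repeat (billable) model calls.

_cache: dict[str, str] = {}
calls = 0  # counts how many times the "model" is actually invoked

def call_model(prompt: str) -> str:
    """Placeholder for a real Bedrock invoke_model/converse call."""
    global calls
    calls += 1
    return f"response-to:{prompt}"

def cached_call(prompt: str) -> str:
    """Return a cached response when the exact prompt was seen before."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt)
    return _cache[key]

cached_call("What is Bedrock?")
cached_call("What is Bedrock?")  # second call served from cache
```

Exact-match caching only helps when prompts repeat verbatim; for paraphrased queries, semantic caching over embeddings is the usual next step, at the cost of extra complexity.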
## TL;DR
- On-demand: Pay per token, best for variable/low volume
- Provisioned: Reserved capacity, best for high-volume production
- Batch: ~50% cheaper, for offline bulk processing
- Knowledge Bases: Vector store + embedding + retrieval + inference costs
- Guardrails: Per text unit (got 85% cheaper in Dec 2024)
- Smaller models = significant cost savings
## Resources
- Bedrock Pricing: Official pricing page with current rates per model.
- AWS Pricing Calculator: Estimate costs for your workload.