Understanding Amazon Bedrock pricing — on-demand, provisioned throughput, and additional feature costs.
## Pricing Model Overview
Bedrock pricing varies by model and usage pattern:
| Pricing Type | How It Works | Best For |
|---|---|---|
| On-demand | Pay per input/output token | Variable workloads, experimentation |
| Provisioned Throughput | Hourly rate for reserved capacity | High-volume, consistent workloads |
| Batch Inference | Discounted offline processing | Bulk processing jobs |
## On-Demand Pricing
Pay only for what you use — charged per 1,000 tokens (input and output separately).
### Price Varies by Model
| Model Tier | Relative Cost | Examples |
|---|---|---|
| Budget | $ | Titan Lite, Mistral Small |
| Mid-tier | $$ | Titan Express, Llama models |
| Premium | $$$ | Claude 4.x, Mistral Large |
| Specialized | $$$$ | Image generation (per image) |
**Key point:** Smaller, faster models are cheaper. Choose a model based on what the task actually requires, not just on maximum capability.
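The tier table above can be made concrete with a quick back-of-envelope comparison. All per-1K-token rates below are invented tier placeholders, not published Bedrock prices:

```python
# Same workload priced across model tiers. Rates are illustrative
# placeholders -- look up real per-model prices on the Bedrock pricing page.

WORKLOAD_TOKENS = 10_000_000  # 10M tokens/month, input+output blended

tier_price_per_1k = {"budget": 0.0003, "mid": 0.002, "premium": 0.012}

monthly = {tier: WORKLOAD_TOKENS / 1000 * price
           for tier, price in tier_price_per_1k.items()}
# At these sample rates the premium tier costs 40x the budget tier
# for the identical token volume.
```

The absolute numbers are made up, but the ratio is the point: tier choice often moves cost by an order of magnitude or more.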
### What Counts as a Token?
- ~4 characters = 1 token (English text)
- Input tokens: Your prompt
- Output tokens: Model’s response
- System prompts count as input tokens
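The rules above can be sketched as a rough cost estimator. The 4-characters-per-token ratio is only a heuristic (real tokenizers vary by model), and the sample rates are placeholders:

```python
# Rough on-demand cost estimate: input and output tokens are billed
# separately, per 1,000 tokens. Rates here are placeholders.

def estimate_tokens(text: str) -> int:
    """Approximate token count: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def estimate_cost(prompt: str, response: str,
                  input_price_per_1k: float,
                  output_price_per_1k: float) -> float:
    """On-demand cost = input tokens at input rate + output tokens at output rate."""
    input_tokens = estimate_tokens(prompt)
    output_tokens = estimate_tokens(response)
    return (input_tokens / 1000) * input_price_per_1k \
         + (output_tokens / 1000) * output_price_per_1k

# 4,000-char prompt (~1,000 tokens) and 8,000-char response (~2,000 tokens)
cost = estimate_cost("a" * 4000, "b" * 8000,
                     input_price_per_1k=0.003, output_price_per_1k=0.015)
```

Note that output tokens usually cost several times more than input tokens, so verbose responses dominate the bill.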
## Provisioned Throughput
Reserve capacity for consistent performance and predictable costs.
| Aspect | Details |
|---|---|
| Commitment | 1-month or 6-month terms (longer terms are discounted) |
| Unit | Model Units (throughput capacity) |
| Billing | Hourly rate while provisioned |
| Benefit | Guaranteed throughput, no throttling |
### When to Use Provisioned Throughput
| Scenario | Recommendation |
|---|---|
| Unpredictable, low volume | On-demand |
| Consistent high volume | Provisioned (cost savings) |
| Latency-sensitive production | Provisioned (guaranteed capacity) |
| Experimentation/dev | On-demand |
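One way to apply the table above is a break-even calculation: compare a month of on-demand token charges against a model unit's hourly rate. Every rate below is illustrative, not a real Bedrock price:

```python
# Break-even sketch: at what monthly volume does reserved capacity
# beat on-demand? All rates are placeholders.

HOURS_PER_MONTH = 730  # average hours in a month

def monthly_on_demand(tokens_per_month: int, price_per_1k: float) -> float:
    """On-demand: pay per token, scales linearly with volume."""
    return tokens_per_month / 1000 * price_per_1k

def monthly_provisioned(model_units: int, hourly_rate: float) -> float:
    """Provisioned: flat hourly rate per model unit, regardless of volume."""
    return model_units * hourly_rate * HOURS_PER_MONTH

# Example: 2B tokens/month at a blended $0.008/1K vs one $20/hour model unit.
od = monthly_on_demand(2_000_000_000, 0.008)  # linear in usage
pt = monthly_provisioned(1, 20.0)             # flat while provisioned
```

At these sample rates provisioned wins; halve the volume and on-demand wins. The crossover point is workload-specific, which is why low or spiky volume belongs on on-demand.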
## Batch Inference
Process large datasets at reduced cost:
| Aspect | Details |
|---|---|
| Discount | ~50% cheaper than on-demand |
| Latency | Not real-time (hours to complete) |
| Input | S3 bucket with prompts |
| Output | S3 bucket with responses |
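A batch job reads prompts from one S3 location and writes responses to another. Below is a sketch of the request shape for the Bedrock `CreateModelInvocationJob` control-plane API; the ARNs, bucket names, and model ID are placeholders:

```python
# Sketch of a Bedrock batch inference job request. Input prompts live in
# S3; results land in S3. All identifiers below are placeholders.

batch_job_request = {
    "jobName": "bulk-summarization",
    "modelId": "anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model
    "roleArn": "arn:aws:iam::123456789012:role/BedrockBatchRole",
    "inputDataConfig": {
        "s3InputDataConfig": {"s3Uri": "s3://my-bucket/batch-input/"}
    },
    "outputDataConfig": {
        "s3OutputDataConfig": {"s3Uri": "s3://my-bucket/batch-output/"}
    },
}

# With boto3 this would be submitted roughly as:
#   bedrock = boto3.client("bedrock")
#   bedrock.create_model_invocation_job(**batch_job_request)
```

The job runs asynchronously; you poll its status or watch for the output objects, which is why batch only fits workloads that can wait hours.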
Use case: Summarizing thousands of documents, bulk classification, dataset enrichment.
## Knowledge Bases Pricing
| Component | Pricing Basis |
|---|---|
| Storage | Vector store costs (OpenSearch, Aurora, etc.) |
| Embeddings | Token cost for embedding generation |
| Retrieval | Per query cost |
| Model inference | Standard model pricing for responses |
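These components add up roughly as follows. Every rate in this sketch is a placeholder, since actual costs depend on the vector store and models you pick:

```python
# Illustrative monthly cost roll-up for a Knowledge Base.
# All rates are placeholders, not published prices.

def knowledge_base_monthly_cost(
    vector_store_monthly: float,   # e.g. OpenSearch Serverless capacity
    embedded_tokens: int,          # tokens run through the embedding model
    embed_price_per_1k: float,
    queries: int,
    retrieval_price_per_query: float,
    response_tokens: int,
    gen_price_per_1k: float,
) -> float:
    embedding = embedded_tokens / 1000 * embed_price_per_1k
    retrieval = queries * retrieval_price_per_query
    generation = response_tokens / 1000 * gen_price_per_1k  # standard model pricing
    return vector_store_monthly + embedding + retrieval + generation

total = knowledge_base_monthly_cost(
    700.0, 50_000_000, 0.0001, 100_000, 0.0001, 20_000_000, 0.015)
```

Note that at realistic volumes the vector store and response generation usually dominate; embedding is often a small, mostly one-time cost.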
## Guardrails Pricing
| Policy Type | Pricing |
|---|---|
| Content filters | Per 1,000 text units analyzed |
| Denied topics | Per 1,000 text units |
| PII detection | Per 1,000 text units |
| Multimodal | Per image analyzed |
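Guardrails charges scale with text units rather than tokens. Below is a sketch of the accounting, assuming a text unit covers 1,000 characters with partial units rounded up; verify the current text-unit definition and rates on the pricing page:

```python
import math

# Guardrails billing sketch: text is metered in text units, and each
# enabled policy type is charged per 1,000 text units analyzed.
# The 1,000-characters-per-unit figure is an assumption to verify.

CHARS_PER_TEXT_UNIT = 1000

def text_units(text: str) -> int:
    """Partial units round up: 2,500 chars is 3 units, not 2.5."""
    return math.ceil(len(text) / CHARS_PER_TEXT_UNIT)

def guardrails_cost(text: str, price_per_1k_units: float) -> float:
    return text_units(text) / 1000 * price_per_1k_units

units = text_units("x" * 2500)  # rounds up to 3 units
```

Because every enabled policy type is metered separately, enabling content filters, denied topics, and PII detection on the same request multiplies the per-request Guardrails charge.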
Note: 85% price reduction for content filters and denied topics effective December 2024.
## Fine-Tuning Pricing
| Component | Pricing |
|---|---|
| Training | Per token processed during training |
| Storage | Per GB-month for custom model storage |
| Inference | Higher per-token cost than base model |
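Training cost scales with tokens processed, which is roughly dataset tokens times epochs; storage is billed per GB-month. A placeholder-rate sketch:

```python
# Fine-tuning cost sketch. Every rate here is a placeholder; inference
# on the resulting custom model is additionally priced above base-model
# rates and is not included below.

def fine_tuning_cost(dataset_tokens: int, epochs: int,
                     train_price_per_1k: float,
                     storage_gb: float,
                     storage_price_per_gb_month: float,
                     months_stored: int) -> float:
    # Tokens processed during training = dataset tokens x epochs
    training = dataset_tokens * epochs / 1000 * train_price_per_1k
    storage = storage_gb * storage_price_per_gb_month * months_stored
    return training + storage

# 10M-token dataset, 3 epochs, 5 GB custom model stored for 6 months
cost = fine_tuning_cost(10_000_000, 3, 0.008, 5.0, 1.95, 6)
```

Epoch count matters: doubling epochs doubles the training bill, so start small and add epochs only if evaluation shows the model needs them.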
## Cost Optimization Tips
| Tip | Impact |
|---|---|
| Choose smallest model that works | Major savings |
| Use batch for non-real-time workloads | ~50% savings |
| Cache common responses | Reduce redundant calls |
| Optimize prompts (fewer tokens) | Lower input costs |
| Consider provisioned at scale | Predictable, often cheaper |
| Monitor with CloudWatch | Identify waste |
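The caching tip can be as simple as keying responses by a hash of the prompt. `call_model` below is a stand-in for a real Bedrock invocation (e.g. via the boto3 `bedrock-runtime` client), not an actual API call:

```python
import hashlib

# Response-caching sketch: identical prompts should not trigger
# repeat (billable) model calls.

_cache: dict[str, str] = {}
calls = 0  # counts how many times the "model" is actually invoked

def call_model(prompt: str) -> str:
    """Placeholder for a real Bedrock invoke_model/converse call."""
    global calls
    calls += 1
    return f"response-to:{prompt}"

def cached_call(prompt: str) -> str:
    """Return a cached response when the exact prompt was seen before."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt)
    return _cache[key]

cached_call("What is Bedrock?")
cached_call("What is Bedrock?")  # second call served from cache
```

Exact-match caching only helps when prompts repeat verbatim; for paraphrased queries, semantic caching over embeddings is the usual next step, at the cost of extra complexity.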
## TL;DR
- On-demand: Pay per token, best for variable/low volume
- Provisioned: Reserved capacity, best for high-volume production
- Batch: ~50% cheaper, for offline bulk processing
- Knowledge Bases: Vector store + embedding + retrieval + inference costs
- Guardrails: Per text unit (got 85% cheaper in Dec 2024)
- Smaller models = significant cost savings
## Resources
- Bedrock Pricing: Official pricing page with current rates per model.
- AWS Pricing Calculator: Estimate costs for your workload.