Understanding Amazon Bedrock pricing — on-demand, provisioned throughput, and additional feature costs.


Pricing Model Overview

Bedrock pricing varies by model and usage pattern:

| Pricing Type | How It Works | Best For |
|---|---|---|
| On-demand | Pay per input/output token | Variable workloads, experimentation |
| Provisioned Throughput | Hourly rate for reserved capacity | High-volume, consistent workloads |
| Batch Inference | Discounted offline processing | Bulk processing jobs |

On-Demand Pricing

Pay only for what you use — charged per 1,000 tokens (input and output separately).

Price Varies by Model

| Model Tier | Relative Cost | Examples |
|---|---|---|
| Budget | $ | Titan Lite, Mistral Small |
| Mid-tier | $$ | Titan Express, Llama models |
| Premium | $$$ | Claude 4.x, Mistral Large |
| Specialized | $$$$ | Image generation (per image) |

Important Point: Smaller/faster models are cheaper. Choose based on task requirements, not just capability.

What Counts as a Token?

  • ~4 characters = 1 token (English text)
  • Input tokens: Your prompt
  • Output tokens: Model’s response
  • System prompts count as input tokens
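
The billing rules above can be sketched as a small estimator. The per-1K-token prices and the ~4-characters-per-token rule of thumb are placeholders; check the pricing page for real per-model rates.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_1k: float, output_price_per_1k: float) -> float:
    """On-demand cost: input and output tokens are billed separately, per 1,000 tokens."""
    return (input_tokens / 1000) * input_price_per_1k \
         + (output_tokens / 1000) * output_price_per_1k

def approx_tokens(text: str) -> int:
    """Rough token count for English text: ~4 characters per token."""
    return max(1, len(text) // 4)

# Hypothetical rates: $0.003 per 1K input tokens, $0.015 per 1K output tokens.
cost = estimate_cost(input_tokens=2_000, output_tokens=500,
                     input_price_per_1k=0.003, output_price_per_1k=0.015)
print(f"${cost:.4f}")  # → $0.0135
```

Note that output tokens are typically priced several times higher than input tokens, so verbose responses dominate the bill.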

Provisioned Throughput

Reserve capacity for consistent performance and predictable costs.

| Aspect | Details |
|---|---|
| Commitment | 1-month or 6-month terms (discounts for longer terms) |
| Unit | Model Units (throughput capacity) |
| Billing | Hourly rate while provisioned |
| Benefit | Guaranteed throughput, no throttling |

When to Use Provisioned Throughput

| Scenario | Recommendation |
|---|---|
| Unpredictable, low volume | On-demand |
| Consistent high volume | Provisioned (cost savings) |
| Latency-sensitive production | Provisioned (guaranteed capacity) |
| Experimentation/dev | On-demand |
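
The trade-off in the table comes down to a break-even volume: provisioned throughput bills every hour whether or not you use it, so it only pays off above a certain monthly token count. A sketch with entirely hypothetical rates (real model-unit pricing varies by model and term):

```python
HOURS_PER_MONTH = 730  # average hours in a month

def on_demand_monthly(tokens_per_month: int, blended_price_per_1k: float) -> float:
    """On-demand: cost scales linearly with tokens processed."""
    return tokens_per_month / 1000 * blended_price_per_1k

def provisioned_monthly(model_units: int, hourly_rate: float) -> float:
    """Provisioned: flat hourly rate for the term, regardless of traffic."""
    return model_units * hourly_rate * HOURS_PER_MONTH

# Hypothetical: 1 model unit at $20/hour vs on-demand at $0.008 per 1K tokens.
flat = provisioned_monthly(1, 20.0)
breakeven_tokens = flat / 0.008 * 1000
print(f"Flat cost ${flat:,.0f}/month; provisioned wins above "
      f"~{breakeven_tokens / 1e9:.1f}B tokens/month")
```

Below the break-even volume you are paying for idle capacity; above it, the flat rate is cheaper and you also get the capacity guarantee.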

Batch Inference

Process large datasets at reduced cost:

| Aspect | Details |
|---|---|
| Discount | ~50% cheaper than on-demand |
| Latency | Not real-time (jobs take hours to complete) |
| Input | S3 bucket with prompts |
| Output | S3 bucket with responses |

Use case: Summarizing thousands of documents, bulk classification, dataset enrichment.
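
For a bulk job like the document-summarization use case above, the ~50% batch discount compounds quickly. A back-of-envelope comparison (the per-1K rate is a placeholder):

```python
def batch_savings(num_docs: int, tokens_per_doc: int,
                  price_per_1k: float, batch_discount: float = 0.5):
    """Compare on-demand vs batch cost for a bulk job.

    batch_discount=0.5 reflects the ~50%-cheaper-than-on-demand figure;
    the actual discount varies by model.
    """
    total_tokens = num_docs * tokens_per_doc
    on_demand = total_tokens / 1000 * price_per_1k
    batch = on_demand * (1 - batch_discount)
    return on_demand, batch

# Hypothetical: 10,000 documents, ~3,000 tokens each, $0.008 per 1K tokens.
on_demand, batch = batch_savings(10_000, 3_000, 0.008)
print(f"on-demand ${on_demand:.2f} vs batch ${batch:.2f}")
```

If the job has no real-time requirement, that difference is pure savings for submitting the same prompts via S3 instead of the synchronous API.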


Knowledge Bases Pricing

| Component | Pricing Basis |
|---|---|
| Storage | Vector store costs (OpenSearch, Aurora, etc.) |
| Embeddings | Token cost for embedding generation |
| Retrieval | Per-query cost |
| Model inference | Standard model pricing for responses |
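
A Knowledge Base bill is the sum of those four components, which makes a monthly estimate straightforward. Every rate below is a placeholder for illustration; in practice the vector store is usually the dominant fixed cost and generation the dominant variable cost.

```python
def kb_monthly_cost(vector_store_monthly: float,
                    ingested_tokens: int, embed_price_per_1k: float,
                    queries: int, retrieval_price_per_query: float,
                    generated_tokens: int, gen_price_per_1k: float) -> float:
    """Sum the four Knowledge Base cost components for one month."""
    embedding = ingested_tokens / 1000 * embed_price_per_1k   # ingestion, amortized here
    retrieval = queries * retrieval_price_per_query
    inference = generated_tokens / 1000 * gen_price_per_1k    # standard model pricing
    return vector_store_monthly + embedding + retrieval + inference

# Hypothetical month: $700 vector store, 5M tokens embedded, 50K queries,
# 20M generated tokens.
total = kb_monthly_cost(700.0, 5_000_000, 0.0001,
                        50_000, 0.0002,
                        20_000_000, 0.015)
print(f"${total:,.2f}/month")
```

Separating the components this way also shows where to optimize: shrinking answers (generation tokens) often moves the total far more than trimming the corpus.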

Guardrails Pricing

| Policy Type | Pricing |
|---|---|
| Content filters | Per 1,000 text units analyzed |
| Denied topics | Per 1,000 text units |
| PII detection | Per 1,000 text units |
| Multimodal | Per image analyzed |

Note: 85% price reduction for content filters and denied topics effective December 2024.
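
Guardrails billing is per text unit, where a text unit covers a fixed number of characters (1,000 at the time of writing; verify against the current docs) and partial units round up. A sketch of how the unit count and cost fall out:

```python
import math

CHARS_PER_TEXT_UNIT = 1000  # assumed unit size -- confirm on the pricing page

def text_units(text: str) -> int:
    """Number of billable text units; partial units round up."""
    return math.ceil(len(text) / CHARS_PER_TEXT_UNIT) if text else 0

def guardrails_cost(text: str, price_per_1k_units: float,
                    enabled_policies: int = 1) -> float:
    """Each enabled policy type (filters, denied topics, PII) is billed
    separately over the same text."""
    return text_units(text) * enabled_policies * price_per_1k_units / 1000

print(text_units("x" * 2500))  # → 3 (2,500 chars rounds up to 3 units)
```

The rounding matters for short prompts: a 50-character input is billed the same as a 1,000-character one, per policy.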


Fine-Tuning Pricing

| Component | Pricing |
|---|---|
| Training | Per token processed during training |
| Storage | Per GB-month for custom model storage |
| Inference | Higher per-token cost than the base model |

Cost Optimization Tips

| Tip | Impact |
|---|---|
| Choose the smallest model that works | Major savings |
| Use batch for non-real-time workloads | ~50% savings |
| Cache common responses | Reduce redundant calls |
| Optimize prompts (fewer tokens) | Lower input costs |
| Consider provisioned throughput at scale | Predictable, often cheaper |
| Monitor with CloudWatch | Identify waste |
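
The "cache common responses" tip can be as simple as a dictionary keyed on the prompt, so repeated identical requests never reach the model. A minimal sketch (the `invoke_fn` callable stands in for whatever client call you use; `fake_model` is purely illustrative):

```python
import hashlib

_cache: dict[str, str] = {}

def cached_invoke(prompt: str, invoke_fn) -> str:
    """Return a cached response if this exact prompt was seen before;
    otherwise call the model once and remember the result."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = invoke_fn(prompt)  # tokens are only billed on a cache miss
    return _cache[key]

# Illustrative stand-in for a real model call, counting invocations.
calls = 0
def fake_model(prompt: str) -> str:
    global calls
    calls += 1
    return prompt.upper()

cached_invoke("summarize the q3 report", fake_model)
cached_invoke("summarize the q3 report", fake_model)
print(calls)  # → 1: the second request was free
```

This only helps with exact repeats (FAQ-style traffic); for near-duplicates you would need semantic caching, which is a larger design decision.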

TL;DR

  • On-demand: Pay per token, best for variable/low volume
  • Provisioned: Reserved capacity, best for high-volume production
  • Batch: ~50% cheaper, for offline bulk processing
  • Knowledge Bases: Vector store + embedding + retrieval + inference costs
  • Guardrails: Per text unit (got 85% cheaper in Dec 2024)
  • Smaller models = significant cost savings

Resources

Bedrock Pricing
Official pricing page with current rates per model.

AWS Pricing Calculator
Estimate costs for your workload.