Deep dive into Amazon Bedrock’s core capabilities — Knowledge Bases, Agents, Guardrails, Fine-tuning, and Model Evaluation.
Model Inference
Basic usage — send prompts, get responses. Bedrock offers three inference modes:
| Mode | Description | Best For |
|---|---|---|
| On-demand | Pay per token, no commitment | Variable workloads, experimentation |
| Provisioned Throughput | Reserved capacity, consistent performance | High-volume production workloads |
| Batch Inference | Process large datasets offline | Bulk processing, lower cost |
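On-demand inference is just an API call. Below is a minimal sketch using boto3's Converse API; the region and model ID are examples, so substitute a model you have enabled access to.

```python
import boto3

# Bedrock has two clients: "bedrock" for control-plane operations and
# "bedrock-runtime" for inference. Sending prompts uses the runtime client.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# The Converse API gives a uniform request/response shape across models.
# The model ID below is only an example.
response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[
        {"role": "user", "content": [{"text": "Summarize what Amazon Bedrock is in one sentence."}]}
    ],
    inferenceConfig={"maxTokens": 256, "temperature": 0.5},
)

print(response["output"]["message"]["content"][0]["text"])
```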
Knowledge Bases (RAG)
Retrieval-Augmented Generation — ground model responses in your data.
How RAG Works
User Query → Retrieve relevant chunks from vector store →
Include in prompt → Model generates grounded response
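With a Knowledge Base in place, the whole retrieve-then-generate loop above collapses into a single RetrieveAndGenerate call. A minimal sketch, assuming a Knowledge Base ID and model ARN that you would replace with your own:

```python
import boto3

# Knowledge Base queries go through the bedrock-agent-runtime client.
agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

# Placeholder IDs: substitute your Knowledge Base ID and a model ARN
# you have access to.
response = agent_runtime.retrieve_and_generate(
    input={"text": "What is our refund policy for damaged items?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB1234567890",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
        },
    },
)

# The grounded answer, plus citations pointing back to the retrieved chunks.
print(response["output"]["text"])
for citation in response.get("citations", []):
    for ref in citation.get("retrievedReferences", []):
        print("Source:", ref.get("location"))
```

If you only need the retrieval half (to build the prompt yourself), the separate `retrieve` call returns the matching chunks without generation.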
Supported Data Sources
| Source Type | Examples |
|---|---|
| Storage | Amazon S3 |
| Web | Web crawlers |
| Enterprise | Confluence, SharePoint, Salesforce |
Supported Vector Stores
| Vector Store | Notes |
|---|---|
| Amazon OpenSearch Serverless | Fully managed, default option |
| Amazon OpenSearch Managed Cluster | Self-managed option (March 2025) |
| Amazon Aurora PostgreSQL | pgvector extension |
| Pinecone | Third-party |
| MongoDB Atlas | Third-party |
| Redis Enterprise | Third-party |
Advanced Knowledge Base Features
| Feature | Description |
|---|---|
| Automatic chunking | Parses and splits documents automatically |
| Multimodal | Process text + images in documents |
| GraphRAG | Combine RAG with graph databases (Neptune) — GA March 2025 |
| Structured data | Query data warehouses with natural language (text-to-SQL) |
| Rerank API | Re-score retrieved chunks for better relevance |
| Custom connectors | Direct ingestion without full sync |
| Data Automation | Process video content (AVI, MKV, WEBM) for knowledge bases |
Important Point: Knowledge Bases handle the entire RAG pipeline — chunking, embedding, storage, retrieval. You just provide the data.
Important Point: For RAG vector stores, OpenSearch Serverless is the default and recommended choice — it’s purpose-built for similarity search over vector embeddings. Don’t confuse it with DynamoDB (key-value) or DocumentDB (document store); Aurora (relational) qualifies only via the pgvector extension.
Agents
Build autonomous, multi-step workflows that can reason and take actions.
Agent Components
| Component | Description |
|---|---|
| Foundation Model | The LLM that powers reasoning |
| Instructions | System prompt defining agent behavior |
| Action Groups | Lambda functions or APIs the agent can call |
| Knowledge Bases | Data the agent can query |
| Guardrails | Safety policies (optional) |
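These components map fairly directly onto the control-plane API. A rough sketch with boto3 (role ARN, Lambda ARN, bucket, and IDs are placeholders; action groups and knowledge bases are attached with separate calls after the agent exists):

```python
import boto3

# Agent creation is a control-plane operation on the bedrock-agent client.
bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

# Foundation model + instructions define the agent's reasoning and behavior.
agent = bedrock_agent.create_agent(
    agentName="order-support-agent",
    foundationModel="anthropic.claude-3-haiku-20240307-v1:0",
    instruction="You are a support agent. Look up orders and schedule callbacks politely.",
    agentResourceRoleArn="arn:aws:iam::123456789012:role/BedrockAgentRole",  # placeholder role
)
agent_id = agent["agent"]["agentId"]

# Action group: a Lambda function (plus an API schema) the agent may call.
bedrock_agent.create_agent_action_group(
    agentId=agent_id,
    agentVersion="DRAFT",
    actionGroupName="order-status",
    actionGroupExecutor={"lambda": "arn:aws:lambda:us-east-1:123456789012:function:order-status"},
    apiSchema={"s3": {"s3BucketName": "my-schemas", "s3ObjectKey": "order-status-openapi.json"}},
)

# Knowledge Base association gives the agent data to query.
bedrock_agent.associate_agent_knowledge_base(
    agentId=agent_id,
    agentVersion="DRAFT",
    knowledgeBaseId="KB1234567890",
    description="Customer data the agent can look up.",
)
```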
Agent Capabilities
| Capability | Description |
|---|---|
| Multi-step reasoning | Agent plans and executes complex tasks |
| Tool use | Calls external APIs and Lambda functions |
| Memory | Maintains conversation context |
| Multi-agent collaboration | Multiple specialized agents work together |
Multi-Agent Collaboration (GA March 2025)
Enterprise-grade agent deployment:
- Multiple specialized agents work together on complex tasks
- Memory management across agent sessions
- Identity controls (IAM integration)
- Tool integration framework
- Observability and monitoring
Use Case Example: An agent that retrieves customer data (knowledge base), checks order status (API), and schedules a callback (action) — all from a single user request.
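Invoking an agent is a single runtime call; the agent handles the planning, tool calls, and knowledge base lookups behind the scenes. A sketch with placeholder agent and alias IDs:

```python
import boto3
import uuid

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

# Placeholder IDs: use your agent ID and a prepared alias.
# The session ID ties multi-turn conversations together (agent memory).
response = agent_runtime.invoke_agent(
    agentId="AGENT12345",
    agentAliasId="ALIAS12345",
    sessionId=str(uuid.uuid4()),
    inputText="Check the status of order 4521 and schedule a callback for tomorrow.",
)

# The response is an event stream; the answer arrives in chunks.
answer = ""
for event in response["completion"]:
    if "chunk" in event:
        answer += event["chunk"]["bytes"].decode("utf-8")
print(answer)
```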
Guardrails
Implement responsible AI policies — filter both inputs and outputs.
Policy Types
| Policy | What It Does |
|---|---|
| Content filters | Block hate, violence, sexual content, insults, misconduct |
| Denied topics | Define topics the model should refuse (e.g., competitor info) |
| Word filters | Block specific offensive words |
| PII filtering | Detect and redact personally identifiable information |
| Prompt attacks | Detect jailbreak/injection attempts |
| Multimodal toxicity | Filter harmful image + text content |
| Automated Reasoning | Prevent hallucinations using logical verification |
Key Points
- Guardrails apply to both input and output
- Can be attached to models, agents, and knowledge bases
- Automated Reasoning checks use mathematical verification to prevent factual errors (GA August 2025)
- Natural language test Q&A generation for Automated Reasoning (November 2025)
- Multimodal toxicity can evaluate images (GA April 2025)
- 85% price reduction announced December 2024
Important Point: Know that Guardrails can filter PII, block topics, and prevent prompt injection attacks.
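Once a guardrail is created and versioned, attaching it to an inference call is one extra parameter. A sketch using the Converse API, with placeholder guardrail ID and version:

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# guardrailConfig applies the configured policies to both input and output.
# The identifier and version below are placeholders for your own guardrail.
response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user", "content": [{"text": "What is our competitor's pricing?"}]}],
    guardrailConfig={
        "guardrailIdentifier": "gr-abc123",
        "guardrailVersion": "1",
    },
)

# If a policy is triggered, stopReason is "guardrail_intervened" and the
# output contains the configured blocked message instead of a model answer.
print(response["stopReason"])
print(response["output"]["message"]["content"][0]["text"])
```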
Fine-Tuning & Customization
Customize models for your specific use case:
| Method | Description | When to Use |
|---|---|---|
| Prompt engineering | Craft system prompts for consistent behavior | Quick, no training needed |
| Continued Pre-training | Train on domain-specific unlabeled data | Domain adaptation (legal, medical) |
| Fine-tuning | Train on labeled examples (prompt-response pairs) | Task-specific optimization |
Fine-Tuning Process
1. Prepare training data (JSONL format)
2. Upload it to S3
3. Create a fine-tuning job in Bedrock
4. Deploy the custom model version
5. Use it via the same API
Note: Not all models support fine-tuning. Check model documentation.
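A rough sketch of the flow, assuming an IAM role with S3 access and a base model that supports fine-tuning. The JSONL line shows the common prompt/completion record format, though the exact schema varies by model; all names and ARNs are placeholders.

```python
import boto3

# Training data: one JSON object per line, uploaded to S3. Prompt/completion
# pairs are the common fine-tuning format (schema varies by model), e.g.:
# {"prompt": "Classify the ticket: 'My package never arrived'", "completion": "shipping_issue"}

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Role ARN, bucket names, and model identifier are placeholders.
bedrock.create_model_customization_job(
    jobName="ticket-classifier-ft-001",
    customModelName="ticket-classifier-v1",
    roleArn="arn:aws:iam::123456789012:role/BedrockFineTuneRole",
    baseModelIdentifier="amazon.titan-text-express-v1",
    customizationType="FINE_TUNING",  # or "CONTINUED_PRE_TRAINING" for unlabeled data
    trainingDataConfig={"s3Uri": "s3://my-training-bucket/tickets/train.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-training-bucket/tickets/output/"},
    hyperParameters={"epochCount": "2", "batchSize": "1", "learningRate": "0.00001"},
)
```

Once the job completes, deploy the resulting custom model (typically via Provisioned Throughput) and call it through the same runtime API as any other model.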
Model Evaluation
Compare model outputs before deployment:
| Evaluation Type | Description |
|---|---|
| Automatic metrics | Accuracy, toxicity, robustness scores |
| Human evaluation | Set up human review workflows |
| LLM-as-a-judge | Use another model to evaluate outputs |
| Custom criteria | Define your own evaluation metrics |
| RAG evaluation | Evaluate retrieval + generation quality |
Use case: Compare Claude vs Llama on your specific prompts before choosing a production model.
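A quick way to get a feel for this before setting up a formal evaluation job is to run the same prompts through both candidates with the Converse API and compare the outputs side by side. A minimal sketch; the model IDs are examples.

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Example model IDs; swap in whichever candidates you are comparing.
candidates = [
    "anthropic.claude-3-haiku-20240307-v1:0",
    "meta.llama3-8b-instruct-v1:0",
]
prompts = [
    "Summarize our return policy in two sentences.",
    "Draft a polite reply to a customer whose order arrived late.",
]

# Same prompt, same settings, different models: review the outputs manually
# or feed them into a formal evaluation job later.
for prompt in prompts:
    for model_id in candidates:
        response = bedrock_runtime.converse(
            modelId=model_id,
            messages=[{"role": "user", "content": [{"text": prompt}]}],
            inferenceConfig={"maxTokens": 200, "temperature": 0.2},
        )
        text = response["output"]["message"]["content"][0]["text"]
        print(f"--- {model_id} ---\n{text}\n")
```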
TL;DR
| Feature | One-liner |
|---|---|
| Knowledge Bases | Managed RAG — connect your data to models |
| Agents | Multi-step workflows with tool use |
| Guardrails | Content filtering, PII, anti-hallucination |
| Fine-tuning | Customize models on your data |
| Evaluation | Compare models before deployment |
Resources
- Bedrock Knowledge Bases: documentation for RAG implementation.
- Bedrock Agents: building autonomous workflows.
- Bedrock Guardrails: implementing responsible AI policies.