Deep dive into Amazon Bedrock Guardrails — implement responsible AI policies, content filtering, and safety controls.
What are Guardrails?
Guardrails are configurable policies that filter and control what goes into and comes out of foundation models. They help you:
- Block harmful or inappropriate content
- Protect sensitive information (PII)
- Enforce topic restrictions
- Prevent prompt injection attacks
- Reduce hallucinations
Key Point: Guardrails apply to both inputs (user prompts) and outputs (model responses) — bidirectional filtering.
Why Use Guardrails?
| Challenge | Guardrails Solution |
|---|---|
| Users submit inappropriate content | Content filters block before reaching model |
| Model generates harmful responses | Output filtering catches before returning to user |
| PII in prompts or responses | Automatic detection and redaction |
| Users try to jailbreak the model | Prompt attack detection |
| Model needs to avoid certain topics | Denied topics configuration |
| Hallucinations in critical domains | Automated Reasoning checks |
Guardrail Policy Types
1. Content Filters
Block harmful content across categories:
| Category | What It Detects |
|---|---|
| Hate | Discrimination, slurs, hate speech |
| Insults | Demeaning, offensive language |
| Sexual | Explicit or suggestive content |
| Violence | Graphic violence, threats |
| Misconduct | Illegal activities, self-harm |
| Prompt Attacks | Jailbreaks, injection attempts |
Configuration: Set the filter strength (NONE, LOW, MEDIUM, HIGH) per category, independently for inputs and outputs.
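As a sketch, this maps to the contentPolicyConfig block of the CreateGuardrail API via boto3. The guardrail name and blocked messages below are illustrative, and later sections reuse this same call for the other policy types:

```python
import boto3

bedrock = boto3.client("bedrock")  # control plane, not bedrock-runtime

bedrock.create_guardrail(
    name="demo-guardrail",  # illustrative name
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't provide that response.",
    contentPolicyConfig={
        "filtersConfig": [
            # Strength is set per category, separately for input and output.
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "VIOLENCE", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
            # Prompt-attack filtering applies to inputs only.
            {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
        ]
    },
)
```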
2. Denied Topics
Prevent the model from discussing specific subjects:
| Use Case | Example |
|---|---|
| Competitor information | Block discussion of competitor products |
| Financial advice | Prevent unauthorized investment recommendations |
| Medical diagnosis | Avoid providing medical diagnoses |
| Internal policies | Keep confidential procedures private |
How it works: Define each topic with a name, a short natural-language definition, and sample phrases; the guardrail then detects and blocks matching content in both inputs and outputs.
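A sketch of the corresponding topicPolicyConfig payload for the create_guardrail call shown earlier; the topic name, definition, and example phrase are illustrative:

```python
# Passed as topicPolicyConfig= to bedrock.create_guardrail(...)
topicPolicyConfig = {
    "topicsConfig": [{
        "name": "InvestmentAdvice",  # illustrative topic name
        "definition": "Recommendations to buy, sell, or hold specific "
                      "stocks, funds, or other investment products.",
        "examples": ["Which stocks should I buy right now?"],
        "type": "DENY",
    }]
}
```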
3. Word Filters
Block specific words or phrases:
- Offensive words
- Competitor names
- Internal code names
- Profanity
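These map to the wordPolicyConfig block of the same call; the custom words below are illustrative, while PROFANITY refers to the AWS-managed word list:

```python
# Passed as wordPolicyConfig= to bedrock.create_guardrail(...)
wordPolicyConfig = {
    "wordsConfig": [
        {"text": "ProjectAtlas"},    # illustrative internal code name
        {"text": "AcmeCompetitor"},  # illustrative competitor name
    ],
    "managedWordListsConfig": [
        {"type": "PROFANITY"},  # AWS-managed profanity list
    ],
}
```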
4. PII Filtering
Detect and handle personally identifiable information:
| PII Type | Options |
|---|---|
| Names, addresses | Detect and redact |
| Phone numbers | Detect and mask |
| Email addresses | Detect and block |
| SSN, credit cards | Detect and redact |
| Custom patterns | Regex-based detection |
Actions: BLOCK (stop request) or ANONYMIZE (mask and continue).
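A sketch of the corresponding sensitiveInformationPolicyConfig block; the entity types shown are from the API's enum, while the custom regex pattern is illustrative:

```python
# Passed as sensitiveInformationPolicyConfig= to bedrock.create_guardrail(...)
sensitiveInformationPolicyConfig = {
    "piiEntitiesConfig": [
        {"type": "EMAIL", "action": "ANONYMIZE"},                  # mask and continue
        {"type": "US_SOCIAL_SECURITY_NUMBER", "action": "BLOCK"},  # stop the request
    ],
    "regexesConfig": [{
        "name": "employee-id",   # illustrative custom pattern
        "pattern": r"EMP-\d{6}",
        "action": "ANONYMIZE",
    }],
}
```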
5. Multimodal Toxicity (GA April 2025)
Evaluate images alongside text:
- Detect harmful imagery
- Filter inappropriate visual content
- Combined text + image safety scoring
6. Automated Reasoning (GA August 2025)
Prevent hallucinations using logical verification:
| Feature | Description |
|---|---|
| Mathematical verification | Check factual claims against defined rules |
| Policy documents | Define ground truth for verification |
| Natural language Q&A | Generate test cases from policies (November 2025) |
Use Case: Financial or legal compliance where factual accuracy is critical.
How Guardrails Work
```
User Input → Guardrail (Input Check) → Model → Guardrail (Output Check) → Response
                      ↓                                    ↓
                 Block/Modify                         Block/Modify
```
Processing Flow
1. Input arrives from the user
2. Input filtering checks it against all configured policies
3. If a violation is found: block with the configured message, or modify (e.g., redact PII)
4. If clean: forward to the foundation model
5. The model generates a response
6. Output filtering checks the response against the same policies
7. If a violation is found: block or modify the response
8. Return the safe response to the user
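The input-check stage can also be run on its own with the standalone ApplyGuardrail API, which is handy for testing policies without invoking a model. A minimal sketch (the guardrail ID is a placeholder):

```python
import boto3

runtime = boto3.client("bedrock-runtime")

result = runtime.apply_guardrail(
    guardrailIdentifier="my-guardrail-id",  # placeholder ID
    guardrailVersion="1",
    source="INPUT",  # use "OUTPUT" to check a model response instead
    content=[{"text": {"text": "My SSN is 123-45-6789."}}],
)
print(result["action"])       # "GUARDRAIL_INTERVENED" or "NONE"
print(result["assessments"])  # per-policy details on what triggered
```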
Applying Guardrails
Guardrails can be attached to:
| Resource | How |
|---|---|
| Model invocations | Specify guardrail ID in API call |
| Agents | Attach during agent creation |
| Knowledge Bases | Apply to RAG responses |
API Example (Python, boto3)
A minimal runnable sketch; the guardrailIdentifier and guardrailVersion parameters attach the guardrail to the call, and the request body follows the legacy Claude text-completion schema this model expects:

```python
import json
import boto3

client = boto3.client("bedrock-runtime")

response = client.invoke_model(
    modelId="anthropic.claude-v2",
    guardrailIdentifier="my-guardrail-id",
    guardrailVersion="1",  # a numbered version, or "DRAFT"
    body=json.dumps({"prompt": "\n\nHuman: Hello!\n\nAssistant:",
                     "max_tokens_to_sample": 256}),
)
print(json.loads(response["body"].read())["completion"])
```
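The newer Converse API takes the same identifiers through its guardrailConfig parameter. A sketch (the model ID is a placeholder; setting trace to "enabled" returns per-policy assessment details, useful for the debugging steps later in this post):

```python
import boto3

client = boto3.client("bedrock-runtime")

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model ID
    messages=[{"role": "user", "content": [{"text": "Hello!"}]}],
    guardrailConfig={
        "guardrailIdentifier": "my-guardrail-id",
        "guardrailVersion": "1",
        "trace": "enabled",  # include guardrail assessment details in the response
    },
)
print(response["output"]["message"]["content"][0]["text"])
```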
Pricing
Guardrails are priced per text unit analyzed:
| Policy Type | Pricing Basis |
|---|---|
| Content filters | Per 1,000 text units |
| Denied topics | Per 1,000 text units |
| Word filters | Per 1,000 text units |
| PII detection | Per 1,000 text units |
| Multimodal | Per image analyzed |
| Automated Reasoning | Per text unit + policy evaluation |
What’s a Text Unit?
- One text unit covers up to 1,000 characters
- Input and output are counted separately
- Longer prompts and responses consume more text units, so they cost more
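To make the arithmetic concrete, a back-of-the-envelope sketch; the per-unit rate below is a placeholder, not a quoted price, and the charge applies per enabled policy type:

```python
import math

PRICE_PER_1000_TEXT_UNITS = 0.15  # placeholder rate; check the pricing page

def text_units(text: str) -> int:
    # One text unit covers up to 1,000 characters.
    return math.ceil(len(text) / 1000)

prompt, completion = "x" * 2500, "y" * 1200  # 3 + 2 text units
units = text_units(prompt) + text_units(completion)
print(units, "units ->", units / 1000 * PRICE_PER_1000_TEXT_UNITS, "USD per policy")
```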
Price Reduction (December 2024)
AWS announced 85% price reduction for:
- Content filters
- Denied topics
Cost Considerations
| Factor | Impact |
|---|---|
| Number of policies enabled | More policies = higher cost |
| Input + Output length | Longer content = more text units |
| Call volume | High traffic apps incur significant costs |
| Multimodal | Image evaluation adds per-image cost |
Cost Tip: Enable only the policies you need. Content filters set to HIGH filter more aggressively but cost the same as lower strengths.
Guardrails vs Model-Level Safety
| Aspect | Model’s Built-in Safety | Bedrock Guardrails |
|---|---|---|
| Control | Provider-defined | You define policies |
| Customization | None | Full customization |
| Denied topics | Generic | Your specific topics |
| PII handling | Limited | Configurable actions |
| Auditability | Limited | CloudWatch logging |
| Cross-model | Varies by model | Same policies, any model |
Best Practice: Use both — model safety as baseline, Guardrails for your specific requirements.
Monitoring & Troubleshooting
CloudWatch Integration
| Metric | What It Shows |
|---|---|
| GuardrailBlocked | Number of blocked requests |
| GuardrailIntervened | Requests modified (PII redacted) |
| Latency | Processing time added by Guardrails |
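A hedged sketch of querying one of these metrics with boto3; the metric name follows the table above and the namespace is an assumption, so verify both in your CloudWatch console:

```python
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Bedrock/Guardrails",  # assumed namespace; verify in your account
    MetricName="GuardrailBlocked",       # metric name from the table above
    StartTime=datetime.now(timezone.utc) - timedelta(days=1),
    EndTime=datetime.now(timezone.utc),
    Period=3600,
    Statistics=["Sum"],
)
print(sum(point["Sum"] for point in stats["Datapoints"]))
```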
Debugging Blocked Requests
- Enable detailed logging
- Review which policy triggered
- Adjust sensitivity or policy definition
- Test with sample inputs
Best Practices
| Practice | Reason |
|---|---|
| Start with default sensitivities | Tune based on actual traffic |
| Test with realistic inputs | Ensure legitimate requests aren’t blocked |
| Monitor blocked request rate | High rate may indicate over-filtering |
| Use descriptive denied topics | Clear definitions improve accuracy |
| Enable PII filtering for user-facing apps | Protect both users and company |
| Log everything | Audit trail for compliance |
TL;DR
- Guardrails = configurable filters for inputs AND outputs
- Policy types: Content filters, denied topics, word filters, PII, prompt attacks, multimodal, automated reasoning
- Priced per 1,000 text units — 85% cheaper since December 2024
- Attach to: Models, Agents, Knowledge Bases
- Key benefit: Consistent safety policies across any model
Resources
- Bedrock Guardrails: official documentation for configuring guardrails.
- Guardrails Pricing: current pricing for guardrail policies.