Deep dive into Amazon Bedrock Guardrails — implement responsible AI policies, content filtering, and safety controls.
What are Guardrails?
Guardrails are configurable policies that filter and control what goes into and comes out of foundation models. They help you:
- Block harmful or inappropriate content
- Protect sensitive information (PII)
- Enforce topic restrictions
- Prevent prompt injection attacks
- Reduce hallucinations
Key Point: Guardrails apply to both inputs (user prompts) and outputs (model responses) — bidirectional filtering.
Why Use Guardrails?
| Challenge | Guardrails Solution |
|---|---|
| Users submit inappropriate content | Content filters block before reaching model |
| Model generates harmful responses | Output filtering catches before returning to user |
| PII in prompts or responses | Automatic detection and redaction |
| Users try to jailbreak the model | Prompt attack detection |
| Model needs to avoid certain topics | Denied topics configuration |
| Hallucinations in critical domains | Automated Reasoning checks |
Guardrail Policy Types
1. Content Filters
Block harmful content across categories:
| Category | What It Detects |
|---|---|
| Hate | Discrimination, slurs, hate speech |
| Insults | Demeaning, offensive language |
| Sexual | Explicit or suggestive content |
| Violence | Graphic violence, threats |
| Misconduct | Illegal activities, self-harm |
| Prompt Attacks | Jailbreaks, injection attempts |
Configuration: Set the filter strength (NONE, LOW, MEDIUM, HIGH) per category, independently for inputs and outputs.
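As a sketch, this maps to the contentPolicyConfig block of the CreateGuardrail API via boto3. The guardrail name and blocked messages below are illustrative, and later sections reuse this same call for the other policy types:

```python
import boto3

bedrock = boto3.client("bedrock")  # control plane, not bedrock-runtime

bedrock.create_guardrail(
    name="demo-guardrail",  # illustrative name
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't provide that response.",
    contentPolicyConfig={
        "filtersConfig": [
            # Strength is set per category, separately for input and output.
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "VIOLENCE", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
            # Prompt-attack filtering applies to inputs only.
            {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
        ]
    },
)
```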
2. Denied Topics
Prevent the model from discussing specific subjects:
| Use Case | Example |
|---|---|
| Competitor information | Block discussion of competitor products |
| Financial advice | Prevent unauthorized investment recommendations |
| Medical diagnosis | Avoid providing medical diagnoses |
| Internal policies | Keep confidential procedures private |
How it works: Define each topic with a name, a short natural-language definition, and sample phrases; the guardrail then detects and blocks matching content in both inputs and outputs.
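A sketch of the corresponding topicPolicyConfig payload for the create_guardrail call shown earlier; the topic name, definition, and example phrase are illustrative:

```python
# Passed as topicPolicyConfig= to bedrock.create_guardrail(...)
topicPolicyConfig = {
    "topicsConfig": [{
        "name": "InvestmentAdvice",  # illustrative topic name
        "definition": "Recommendations to buy, sell, or hold specific "
                      "stocks, funds, or other investment products.",
        "examples": ["Which stocks should I buy right now?"],
        "type": "DENY",
    }]
}
```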
3. Word Filters
Block specific words or phrases:
- Offensive words
- Competitor names
- Internal code names
- Profanity
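These map to the wordPolicyConfig block of the same call; the custom words below are illustrative, while PROFANITY refers to the AWS-managed word list:

```python
# Passed as wordPolicyConfig= to bedrock.create_guardrail(...)
wordPolicyConfig = {
    "wordsConfig": [
        {"text": "ProjectAtlas"},    # illustrative internal code name
        {"text": "AcmeCompetitor"},  # illustrative competitor name
    ],
    "managedWordListsConfig": [
        {"type": "PROFANITY"},  # AWS-managed profanity list
    ],
}
```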
4. PII Filtering
Detect and handle personally identifiable information:
| PII Type | Options |
|---|---|
| Names, addresses | Detect and redact |
| Phone numbers | Detect and mask |
| Email addresses | Detect and block |
| SSN, credit cards | Detect and redact |
| Custom patterns | Regex-based detection |
Actions: BLOCK (stop request) or ANONYMIZE (mask and continue).
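A sketch of the corresponding sensitiveInformationPolicyConfig block; the entity types shown are from the API's enum, while the custom regex pattern is illustrative:

```python
# Passed as sensitiveInformationPolicyConfig= to bedrock.create_guardrail(...)
sensitiveInformationPolicyConfig = {
    "piiEntitiesConfig": [
        {"type": "EMAIL", "action": "ANONYMIZE"},                  # mask and continue
        {"type": "US_SOCIAL_SECURITY_NUMBER", "action": "BLOCK"},  # stop the request
    ],
    "regexesConfig": [{
        "name": "employee-id",   # illustrative custom pattern
        "pattern": r"EMP-\d{6}",
        "action": "ANONYMIZE",
    }],
}
```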
5. Multimodal Toxicity (GA April 2025)
Evaluate images alongside text:
- Detect harmful imagery
- Filter inappropriate visual content
- Combined text + image safety scoring
6. Automated Reasoning (GA August 2025)
Prevent hallucinations using logical verification:
| Feature | Description |
|---|---|
| Mathematical verification | Check factual claims against defined rules |
| Policy documents | Define ground truth for verification |
| Natural language Q&A | Generate test cases from policies (November 2025) |
Use Case: Financial or legal compliance where factual accuracy is critical.
How Guardrails Work
```
User Input → Guardrail (Input Check) → Model → Guardrail (Output Check) → Response
                      ↓                                    ↓
                 Block/Modify                         Block/Modify
```
Processing Flow
1. Input arrives from the user
2. Input filtering checks it against all configured policies
3. If a violation is found: block with the configured message, or modify (e.g., redact PII)
4. If clean: forward to the foundation model
5. The model generates a response
6. Output filtering checks the response against the same policies
7. If a violation is found: block or modify the response
8. Return the safe response to the user
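The input-check stage can also be run on its own with the standalone ApplyGuardrail API, which is handy for testing policies without invoking a model. A minimal sketch (the guardrail ID is a placeholder):

```python
import boto3

runtime = boto3.client("bedrock-runtime")

result = runtime.apply_guardrail(
    guardrailIdentifier="my-guardrail-id",  # placeholder ID
    guardrailVersion="1",
    source="INPUT",  # use "OUTPUT" to check a model response instead
    content=[{"text": {"text": "My SSN is 123-45-6789."}}],
)
print(result["action"])       # "GUARDRAIL_INTERVENED" or "NONE"
print(result["assessments"])  # per-policy details on what triggered
```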
Applying Guardrails
Guardrails can be attached to:
| Resource | How |
|---|---|
| Model invocations | Specify guardrail ID in API call |
| Agents | Attach during agent creation |
| Knowledge Bases | Apply to RAG responses |
API Example (Python, boto3)
A minimal runnable sketch; the guardrailIdentifier and guardrailVersion parameters attach the guardrail to the call, and the request body follows the legacy Claude text-completion schema this model expects:

```python
import json
import boto3

client = boto3.client("bedrock-runtime")

response = client.invoke_model(
    modelId="anthropic.claude-v2",
    guardrailIdentifier="my-guardrail-id",
    guardrailVersion="1",  # a numbered version, or "DRAFT"
    body=json.dumps({"prompt": "\n\nHuman: Hello!\n\nAssistant:",
                     "max_tokens_to_sample": 256}),
)
print(json.loads(response["body"].read())["completion"])
```
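The newer Converse API takes the same identifiers through its guardrailConfig parameter. A sketch (the model ID is a placeholder; setting trace to "enabled" returns per-policy assessment details, useful for the debugging steps later in this post):

```python
import boto3

client = boto3.client("bedrock-runtime")

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model ID
    messages=[{"role": "user", "content": [{"text": "Hello!"}]}],
    guardrailConfig={
        "guardrailIdentifier": "my-guardrail-id",
        "guardrailVersion": "1",
        "trace": "enabled",  # include guardrail assessment details in the response
    },
)
print(response["output"]["message"]["content"][0]["text"])
```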
Pricing
Guardrails are priced per text unit analyzed:
| Policy Type | Pricing Basis |
|---|---|
| Content filters | Per 1,000 text units |
| Denied topics | Per 1,000 text units |
| Word filters | Per 1,000 text units |
| PII detection | Per 1,000 text units |
| Multimodal | Per image analyzed |
| Automated Reasoning | Per text unit + policy evaluation |
What’s a Text Unit?
- One text unit covers up to 1,000 characters
- Input and output are counted separately
- Longer prompts and responses consume more text units, so they cost more
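To make the arithmetic concrete, a back-of-the-envelope sketch; the per-unit rate below is a placeholder, not a quoted price, and the charge applies per enabled policy type:

```python
import math

PRICE_PER_1000_TEXT_UNITS = 0.15  # placeholder rate; check the pricing page

def text_units(text: str) -> int:
    # One text unit covers up to 1,000 characters.
    return math.ceil(len(text) / 1000)

prompt, completion = "x" * 2500, "y" * 1200  # 3 + 2 text units
units = text_units(prompt) + text_units(completion)
print(units, "units ->", units / 1000 * PRICE_PER_1000_TEXT_UNITS, "USD per policy")
```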
Price Reduction (December 2024)
AWS announced 85% price reduction for:
- Content filters
- Denied topics
Cost Considerations
| Factor | Impact |
|---|---|
| Number of policies enabled | More policies = higher cost |
| Input + Output length | Longer content = more text units |
| Call volume | High traffic apps incur significant costs |
| Multimodal | Image evaluation adds per-image cost |
Cost Tip: Enable only the policies you need. Content filters set to HIGH filter more aggressively but cost the same as lower strengths.
Guardrails vs Model-Level Safety
| Aspect | Model’s Built-in Safety | Bedrock Guardrails |
|---|---|---|
| Control | Provider-defined | You define policies |
| Customization | None | Full customization |
| Denied topics | Generic | Your specific topics |
| PII handling | Limited | Configurable actions |
| Auditability | Limited | CloudWatch logging |
| Cross-model | Varies by model | Same policies, any model |
Best Practice: Use both — model safety as baseline, Guardrails for your specific requirements.
Monitoring & Troubleshooting
CloudWatch Integration
| Metric | What It Shows |
|---|---|
| GuardrailBlocked | Number of blocked requests |
| GuardrailIntervened | Requests modified (PII redacted) |
| Latency | Processing time added by Guardrails |
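A hedged sketch of querying one of these metrics with boto3; the metric name follows the table above and the namespace is an assumption, so verify both in your CloudWatch console:

```python
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Bedrock/Guardrails",  # assumed namespace; verify in your account
    MetricName="GuardrailBlocked",       # metric name from the table above
    StartTime=datetime.now(timezone.utc) - timedelta(days=1),
    EndTime=datetime.now(timezone.utc),
    Period=3600,
    Statistics=["Sum"],
)
print(sum(point["Sum"] for point in stats["Datapoints"]))
```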
Debugging Blocked Requests
- Enable detailed logging
- Review which policy triggered
- Adjust sensitivity or policy definition
- Test with sample inputs
Best Practices
| Practice | Reason |
|---|---|
| Start with default sensitivities | Tune based on actual traffic |
| Test with realistic inputs | Ensure legitimate requests aren’t blocked |
| Monitor blocked request rate | High rate may indicate over-filtering |
| Use descriptive denied topics | Clear definitions improve accuracy |
| Enable PII filtering for user-facing apps | Protect both users and company |
| Log everything | Audit trail for compliance |
TL;DR
- Guardrails = configurable filters for inputs AND outputs
- Policy types: Content filters, denied topics, word filters, PII, prompt attacks, multimodal, automated reasoning
- Priced per 1,000 text units — 85% cheaper since December 2024
- Attach to: Models, Agents, Knowledge Bases
- Key benefit: Consistent safety policies across any model
Resources
- Bedrock Guardrails: official documentation for configuring guardrails.
- Guardrails Pricing: current pricing for guardrail policies.