Deep dive into Amazon Bedrock Guardrails — implement responsible AI policies, content filtering, and safety controls.


What are Guardrails?

Guardrails are configurable policies that filter and control what goes into and comes out of foundation models. They help you:

  • Block harmful or inappropriate content
  • Protect sensitive information (PII)
  • Enforce topic restrictions
  • Prevent prompt injection attacks
  • Reduce hallucinations

Key Point: Guardrails apply to both inputs (user prompts) and outputs (model responses) — bidirectional filtering.


Why Use Guardrails?

| Challenge | Guardrails Solution |
|---|---|
| Users submit inappropriate content | Content filters block before reaching the model |
| Model generates harmful responses | Output filtering catches before returning to user |
| PII in prompts or responses | Automatic detection and redaction |
| Users try to jailbreak the model | Prompt attack detection |
| Model needs to avoid certain topics | Denied topics configuration |
| Hallucinations in critical domains | Automated Reasoning checks |

Guardrail Policy Types

1. Content Filters

Block harmful content across categories:

| Category | What It Detects |
|---|---|
| Hate | Discrimination, slurs, hate speech |
| Insults | Demeaning, offensive language |
| Sexual | Explicit or suggestive content |
| Violence | Graphic violence, threats |
| Misconduct | Illegal activities, self-harm |
| Prompt Attacks | Jailbreaks, injection attempts |

Configuration: Set sensitivity level (LOW, MEDIUM, HIGH) per category.
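As a sketch, this maps to the `contentPolicyConfig` block of the `CreateGuardrail` API, where filter strength is set separately for inputs and outputs. Field names follow the boto3 `bedrock` client; the chosen strengths are illustrative, not recommendations:

```python
# Illustrative contentPolicyConfig for CreateGuardrail.
# Verify field names against the current boto3 API reference before use.
content_policy_config = {
    "filtersConfig": [
        # Strength is configured per direction: inputStrength / outputStrength.
        {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        {"type": "INSULTS", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
        {"type": "SEXUAL", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        {"type": "VIOLENCE", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
        {"type": "MISCONDUCT", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
        # Prompt-attack detection applies to inputs only.
        {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
    ]
}
```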

2. Denied Topics

Prevent the model from discussing specific subjects:

| Use Case | Example |
|---|---|
| Competitor information | Block discussion of competitor products |
| Financial advice | Prevent unauthorized investment recommendations |
| Medical diagnosis | Avoid providing medical diagnoses |
| Internal policies | Keep confidential procedures private |

How it works: Define each topic with a name, a natural-language definition, and sample phrases. The guardrail then detects and blocks matching content.
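A denied topic translates to the `topicPolicyConfig` block of the `CreateGuardrail` API. The "Investment Advice" topic below is a hypothetical example; the definition and sample phrases are illustrative:

```python
# Illustrative topicPolicyConfig for CreateGuardrail (hypothetical topic).
topic_policy_config = {
    "topicsConfig": [
        {
            "name": "Investment Advice",
            "definition": "Recommendations about specific stocks, funds, "
                          "or other investment products.",
            "examples": [
                "Which stock should I buy right now?",
                "Is this fund a good investment?",
            ],
            "type": "DENY",  # the only supported action for topics
        }
    ]
}
```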

3. Word Filters

Block specific words or phrases:

  • Offensive words
  • Competitor names
  • Internal code names
  • Profanity
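Conceptually, a word filter is whole-word matching against a blocklist. A toy local sketch (the blocked terms are made up, and Bedrock's matching runs service-side):

```python
import re

def word_filter(text: str, blocked_words: list[str]) -> bool:
    """Return True if text contains any blocked word (whole-word, case-insensitive)."""
    pattern = r"\b(" + "|".join(re.escape(w) for w in blocked_words) + r")\b"
    return re.search(pattern, text, re.IGNORECASE) is not None

# Hypothetical internal code name and competitor name.
blocked = ["project-aurora", "acme corp"]
word_filter("Tell me about Acme Corp pricing", blocked)  # True
word_filter("Tell me about the weather", blocked)        # False
```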

4. PII Filtering

Detect and handle personally identifiable information:

| PII Type | Options |
|---|---|
| Names, addresses | Detect and redact |
| Phone numbers | Detect and mask |
| Email addresses | Detect and block |
| SSN, credit cards | Detect and redact |
| Custom patterns | Regex-based detection |

Actions: BLOCK (stop request) or ANONYMIZE (mask and continue).
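The ANONYMIZE action can be approximated locally with regexes. This is a toy sketch only: Bedrock's PII detectors are far more sophisticated and cover many more types, and the `{EMAIL}`-style placeholders here simply mimic masked output:

```python
import re

# Toy stand-ins for a few PII detectors (illustrative patterns, not exhaustive).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def anonymize(text: str) -> str:
    """Mask each detected PII span with a labeled placeholder and continue."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub("{" + label + "}", text)
    return text

anonymize("Reach me at jane@example.com or 555-867-5309")
# -> "Reach me at {EMAIL} or {PHONE}"
```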

5. Multimodal Toxicity (GA April 2025)

Evaluate images alongside text:

  • Detect harmful imagery
  • Filter inappropriate visual content
  • Combined text + image safety scoring

6. Automated Reasoning (GA August 2025)

Prevent hallucinations using logical verification:

| Feature | Description |
|---|---|
| Mathematical verification | Check factual claims against defined rules |
| Policy documents | Define ground truth for verification |
| Natural language Q&A | Generate test cases from policies (November 2025) |

Use Case: Financial or legal compliance where factual accuracy is critical.


How Guardrails Work

User Input → Guardrail (Input Check) → Model → Guardrail (Output Check) → Response
                    ↓                              ↓
              Block/Modify                   Block/Modify

Processing Flow

  1. Input arrives from user
  2. Input filtering checks against all configured policies
  3. If violation: Block with configured message OR Modify (redact PII)
  4. If clean: Forward to foundation model
  5. Model generates response
  6. Output filtering checks response against policies
  7. If violation: Block or Modify response
  8. Return safe response to user
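The steps above can be modeled in a few lines. This is purely illustrative — the real checks run inside the Bedrock service — but it captures the order of operations and the two outcomes (block with a configured message, or modify and continue):

```python
def apply_guardrail(text, policies):
    """Run text through each policy; a policy may block or rewrite the text."""
    for policy in policies:
        blocked, text = policy(text)
        if blocked:
            return True, "Sorry, I can't help with that."  # configured block message
    return False, text

def guarded_invoke(prompt, model, policies):
    blocked, prompt = apply_guardrail(prompt, policies)   # steps 1-3: input check
    if blocked:
        return prompt                                     # never reaches the model
    response = model(prompt)                              # steps 4-5: model invocation
    _, response = apply_guardrail(response, policies)     # steps 6-7: output check
    return response                                       # step 8
```

With a toy policy that blocks any text containing "secret", `guarded_invoke("tell me the secret", str.upper, [lambda t: ("secret" in t, t)])` returns the block message instead of ever invoking the model.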

Applying Guardrails

Guardrails can be attached to:

| Resource | How |
|---|---|
| Model invocations | Specify guardrail ID in API call |
| Agents | Attach during agent creation |
| Knowledge Bases | Apply to RAG responses |

API Example (Conceptual)

A boto3 sketch (`guardrailIdentifier` and `guardrailVersion` are the relevant `invoke_model` parameters; the request `body` format depends on the model, and the IDs here are placeholders):

```python
import boto3, json

bedrock = boto3.client("bedrock-runtime")
response = bedrock.invoke_model(
    modelId="anthropic.claude-v2",
    guardrailIdentifier="my-guardrail-id",  # your guardrail's ID
    guardrailVersion="1",
    body=json.dumps({"prompt": "...", "max_tokens_to_sample": 300}),
)
```

Pricing

Guardrails are priced per text unit analyzed:

| Policy Type | Pricing Basis |
|---|---|
| Content filters | Per 1,000 text units |
| Denied topics | Per 1,000 text units |
| Word filters | Per 1,000 text units |
| PII detection | Per 1,000 text units |
| Multimodal | Per image analyzed |
| Automated Reasoning | Per text unit + policy evaluation |

What’s a Text Unit?

  • A text unit covers up to 1,000 characters (shorter text still counts as one full unit)
  • Input and output are counted separately
  • Longer prompts and responses consume more text units, so they cost more
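Back-of-the-envelope math for the counting rules above, using a hypothetical per-1,000-unit rate (see the pricing page for real numbers):

```python
import math

TEXT_UNIT_CHARS = 1000  # one text unit covers up to 1,000 characters

def text_units(text: str) -> int:
    """Number of text units consumed; even short text counts as one unit."""
    return max(1, math.ceil(len(text) / TEXT_UNIT_CHARS))

def guardrail_cost(prompt: str, response: str, rate_per_1k_units: float,
                   n_policies: int) -> float:
    # Input and output are counted separately, for each enabled policy.
    units = text_units(prompt) + text_units(response)
    return units * n_policies * rate_per_1k_units / 1000

# 2,500-char prompt (3 units) + 800-char response (1 unit) = 4 units,
# across 2 enabled policies at a hypothetical $0.15 per 1,000 units.
guardrail_cost("x" * 2500, "y" * 800, rate_per_1k_units=0.15, n_policies=2)
```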

Price Reduction (December 2024)

AWS announced an 85% price reduction for:

  • Content filters
  • Denied topics

Cost Considerations

| Factor | Impact |
|---|---|
| Number of policies enabled | More policies = higher cost |
| Input + output length | Longer content = more text units |
| Call volume | High-traffic apps incur significant costs |
| Multimodal | Image evaluation adds per-image cost |

Cost Tip: Enable only the policies you need. Content filters on HIGH sensitivity check more aggressively but cost the same.


Guardrails vs Model-Level Safety

| Aspect | Model’s Built-in Safety | Bedrock Guardrails |
|---|---|---|
| Control | Provider-defined | You define policies |
| Customization | None | Full customization |
| Denied topics | Generic | Your specific topics |
| PII handling | Limited | Configurable actions |
| Auditability | Limited | CloudWatch logging |
| Cross-model | Varies by model | Same policies, any model |

Best Practice: Use both — model safety as baseline, Guardrails for your specific requirements.


Monitoring & Troubleshooting

CloudWatch Integration

| Metric | What It Shows |
|---|---|
| GuardrailBlocked | Number of blocked requests |
| GuardrailIntervened | Requests modified (e.g., PII redacted) |
| Latency | Processing time added by Guardrails |

Debugging Blocked Requests

  • Enable detailed logging
  • Review which policy triggered
  • Adjust sensitivity or policy definition
  • Test with sample inputs

Best Practices

| Practice | Reason |
|---|---|
| Start with default sensitivities | Tune based on actual traffic |
| Test with realistic inputs | Ensure legitimate requests aren’t blocked |
| Monitor blocked request rate | A high rate may indicate over-filtering |
| Use descriptive denied topics | Clear definitions improve accuracy |
| Enable PII filtering for user-facing apps | Protect both users and company |
| Log everything | Audit trail for compliance |

TL;DR

  • Guardrails = configurable filters for inputs AND outputs
  • Policy types: Content filters, denied topics, word filters, PII, prompt attacks, multimodal, automated reasoning
  • Priced per 1,000 text units — 85% cheaper since December 2024
  • Attach to: Models, Agents, Knowledge Bases
  • Key benefit: Consistent safety policies across any model

Resources

Bedrock Guardrails
Official documentation for configuring guardrails.

Guardrails Pricing
Current pricing for guardrail policies.