Strands Agents SDK Practical — Filter Agent I/O with Guardrails

Introduction

In the previous article, we learned to control agent behavior with Hooks — limiting tool calls and modifying results. But there's another critical control: the safety of agent inputs and outputs themselves.

Adding a single guardrail_id automatically filters agent inputs and outputs.

In this article, we'll try:

  1. Bedrock Guardrails setup — Create a guardrail with AWS CLI
  2. Apply to Strands agent — Configure the guardrail on BedrockModel
  3. Guardrail intervention behavior — Verify stop_reason and automatic conversation history rewriting
  4. Shadow mode with Hooks — Monitor-only implementation without blocking

See the official documentation at Guardrails.

Setup

Use the same environment from Part 1. All examples use the same model configuration and can be run as independent .py files. Write the common setup at the top, then add each example's code below.

Python (common setup)
from strands import Agent
from strands.models import BedrockModel
 
bedrock_model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
    region_name="us-east-1",
)

Creating Bedrock Guardrails

Create a guardrail with AWS CLI. This example blocks "Investment Advice" topics and filters violence/hate content.

Guardrail creation commands
Terminal
aws bedrock create-guardrail \
  --name "strands-test-guardrail" \
  --description "Test guardrail for Strands practical series" \
  --content-policy-config '{
    "filtersConfig": [
      {"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
      {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"}
    ]
  }' \
  --topic-policy-config '{
    "topicsConfig": [
      {
        "name": "Investment Advice",
        "definition": "Providing specific investment recommendations or financial advice",
        "examples": ["What stocks should I buy?", "Should I invest in crypto?"],
        "type": "DENY"
      }
    ]
  }' \
  --blocked-input-messaging "Sorry, this request was blocked by guardrails." \
  --blocked-outputs-messaging "Sorry, this response was blocked by guardrails." \
  --region us-east-1

Note the guardrail ID from the output.

Output
{
    "guardrailId": "7by7u1yvthd8",
    "guardrailArn": "arn:aws:bedrock:us-east-1:123456789012:guardrail/7by7u1yvthd8",
    "version": "DRAFT"
}

Publish a version.

Terminal
aws bedrock create-guardrail-version \
  --guardrail-identifier "7by7u1yvthd8" \
  --region us-east-1

Gotcha: The parameter name is --blocked-outputs-messaging (outputs is plural). Using the singular --blocked-output-messaging causes an error.
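If you prefer boto3 to the CLI, the bedrock control-plane client exposes a matching create_guardrail call. A minimal sketch with the same policies as above — the payload keys mirror the CLI flags in camelCase, and the actual API call is left commented out since it requires AWS credentials:

```python
# Same policies as the CLI command above, expressed as a boto3-style payload.
guardrail_config = {
    "name": "strands-test-guardrail",
    "description": "Test guardrail for Strands practical series",
    "contentPolicyConfig": {
        "filtersConfig": [
            {"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        ]
    },
    "topicPolicyConfig": {
        "topicsConfig": [
            {
                "name": "Investment Advice",
                "definition": "Providing specific investment recommendations or financial advice",
                "examples": ["What stocks should I buy?", "Should I invest in crypto?"],
                "type": "DENY",
            }
        ]
    },
    "blockedInputMessaging": "Sorry, this request was blocked by guardrails.",
    # "Outputs" is plural here too, matching the CLI gotcha above
    "blockedOutputsMessaging": "Sorry, this response was blocked by guardrails.",
}

# With AWS credentials configured:
# import boto3
# bedrock = boto3.client("bedrock", region_name="us-east-1")
# response = bedrock.create_guardrail(**guardrail_config)
# print(response["guardrailId"], response["version"])
```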

Applying to a Strands Agent

Just add guardrail_id and guardrail_version to BedrockModel. This overrides the bedrock_model from the common setup with a guardrail-enabled version.

Python
GUARDRAIL_ID = "7by7u1yvthd8"  # Replace with your guardrail ID
 
bedrock_model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
    region_name="us-east-1",
    guardrail_id=GUARDRAIL_ID,
    guardrail_version="1",
    guardrail_trace="enabled",
)
 
agent = Agent(model=bedrock_model, callback_handler=None)

No changes to agent code. Just add the guardrail ID to the model configuration.

Guardrail Intervention Behavior

Compare a normal request with a blocked request.

Python
# Normal request
result1 = agent("What is the capital of France?")
print(f"Stop reason: {result1.stop_reason}")
print(f"Answer: {result1.message['content'][0]['text']}")
 
# Blocked request (investment advice)
agent2 = Agent(model=bedrock_model, callback_handler=None)
result2 = agent2("What stocks should I buy to get rich quickly?")
print(f"\nStop reason: {result2.stop_reason}")
print(f"Answer: {result2.message['content'][0]['text']}")
01_guardrails.py full code (copy-paste)
01_guardrails.py
from strands import Agent
from strands.models import BedrockModel
 
GUARDRAIL_ID = "YOUR_GUARDRAIL_ID"  # Replace with your guardrail ID
 
bedrock_model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
    region_name="us-east-1",
    guardrail_id=GUARDRAIL_ID,
    guardrail_version="1",
    guardrail_trace="enabled",
)
 
agent = Agent(model=bedrock_model, callback_handler=None)
 
result1 = agent("What is the capital of France?")
print(f"Stop reason: {result1.stop_reason}")
print(f"Answer: {result1.message['content'][0]['text']}")
 
agent2 = Agent(model=bedrock_model, callback_handler=None)
result2 = agent2("What stocks should I buy to get rich quickly?")
print(f"\nStop reason: {result2.stop_reason}")
print(f"Answer: {result2.message['content'][0]['text']}")
print(f"\nMessages after block: {len(agent2.messages)}")
for i, msg in enumerate(agent2.messages):
    role = msg['role']
    text = msg['content'][0].get('text', '')[:80]
    print(f"  [{i}] {role:10s}: {text}")
Terminal
python -u 01_guardrails.py

Result

Output
Stop reason: end_turn
Answer: The capital of France is Paris.
 
Stop reason: guardrail_intervened
Answer: Sorry, this request was blocked by guardrails.

Normal requests get stop_reason: end_turn with a regular answer. The investment advice request gets stop_reason: guardrail_intervened with the message configured in --blocked-input-messaging.
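Because the block surfaces as a distinct stop reason, application code can branch on it — for example, to route blocked requests to a fallback flow. A minimal sketch; was_blocked is a hypothetical helper (not part of the SDK), and FakeResult is an illustrative stand-in shaped like the result object above:

```python
GUARDRAIL_STOP_REASON = "guardrail_intervened"

def was_blocked(result) -> bool:
    """True when guardrails intervened on the request or response."""
    return result.stop_reason == GUARDRAIL_STOP_REASON

# Illustrative stand-in shaped like an agent result
class FakeResult:
    def __init__(self, stop_reason: str):
        self.stop_reason = stop_reason

print(was_blocked(FakeResult("end_turn")))              # False
print(was_blocked(FakeResult("guardrail_intervened")))  # True
```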

Automatic Conversation History Rewriting

Checking the conversation history after a block reveals an interesting behavior:

Output (conversation history)
Messages after block: 2
  [0] user      : [User input redacted.]
  [1] assistant : Sorry, this request was blocked by guardrails.

The user's input was automatically rewritten to [User input redacted.]. This prevents the same input from triggering the guardrail again in subsequent conversation turns. The original input is not preserved in conversation history.
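If you need to audit how often this rewriting happens — say, for logging — you can scan the history for the redaction marker. A sketch with a hypothetical helper, redacted_turns, run here against a sample list shaped like agent2.messages after the blocked request:

```python
REDACTED_MARKER = "[User input redacted.]"

def redacted_turns(messages: list[dict]) -> list[int]:
    """Indices of user messages that guardrails rewrote after a block."""
    hits = []
    for i, msg in enumerate(messages):
        if msg.get("role") != "user":
            continue
        text = "".join(block.get("text", "") for block in msg.get("content", []))
        if REDACTED_MARKER in text:
            hits.append(i)
    return hits

# Sample shaped like agent2.messages after the blocked request above
history = [
    {"role": "user", "content": [{"text": "[User input redacted.]"}]},
    {"role": "assistant", "content": [{"text": "Sorry, this request was blocked by guardrails."}]},
]
print(redacted_turns(history))  # [0]
```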

Shadow Mode with Hooks — Monitor-Only Implementation

Before deploying guardrails to production, you may want to verify with a "shadow mode" that logs what would be blocked without actually blocking. We'll use Hooks from the previous article to implement this.

The key is not setting guardrails on BedrockModel, and instead calling the ApplyGuardrail API directly within Hooks.

Python (agent creation and execution)
import boto3
from strands import Agent
from strands.models import BedrockModel
from strands.hooks import MessageAddedEvent, AfterInvocationEvent
 
GUARDRAIL_ID = "7by7u1yvthd8"
GUARDRAIL_VERSION = "1"
 
# Model WITHOUT guardrails (shadow mode)
bedrock_model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
    region_name="us-east-1",
)
 
agent = Agent(model=bedrock_model, callback_handler=None)
agent.hooks.add_callback(MessageAddedEvent, check_user_input)
agent.hooks.add_callback(AfterInvocationEvent, check_output)
 
result = agent("What stocks should I buy to get rich quickly?")
print(f"\nStop reason: {result.stop_reason}")
print(f"Answer: {result.message['content'][0]['text'][:100]}...")
Full hook function code (check_user_input, check_output)
Python
bedrock_client = boto3.client("bedrock-runtime", "us-east-1")
 
def check_user_input(event: MessageAddedEvent) -> None:
    if event.message.get("role") != "user":
        return
    content = "".join(block.get("text", "") for block in event.message.get("content", []))
    if not content:
        return
    try:
        response = bedrock_client.apply_guardrail(
            guardrailIdentifier=GUARDRAIL_ID,
            guardrailVersion=GUARDRAIL_VERSION,
            source="INPUT",
            content=[{"text": {"text": content}}],
        )
        if response.get("action") == "GUARDRAIL_INTERVENED":
            print(f"[SHADOW] WOULD BLOCK INPUT: {content[:60]}...")
            for assessment in response.get("assessments", []):
                if "topicPolicy" in assessment:
                    for topic in assessment["topicPolicy"].get("topics", []):
                        print(f"[SHADOW]   Topic: {topic['name']} -> {topic['action']}")
        else:
            print(f"[SHADOW] INPUT OK: {content[:60]}...")
    except Exception as e:
        print(f"[SHADOW] Error: {e}")
 
def check_output(event: AfterInvocationEvent) -> None:
    if not event.agent.messages or event.agent.messages[-1].get("role") != "assistant":
        return
    content = "".join(
        block.get("text", "") for block in event.agent.messages[-1].get("content", [])
    )
    if not content:
        return
    try:
        response = bedrock_client.apply_guardrail(
            guardrailIdentifier=GUARDRAIL_ID,
            guardrailVersion=GUARDRAIL_VERSION,
            source="OUTPUT",
            content=[{"text": {"text": content}}],
        )
        if response.get("action") == "GUARDRAIL_INTERVENED":
            print(f"[SHADOW] WOULD BLOCK OUTPUT: {content[:60]}...")
        else:
            print(f"[SHADOW] OUTPUT OK")
    except Exception as e:
        print(f"[SHADOW] Error: {e}")
Terminal
python -u 02_shadow.py

Result

Output
[SHADOW] WOULD BLOCK INPUT: What stocks should I buy to get rich quickly?...
[SHADOW]   Topic: Investment Advice -> BLOCKED
[SHADOW] WOULD BLOCK OUTPUT: I can't recommend specific stocks for getting rich quickly, ...
 
Stop reason: end_turn
Answer: I can't recommend specific stocks for getting rich quickly, and here's why that approach is risky:...

Both input and output are detected as "WOULD BLOCK," but nothing is actually blocked. The agent generates its response normally.

Key points of this approach:

  • No guardrails on BedrockModel — No filtering at the model level
  • Separate check via apply_guardrail API — Call Bedrock's ApplyGuardrail API directly within Hooks
  • Log only — Output the block determination to logs without changing agent behavior

Useful during a tuning period before production deployment to understand what inputs and outputs would be blocked.
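The assessment-parsing logic inside check_user_input can also be factored into a pure function, which makes the shadow-mode log format easy to unit test without calling AWS. A sketch — summarize_assessments is a hypothetical helper, and the sample dict mirrors the shape of the ApplyGuardrail response seen in the run above:

```python
def summarize_assessments(response: dict) -> list[str]:
    """Turn an ApplyGuardrail-style response into shadow-mode log lines."""
    if response.get("action") != "GUARDRAIL_INTERVENED":
        return []
    lines = []
    for assessment in response.get("assessments", []):
        for topic in assessment.get("topicPolicy", {}).get("topics", []):
            lines.append(f"Topic: {topic['name']} -> {topic['action']}")
    return lines

# Sample shaped like the response seen in the shadow-mode run above
sample_response = {
    "action": "GUARDRAIL_INTERVENED",
    "assessments": [
        {"topicPolicy": {"topics": [{"name": "Investment Advice", "action": "BLOCKED"}]}}
    ],
}
for line in summarize_assessments(sample_response):
    print(f"[SHADOW]   {line}")
```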

Summary

  • A single guardrail_id addition auto-filters inputs and outputs — Set the guardrail ID and version on BedrockModel. No agent code changes needed.
  • Blocked requests get stop_reason: guardrail_intervened — Programmatically detect blocks. User input is auto-rewritten to [User input redacted.] to prevent impact on subsequent conversation.
  • Hooks + ApplyGuardrail API enables shadow mode — Monitor-only mode without blocking, useful for pre-production tuning. A practical application of Hooks from the previous article.
  • Watch for --blocked-outputs-messaging (plural) in AWS CLI — The singular --blocked-output-messaging causes an error.

Cleanup

Delete the guardrail after verification.

Terminal
aws bedrock delete-guardrail \
  --guardrail-identifier "7by7u1yvthd8" \
  --region us-east-1

Shinya Tahara


Solutions Architect @ AWS

I'm a Solutions Architect at AWS, providing technical guidance primarily to financial industry customers. I share learnings about cloud architecture and AI/ML on this site. The views and opinions expressed on this site are my own and do not represent the official positions of my employer.
