Strands Agents SDK Practical — Optimize Agents with Metrics
Introduction
In Part 1 of the intro series, we first saw result.metrics.get_summary(). We noted that cycles were 2 and token usage was 4,554. But at that point, we just "looked at the numbers."
With result.metrics, you can compare cycle counts, token usage, and tool execution times to identify bottlenecks.
In this article, we'll try:
- Metrics overview — Walk through all fields in `get_summary()`
- Per-tool performance analysis — Identify which tool is the bottleneck
- Cycle count optimization — Compare how tool design changes affect cycle counts
See the official documentation at Metrics.
Setup
Use the same environment from Part 1. All examples use the same model configuration and can be run as independent .py files. Write the common setup at the top, then add each example's code below.
```python
from strands import Agent, tool
from strands.models import BedrockModel
import json

bedrock_model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
    region_name="us-east-1",
)
```

Metrics Overview
Here are the key fields returned by result.metrics.get_summary():
```python
from strands_tools import calculator

agent = Agent(model=bedrock_model, tools=[calculator], callback_handler=None)
result = agent("What is 123 * 456?")
summary = result.metrics.get_summary()
print(json.dumps(summary, indent=2, default=str))
```

01_overview.py full code (copy-paste)
```python
from strands import Agent
from strands.models import BedrockModel
from strands_tools import calculator
import json

bedrock_model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
    region_name="us-east-1",
)

agent = Agent(model=bedrock_model, tools=[calculator], callback_handler=None)
result = agent("What is 123 * 456?")
summary = result.metrics.get_summary()
print(json.dumps(summary, indent=2, default=str))
```

```bash
python -u 01_overview.py
```

Key fields:
| Field | Description |
|---|---|
| `total_cycles` | Total agent loop cycles |
| `total_duration` | Total execution time (seconds) |
| `accumulated_usage.inputTokens` | Total input tokens |
| `accumulated_usage.outputTokens` | Total output tokens |
| `accumulated_usage.totalTokens` | Total tokens |
| `tool_usage.<name>.execution_stats.call_count` | Tool invocation count |
| `tool_usage.<name>.execution_stats.success_count` | Success count |
| `tool_usage.<name>.execution_stats.error_count` | Error count |
| `tool_usage.<name>.execution_stats.total_time` | Total tool execution time (seconds) |
| `tool_usage.<name>.execution_stats.average_time` | Average tool execution time (seconds) |
| `tool_usage.<name>.execution_stats.success_rate` | Success rate (0.0–1.0) |
Beyond the "cycle count" and "token usage" we saw in the intro series, you get per-tool execution times and success rates.
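These fields are just entries in the plain dict that `get_summary()` returns, so they are easy to post-process. As a quick illustration, here is a small helper of my own (not part of the SDK) that flattens the key numbers into one log-friendly record; the sample dict only mimics the field names in the table above, and real summaries may nest additional data.

```python
def flatten_summary(summary: dict) -> dict:
    """Flatten the key get_summary() fields into a single flat dict for logging."""
    usage = summary.get("accumulated_usage", {})
    record = {
        "cycles": summary.get("total_cycles"),
        "duration_s": summary.get("total_duration"),
        "input_tokens": usage.get("inputTokens"),
        "output_tokens": usage.get("outputTokens"),
        "total_tokens": usage.get("totalTokens"),
    }
    # One calls/total_time pair per tool, keyed by tool name.
    for name, info in summary.get("tool_usage", {}).items():
        stats = info["execution_stats"]
        record[f"{name}_calls"] = stats["call_count"]
        record[f"{name}_total_time"] = stats["total_time"]
    return record

# Sample shaped like the table above (numbers from the intro-series run).
sample = {
    "total_cycles": 2,
    "total_duration": 3.1,
    "accumulated_usage": {"inputTokens": 4000, "outputTokens": 554, "totalTokens": 4554},
    "tool_usage": {"calculator": {"execution_stats": {"call_count": 1, "total_time": 0.01}}},
}
print(flatten_summary(sample))
```

A flat record like this drops straight into structured logs or a metrics dashboard, which is handy once you run agents in production.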
Per-Tool Performance Analysis
Create two tools with different execution times and identify the bottleneck via metrics.
```python
agent = Agent(model=bedrock_model, tools=[fast_lookup, slow_analysis], callback_handler=None)
result = agent("Look up the name and city, then analyze the sentence 'Hello World from Tokyo'")
summary = result.metrics.get_summary()

for tool_name, info in summary['tool_usage'].items():
    stats = info['execution_stats']
    print(f"{tool_name}:")
    print(f"  calls={stats['call_count']}, total_time={stats['total_time']:.4f}s, avg_time={stats['average_time']:.4f}s")
```

02_tool_perf.py full code (copy-paste)
```python
from strands import Agent, tool
from strands.models import BedrockModel
import time

bedrock_model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
    region_name="us-east-1",
)

@tool
def fast_lookup(key: str) -> str:
    """Look up a value by key (fast operation).

    Args:
        key: The key to look up

    Returns:
        str: The value
    """
    data = {"name": "Taro", "city": "Tokyo", "lang": "Python"}
    return data.get(key, "Not found")

@tool
def slow_analysis(text: str) -> str:
    """Analyze text (slow operation that simulates processing).

    Args:
        text: The text to analyze

    Returns:
        str: Analysis result
    """
    time.sleep(0.5)
    return f"Analysis of '{text}': {len(text)} chars, {len(text.split())} words"

agent = Agent(model=bedrock_model, tools=[fast_lookup, slow_analysis], callback_handler=None)
result = agent("Look up the name and city, then analyze the sentence 'Hello World from Tokyo'")
summary = result.metrics.get_summary()

for tool_name, info in summary['tool_usage'].items():
    stats = info['execution_stats']
    print(f"{tool_name}:")
    print(f"  calls={stats['call_count']}, total_time={stats['total_time']:.4f}s, avg_time={stats['average_time']:.4f}s")
```

```bash
python -u 02_tool_perf.py
```

Result
```
fast_lookup:
  calls=2, total_time=0.0032s, avg_time=0.0016s
slow_analysis:
  calls=1, total_time=0.5012s, avg_time=0.5012s
```

fast_lookup was called twice for a total of 0.003s, while slow_analysis took 0.5s in a single call. The bottleneck is clearly slow_analysis.
In production, this information enables decisions like:
- Introduce caching — Cache `slow_analysis` results to speed up subsequent calls
- Go async — Run heavy processing in the background
- Split the tool — Separate heavy tools into light and heavy parts, increasing cases where the light part alone suffices
Cycle Count Optimization — Impact of Tool Design
In Part 2 of the intro series, we learned the principle of "separating tool responsibilities." But separation has trade-offs. Let's verify with metrics.
Pattern A: Separate Tools (Sequential Calls)
```python
agent_a = Agent(model=bedrock_model, tools=[get_exchange_rate, convert_currency], callback_handler=None)
result_a = agent_a("Convert 1000 EUR to JPY")
summary_a = result_a.metrics.get_summary()
```

Pattern B: Combined Tool (Single Call)
```python
@tool
def convert_with_rate(amount: float, base: str, target: str) -> dict:
    """Convert currency amount from base to target currency.

    Args:
        amount: The amount to convert
        base: The base currency code
        target: The target currency code

    Returns:
        dict: Conversion result with rate
    """
    rates = {("USD", "JPY"): 149.50, ("EUR", "USD"): 1.08, ("EUR", "JPY"): 161.46}
    rate = rates.get((base.upper(), target.upper()))
    if rate is None:
        return {"error": f"Rate not found for {base}/{target}"}
    return {"base": base, "target": target, "rate": rate, "original": amount, "converted": round(amount * rate, 2)}

agent_b = Agent(model=bedrock_model, tools=[convert_with_rate], callback_handler=None)
result_b = agent_b("Convert 1000 EUR to JPY")
summary_b = result_b.metrics.get_summary()
```

03_compare.py full code (copy-paste)
```python
from strands import Agent, tool
from strands.models import BedrockModel

bedrock_model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
    region_name="us-east-1",
)

@tool
def get_exchange_rate(base: str, target: str) -> dict:
    """Get the current exchange rate between two currencies.

    Args:
        base: The base currency code
        target: The target currency code

    Returns:
        dict: Exchange rate information
    """
    rates = {("USD", "JPY"): 149.50, ("EUR", "USD"): 1.08, ("EUR", "JPY"): 161.46}
    rate = rates.get((base.upper(), target.upper()))
    if rate is None:
        return {"error": f"Rate not found for {base}/{target}"}
    return {"base": base.upper(), "target": target.upper(), "rate": rate}

@tool
def convert_currency(amount: float, rate: float) -> dict:
    """Convert an amount using a given exchange rate.

    Args:
        amount: The amount to convert
        rate: The exchange rate to apply

    Returns:
        dict: Conversion result
    """
    return {"original": amount, "rate": rate, "converted": round(amount * rate, 2)}

agent_a = Agent(model=bedrock_model, tools=[get_exchange_rate, convert_currency], callback_handler=None)
result_a = agent_a("Convert 1000 EUR to JPY")
summary_a = result_a.metrics.get_summary()

@tool
def convert_with_rate(amount: float, base: str, target: str) -> dict:
    """Convert currency amount from base to target currency.

    Args:
        amount: The amount to convert
        base: The base currency code
        target: The target currency code

    Returns:
        dict: Conversion result with rate
    """
    rates = {("USD", "JPY"): 149.50, ("EUR", "USD"): 1.08, ("EUR", "JPY"): 161.46}
    rate = rates.get((base.upper(), target.upper()))
    if rate is None:
        return {"error": f"Rate not found for {base}/{target}"}
    return {"base": base, "target": target, "rate": rate, "original": amount, "converted": round(amount * rate, 2)}

agent_b = Agent(model=bedrock_model, tools=[convert_with_rate], callback_handler=None)
result_b = agent_b("Convert 1000 EUR to JPY")
summary_b = result_b.metrics.get_summary()

print("=== Comparison ===")
print(f"Cycles: {summary_a['total_cycles']} vs {summary_b['total_cycles']}")
print(f"Tokens: {summary_a['accumulated_usage']['totalTokens']} vs {summary_b['accumulated_usage']['totalTokens']}")
print(f"Duration: {summary_a['total_duration']:.2f}s vs {summary_b['total_duration']:.2f}s")
```

```bash
python -u 03_compare.py
```

Comparison
```
=== Comparison ===
Cycles: 3 vs 2
Tokens: 2270 vs 1243
Duration: 7.31s vs 4.33s
```

| Metric | Pattern A (Separate) | Pattern B (Combined) |
|---|---|---|
| Cycles | 3 | 2 |
| Tokens | 2,270 | 1,243 |
| Duration | 7.31s | 4.33s |
Separate tools take 3 cycles (get rate → convert → respond), while the combined tool completes in 2 (convert → respond). Token usage dropped by ~45% and execution time by ~40%.
This is a trade-off with the "separate responsibilities" principle from Part 2 of the intro series:
- Separation wins when — Rate lookup and conversion are used independently. Testing is easier.
- Combination wins when — Tools are always used together. Performance matters.
Verify with actual metrics and decide based on your use case.
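To make such comparisons repeatable, a small helper (my own sketch, not an SDK API) can diff two `get_summary()` dicts. The sample values below are the numbers measured above; a real script would pass `summary_a` and `summary_b` directly.

```python
def compare_summaries(a: dict, b: dict) -> dict:
    """Return the cycle delta and percentage reductions going from summary a to b."""
    tokens_a = a["accumulated_usage"]["totalTokens"]
    tokens_b = b["accumulated_usage"]["totalTokens"]
    return {
        "cycle_delta": b["total_cycles"] - a["total_cycles"],
        "token_reduction_pct": round(100 * (1 - tokens_b / tokens_a), 1),
        "duration_reduction_pct": round(100 * (1 - b["total_duration"] / a["total_duration"]), 1),
    }

# Values taken from the Pattern A / Pattern B run above.
pattern_a = {"total_cycles": 3, "total_duration": 7.31, "accumulated_usage": {"totalTokens": 2270}}
pattern_b = {"total_cycles": 2, "total_duration": 4.33, "accumulated_usage": {"totalTokens": 1243}}
print(compare_summaries(pattern_a, pattern_b))
```

Note that single runs against an LLM are noisy, so it is worth averaging a few runs per pattern before drawing conclusions.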
Series Recap
Across all 5 articles in the practical series, we learned how to turn intro-level agents into "reliable applications."
| Part | Topic | What We Learned |
|---|---|---|
| Part 1 | Structured Output | Type-safe output with Pydantic, automatic validation retries |
| Part 2 | Session Management | Conversation persistence and restoration with FileSessionManager |
| Part 3 | Hooks | Tool call monitoring, limiting, and result modification |
| Part 4 | Guardrails | Input/output filtering and shadow mode |
| Part 5 (this article) | Metrics | Per-tool performance analysis, cycle count optimization |
While the intro series was about "understanding agent fundamentals," the practical series was about "making output predictable, persisting conversations, controlling behavior, ensuring safety, and optimizing performance."
From here, you can explore advanced multi-agent patterns like Swarm and Graph, deployment to AWS Lambda or Bedrock AgentCore, and observability with OpenTelemetry.
Summary
- `result.metrics` gives you cycle counts, token usage, and tool execution times at a glance — `get_summary()` lets you understand agent behavior numerically. It's the starting point for production debugging and optimization.
- Per-tool execution times identify bottlenecks — `tool_usage.<name>.execution_stats` shows call counts, execution times, and success rates for each tool, helping you decide what to optimize.
- Tool design directly impacts cycle counts and token usage — Separate tools (3 cycles, 2,270 tokens) vs combined tool (2 cycles, 1,243 tokens). Measure with metrics and decide based on your use case.
- Intro series knowledge carries into the practical series — Understanding of agent loops (Part 1), tool design (Part 2), conversation management (Part 4), and multi-agent (Part 5) forms the foundation for every practical series topic.
