Strands Agents SDK Practical — Optimize Agents with Metrics

Introduction

In Part 1 of the intro series, we first saw result.metrics.get_summary(). We noted that cycles were 2 and token usage was 4,554. But at that point, we just "looked at the numbers."

With result.metrics, you can compare cycle counts, token usage, and tool execution times to identify bottlenecks.

In this article, we'll try:

  1. Metrics overview — Walk through all fields in get_summary()
  2. Per-tool performance analysis — Identify which tool is the bottleneck
  3. Cycle count optimization — Compare how tool design changes affect cycle counts

See the official documentation at Metrics.

Setup

Use the same environment from Part 1. All examples use the same model configuration and can be run as independent .py files. Write the common setup at the top, then add each example's code below.

Python (common setup)
from strands import Agent, tool
from strands.models import BedrockModel
import json
 
bedrock_model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
    region_name="us-east-1",
)

Metrics Overview

Here are the key fields returned by result.metrics.get_summary():

Python
from strands_tools import calculator
 
agent = Agent(model=bedrock_model, tools=[calculator], callback_handler=None)
result = agent("What is 123 * 456?")
 
summary = result.metrics.get_summary()
print(json.dumps(summary, indent=2, default=str))
01_overview.py full code (copy-paste)
01_overview.py
from strands import Agent
from strands.models import BedrockModel
from strands_tools import calculator
import json
 
bedrock_model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
    region_name="us-east-1",
)
 
agent = Agent(model=bedrock_model, tools=[calculator], callback_handler=None)
result = agent("What is 123 * 456?")
 
summary = result.metrics.get_summary()
print(json.dumps(summary, indent=2, default=str))
Terminal
python -u 01_overview.py

Key fields:

| Field | Description |
|---|---|
| total_cycles | Total agent loop cycles |
| total_duration | Total execution time (seconds) |
| accumulated_usage.inputTokens | Total input tokens |
| accumulated_usage.outputTokens | Total output tokens |
| accumulated_usage.totalTokens | Total tokens |
| tool_usage.&lt;name&gt;.execution_stats.call_count | Tool invocation count |
| tool_usage.&lt;name&gt;.execution_stats.success_count | Success count |
| tool_usage.&lt;name&gt;.execution_stats.error_count | Error count |
| tool_usage.&lt;name&gt;.execution_stats.total_time | Total tool execution time (seconds) |
| tool_usage.&lt;name&gt;.execution_stats.average_time | Average tool execution time (seconds) |
| tool_usage.&lt;name&gt;.execution_stats.success_rate | Success rate (0.0–1.0) |

Beyond the "cycle count" and "token usage" we saw in the intro series, you get per-tool execution times and success rates.
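As a minimal sketch of working with these fields (using a made-up summary dict shaped like the get_summary() output above, not real measurements), you can pull individual values out of the nested structure and derive comparison numbers:

```python
# Hypothetical summary dict shaped like get_summary() output (values are made up)
summary = {
    "total_cycles": 2,
    "total_duration": 4.33,
    "accumulated_usage": {"inputTokens": 3900, "outputTokens": 654, "totalTokens": 4554},
    "tool_usage": {
        "calculator": {
            "execution_stats": {
                "call_count": 1, "success_count": 1, "error_count": 0,
                "total_time": 0.01, "average_time": 0.01, "success_rate": 1.0,
            }
        }
    },
}

# Derived numbers that are handy when comparing runs
tokens_per_cycle = summary["accumulated_usage"]["totalTokens"] / summary["total_cycles"]
calc_stats = summary["tool_usage"]["calculator"]["execution_stats"]

print(f"tokens/cycle: {tokens_per_cycle:.0f}")
print(f"calculator success rate: {calc_stats['success_rate']:.0%}")
```

Derived values like tokens per cycle are often more comparable across prompts than raw totals.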

Per-Tool Performance Analysis

Create two tools with different execution times and identify the bottleneck via metrics.

Python (execution)
agent = Agent(model=bedrock_model, tools=[fast_lookup, slow_analysis], callback_handler=None)
result = agent("Look up the name and city, then analyze the sentence 'Hello World from Tokyo'")
 
summary = result.metrics.get_summary()
for tool_name, info in summary['tool_usage'].items():
    stats = info['execution_stats']
    print(f"{tool_name}:")
    print(f"  calls={stats['call_count']}, total_time={stats['total_time']:.4f}s, avg_time={stats['average_time']:.4f}s")
02_tool_perf.py full code (copy-paste)
02_tool_perf.py
from strands import Agent, tool
from strands.models import BedrockModel
import time
 
bedrock_model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
    region_name="us-east-1",
)
 
@tool
def fast_lookup(key: str) -> str:
    """Look up a value by key (fast operation).
 
    Args:
        key: The key to look up
 
    Returns:
        str: The value
    """
    data = {"name": "Taro", "city": "Tokyo", "lang": "Python"}
    return data.get(key, "Not found")
 
@tool
def slow_analysis(text: str) -> str:
    """Analyze text (slow operation that simulates processing).
 
    Args:
        text: The text to analyze
 
    Returns:
        str: Analysis result
    """
    time.sleep(0.5)
    return f"Analysis of '{text}': {len(text)} chars, {len(text.split())} words"
 
agent = Agent(model=bedrock_model, tools=[fast_lookup, slow_analysis], callback_handler=None)
result = agent("Look up the name and city, then analyze the sentence 'Hello World from Tokyo'")
 
summary = result.metrics.get_summary()
for tool_name, info in summary['tool_usage'].items():
    stats = info['execution_stats']
    print(f"{tool_name}:")
    print(f"  calls={stats['call_count']}, total_time={stats['total_time']:.4f}s, avg_time={stats['average_time']:.4f}s")
Terminal
python -u 02_tool_perf.py

Result

Output
fast_lookup:
  calls=2, total_time=0.0032s, avg_time=0.0016s
slow_analysis:
  calls=1, total_time=0.5012s, avg_time=0.5012s

fast_lookup was called twice totaling 0.003s, while slow_analysis took 0.5s in a single call. The bottleneck is clearly slow_analysis.
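With more tools, eyeballing the output stops scaling. A small sketch that picks the slowest tool automatically from the tool_usage structure (numbers copied from the sample output above):

```python
# tool_usage slice shaped like the summary above (numbers from the sample output)
tool_usage = {
    "fast_lookup": {"execution_stats": {"call_count": 2, "total_time": 0.0032, "average_time": 0.0016}},
    "slow_analysis": {"execution_stats": {"call_count": 1, "total_time": 0.5012, "average_time": 0.5012}},
}

# The bottleneck is the tool with the largest share of total tool time
bottleneck, info = max(tool_usage.items(), key=lambda kv: kv[1]["execution_stats"]["total_time"])
share = info["execution_stats"]["total_time"] / sum(
    v["execution_stats"]["total_time"] for v in tool_usage.values()
)
print(f"bottleneck: {bottleneck} ({share:.0%} of tool time)")
```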

In production, this information enables decisions like:

  • Introduce caching — Cache slow_analysis results to speed up subsequent calls
  • Go async — Run heavy processing in the background
  • Split the tool — Separate heavy tools into light and heavy parts, increasing cases where the light part alone suffices
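The first option, caching, can be sketched with the standard library alone. This wraps a plain function rather than the SDK tool itself; stacking it under the @tool decorator is left as an assumption to verify in your setup:

```python
import time
from functools import lru_cache

@lru_cache(maxsize=128)
def slow_analysis(text: str) -> str:
    """Slow analysis; repeated calls with the same text hit the cache."""
    time.sleep(0.5)  # simulate heavy processing
    return f"Analysis of '{text}': {len(text)} chars, {len(text.split())} words"

start = time.perf_counter()
first = slow_analysis("Hello World from Tokyo")   # pays the 0.5s cost
cached = slow_analysis("Hello World from Tokyo")  # served from cache, near-instant
elapsed = time.perf_counter() - start
print(f"two calls took {elapsed:.2f}s (second was cached)")
```

Note that lru_cache only helps when arguments repeat exactly and results are safe to reuse.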

Cycle Count Optimization — Impact of Tool Design

In Part 2 of the intro series, we learned the principle of "separating tool responsibilities." But separation has trade-offs. Let's verify with metrics.

Pattern A: Separate Tools (Sequential Calls)

Python (Pattern A: execution)
agent_a = Agent(model=bedrock_model, tools=[get_exchange_rate, convert_currency], callback_handler=None)
result_a = agent_a("Convert 1000 EUR to JPY")
summary_a = result_a.metrics.get_summary()

Pattern B: Combined Tool (Single Call)

Python
@tool
def convert_with_rate(amount: float, base: str, target: str) -> dict:
    """Convert currency amount from base to target currency.
 
    Args:
        amount: The amount to convert
        base: The base currency code
        target: The target currency code
 
    Returns:
        dict: Conversion result with rate
    """
    rates = {("USD", "JPY"): 149.50, ("EUR", "USD"): 1.08, ("EUR", "JPY"): 161.46}
    rate = rates.get((base.upper(), target.upper()))
    if rate is None:
        return {"error": f"Rate not found for {base}/{target}"}
    return {"base": base, "target": target, "rate": rate, "original": amount, "converted": round(amount * rate, 2)}
 
agent_b = Agent(model=bedrock_model, tools=[convert_with_rate], callback_handler=None)
result_b = agent_b("Convert 1000 EUR to JPY")
summary_b = result_b.metrics.get_summary()
03_compare.py full code (copy-paste)
03_compare.py
from strands import Agent, tool
from strands.models import BedrockModel
 
bedrock_model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
    region_name="us-east-1",
)
 
@tool
def get_exchange_rate(base: str, target: str) -> dict:
    """Get the current exchange rate between two currencies.
 
    Args:
        base: The base currency code
        target: The target currency code
 
    Returns:
        dict: Exchange rate information
    """
    rates = {("USD", "JPY"): 149.50, ("EUR", "USD"): 1.08, ("EUR", "JPY"): 161.46}
    rate = rates.get((base.upper(), target.upper()))
    if rate is None:
        return {"error": f"Rate not found for {base}/{target}"}
    return {"base": base.upper(), "target": target.upper(), "rate": rate}
 
@tool
def convert_currency(amount: float, rate: float) -> dict:
    """Convert an amount using a given exchange rate.
 
    Args:
        amount: The amount to convert
        rate: The exchange rate to apply
 
    Returns:
        dict: Conversion result
    """
    return {"original": amount, "rate": rate, "converted": round(amount * rate, 2)}
 
agent_a = Agent(model=bedrock_model, tools=[get_exchange_rate, convert_currency], callback_handler=None)
result_a = agent_a("Convert 1000 EUR to JPY")
summary_a = result_a.metrics.get_summary()
 
@tool
def convert_with_rate(amount: float, base: str, target: str) -> dict:
    """Convert currency amount from base to target currency.
 
    Args:
        amount: The amount to convert
        base: The base currency code
        target: The target currency code
 
    Returns:
        dict: Conversion result with rate
    """
    rates = {("USD", "JPY"): 149.50, ("EUR", "USD"): 1.08, ("EUR", "JPY"): 161.46}
    rate = rates.get((base.upper(), target.upper()))
    if rate is None:
        return {"error": f"Rate not found for {base}/{target}"}
    return {"base": base, "target": target, "rate": rate, "original": amount, "converted": round(amount * rate, 2)}
 
agent_b = Agent(model=bedrock_model, tools=[convert_with_rate], callback_handler=None)
result_b = agent_b("Convert 1000 EUR to JPY")
summary_b = result_b.metrics.get_summary()
 
print("=== Comparison ===")
print(f"Cycles: {summary_a['total_cycles']} vs {summary_b['total_cycles']}")
print(f"Tokens: {summary_a['accumulated_usage']['totalTokens']} vs {summary_b['accumulated_usage']['totalTokens']}")
print(f"Duration: {summary_a['total_duration']:.2f}s vs {summary_b['total_duration']:.2f}s")
Terminal
python -u 03_compare.py

Comparison

Output
=== Comparison ===
Cycles: 3 vs 2
Tokens: 2270 vs 1243
Duration: 7.31s vs 4.33s

| Metric | Pattern A (Separate) | Pattern B (Combined) |
|---|---|---|
| Cycles | 3 | 2 |
| Tokens | 2,270 | 1,243 |
| Duration | 7.31s | 4.33s |

Separate tools take 3 cycles (get rate → convert → respond), while the combined tool completes in 2 cycles (convert → respond). Token usage dropped by about 45% and execution time by about 40%.

This is a trade-off with the "separate responsibilities" principle from Part 2 of the intro series:

  • Separation wins when — Rate lookup and conversion are used independently. Testing is easier.
  • Combination wins when — Tools are always used together. Performance matters.

Verify with actual metrics and decide based on your use case.
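A small sketch for quantifying such a comparison from two summaries (the dicts below hold just the fields being compared, with the numbers from the run above):

```python
# Hypothetical summaries holding just the fields we compare (numbers from the run above)
summary_a = {"total_cycles": 3, "accumulated_usage": {"totalTokens": 2270}, "total_duration": 7.31}
summary_b = {"total_cycles": 2, "accumulated_usage": {"totalTokens": 1243}, "total_duration": 4.33}

def reduction(before: float, after: float) -> float:
    """Percentage saved by going from `before` to `after`."""
    return (before - after) / before * 100

token_savings = reduction(summary_a["accumulated_usage"]["totalTokens"],
                          summary_b["accumulated_usage"]["totalTokens"])
time_savings = reduction(summary_a["total_duration"], summary_b["total_duration"])
print(f"tokens: -{token_savings:.0f}%, duration: -{time_savings:.0f}%")
```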

Series Recap

Across all 5 articles in the practical series, we learned how to turn intro-level agents into "reliable applications."

| Part | Topic | What We Learned |
|---|---|---|
| Part 1 | Structured Output | Type-safe output with Pydantic, automatic validation retries |
| Part 2 | Session Management | Conversation persistence and restoration with FileSessionManager |
| Part 3 | Hooks | Tool call monitoring, limiting, and result modification |
| Part 4 | Guardrails | Input/output filtering and shadow mode |
| Part 5 (this article) | Metrics | Per-tool performance analysis, cycle count optimization |

While the intro series was about "understanding agent fundamentals," the practical series was about "making output predictable, persisting conversations, controlling behavior, ensuring safety, and optimizing performance."

From here, you can explore advanced multi-agent patterns like Swarm and Graph, deployment to AWS Lambda or Bedrock AgentCore, and observability with OpenTelemetry.

Summary

  • result.metrics gives you cycle counts, token usage, and tool execution times at a glance — get_summary() lets you understand agent behavior numerically. It's the starting point for production debugging and optimization.
  • Per-tool execution times identify bottlenecks — tool_usage.<name>.execution_stats shows call counts, execution times, and success rates for each tool, helping you decide what to optimize.
  • Tool design directly impacts cycle counts and token usage — Separate tools (3 cycles, 2,270 tokens) vs combined tool (2 cycles, 1,243 tokens). Measure with metrics and decide based on your use case.
  • Intro series knowledge carries into the practical series — Understanding of agent loops (Part 1), tool design (Part 2), conversation management (Part 4), and multi-agent (Part 5) forms the foundation for every practical series topic.

Shinya Tahara

Solutions Architect @ AWS

I'm a Solutions Architect at AWS, providing technical guidance primarily to financial industry customers. I share learnings about cloud architecture and AI/ML on this site. The views and opinions expressed on this site are my own and do not represent the official positions of my employer.
