Strands Agents SDK Practical — Optimize Agents with Metrics

Introduction

In Part 1 of the intro series, we first saw result.metrics.get_summary(). We noted that cycles were 2 and token usage was 4,554. But at that point, we just "looked at the numbers."

With result.metrics, you can compare cycle counts, token usage, and tool execution times to identify bottlenecks.

In this article, we'll try:

  1. Metrics overview — Walk through all fields in get_summary()
  2. Per-tool performance analysis — Identify which tool is the bottleneck
  3. Cycle count optimization — Compare how tool design changes affect cycle counts

See the official documentation at Metrics.

Setup

Use the same environment from Part 1. All examples use the same model configuration and can be run as independent .py files. Write the common setup at the top, then add each example's code below.

Python (common setup)
from strands import Agent, tool
from strands.models import BedrockModel
import json
 
bedrock_model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
    region_name="us-east-1",
)

Metrics Overview

Here are the key fields returned by result.metrics.get_summary():

Python
from strands_tools import calculator
 
agent = Agent(model=bedrock_model, tools=[calculator], callback_handler=None)
result = agent("What is 123 * 456?")
 
summary = result.metrics.get_summary()
print(json.dumps(summary, indent=2, default=str))
01_overview.py full code (copy-paste)
01_overview.py
from strands import Agent
from strands.models import BedrockModel
from strands_tools import calculator
import json
 
bedrock_model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
    region_name="us-east-1",
)
 
agent = Agent(model=bedrock_model, tools=[calculator], callback_handler=None)
result = agent("What is 123 * 456?")
 
summary = result.metrics.get_summary()
print(json.dumps(summary, indent=2, default=str))
Terminal
python -u 01_overview.py

Key fields:

| Field | Description |
|---|---|
| total_cycles | Total agent loop cycles |
| total_duration | Total execution time (seconds) |
| accumulated_usage.inputTokens | Total input tokens |
| accumulated_usage.outputTokens | Total output tokens |
| accumulated_usage.totalTokens | Total tokens |
| tool_usage.&lt;name&gt;.execution_stats.call_count | Tool invocation count |
| tool_usage.&lt;name&gt;.execution_stats.success_count | Success count |
| tool_usage.&lt;name&gt;.execution_stats.error_count | Error count |
| tool_usage.&lt;name&gt;.execution_stats.total_time | Total tool execution time (seconds) |
| tool_usage.&lt;name&gt;.execution_stats.average_time | Average tool execution time (seconds) |
| tool_usage.&lt;name&gt;.execution_stats.success_rate | Success rate (0.0–1.0) |

Beyond the "cycle count" and "token usage" we saw in the intro series, you get per-tool execution times and success rates.
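As a minimal sketch of working with these fields (using a made-up summary dict shaped like the get_summary() output above, not real measurements), you can pull individual values out of the nested structure and derive comparison numbers:

```python
# Hypothetical summary dict shaped like get_summary() output (values are made up)
summary = {
    "total_cycles": 2,
    "total_duration": 4.33,
    "accumulated_usage": {"inputTokens": 3900, "outputTokens": 654, "totalTokens": 4554},
    "tool_usage": {
        "calculator": {
            "execution_stats": {
                "call_count": 1, "success_count": 1, "error_count": 0,
                "total_time": 0.01, "average_time": 0.01, "success_rate": 1.0,
            }
        }
    },
}

# Derived numbers that are handy when comparing runs
tokens_per_cycle = summary["accumulated_usage"]["totalTokens"] / summary["total_cycles"]
calc_stats = summary["tool_usage"]["calculator"]["execution_stats"]

print(f"tokens/cycle: {tokens_per_cycle:.0f}")
print(f"calculator success rate: {calc_stats['success_rate']:.0%}")
```

Derived values like tokens per cycle are often more comparable across prompts than raw totals.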

Per-Tool Performance Analysis

Create two tools with different execution times and identify the bottleneck via metrics.

Python (execution)
agent = Agent(model=bedrock_model, tools=[fast_lookup, slow_analysis], callback_handler=None)
result = agent("Look up the name and city, then analyze the sentence 'Hello World from Tokyo'")
 
summary = result.metrics.get_summary()
for tool_name, info in summary['tool_usage'].items():
    stats = info['execution_stats']
    print(f"{tool_name}:")
    print(f"  calls={stats['call_count']}, total_time={stats['total_time']:.4f}s, avg_time={stats['average_time']:.4f}s")
02_tool_perf.py full code (copy-paste)
02_tool_perf.py
from strands import Agent, tool
from strands.models import BedrockModel
import time
 
bedrock_model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
    region_name="us-east-1",
)
 
@tool
def fast_lookup(key: str) -> str:
    """Look up a value by key (fast operation).
 
    Args:
        key: The key to look up
 
    Returns:
        str: The value
    """
    data = {"name": "Taro", "city": "Tokyo", "lang": "Python"}
    return data.get(key, "Not found")
 
@tool
def slow_analysis(text: str) -> str:
    """Analyze text (slow operation that simulates processing).
 
    Args:
        text: The text to analyze
 
    Returns:
        str: Analysis result
    """
    time.sleep(0.5)
    return f"Analysis of '{text}': {len(text)} chars, {len(text.split())} words"
 
agent = Agent(model=bedrock_model, tools=[fast_lookup, slow_analysis], callback_handler=None)
result = agent("Look up the name and city, then analyze the sentence 'Hello World from Tokyo'")
 
summary = result.metrics.get_summary()
for tool_name, info in summary['tool_usage'].items():
    stats = info['execution_stats']
    print(f"{tool_name}:")
    print(f"  calls={stats['call_count']}, total_time={stats['total_time']:.4f}s, avg_time={stats['average_time']:.4f}s")
Terminal
python -u 02_tool_perf.py

Result

Output
fast_lookup:
  calls=2, total_time=0.0032s, avg_time=0.0016s
slow_analysis:
  calls=1, total_time=0.5012s, avg_time=0.5012s

fast_lookup was called twice totaling 0.003s, while slow_analysis took 0.5s in a single call. The bottleneck is clearly slow_analysis.
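With more tools, eyeballing the output stops scaling. A small sketch that picks the slowest tool automatically from the tool_usage structure (numbers copied from the sample output above):

```python
# tool_usage slice shaped like the summary above (numbers from the sample output)
tool_usage = {
    "fast_lookup": {"execution_stats": {"call_count": 2, "total_time": 0.0032, "average_time": 0.0016}},
    "slow_analysis": {"execution_stats": {"call_count": 1, "total_time": 0.5012, "average_time": 0.5012}},
}

# The bottleneck is the tool with the largest share of total tool time
bottleneck, info = max(tool_usage.items(), key=lambda kv: kv[1]["execution_stats"]["total_time"])
share = info["execution_stats"]["total_time"] / sum(
    v["execution_stats"]["total_time"] for v in tool_usage.values()
)
print(f"bottleneck: {bottleneck} ({share:.0%} of tool time)")
```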

In production, this information enables decisions like:

  • Introduce caching — Cache slow_analysis results to speed up subsequent calls
  • Go async — Run heavy processing in the background
  • Split the tool — Separate heavy tools into light and heavy parts, increasing cases where the light part alone suffices
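The first option, caching, can be sketched with the standard library alone. This wraps a plain function rather than the SDK tool itself; stacking it under the @tool decorator is left as an assumption to verify in your setup:

```python
import time
from functools import lru_cache

@lru_cache(maxsize=128)
def slow_analysis(text: str) -> str:
    """Slow analysis; repeated calls with the same text hit the cache."""
    time.sleep(0.5)  # simulate heavy processing
    return f"Analysis of '{text}': {len(text)} chars, {len(text.split())} words"

start = time.perf_counter()
first = slow_analysis("Hello World from Tokyo")   # pays the 0.5s cost
cached = slow_analysis("Hello World from Tokyo")  # served from cache, near-instant
elapsed = time.perf_counter() - start
print(f"two calls took {elapsed:.2f}s (second was cached)")
```

Note that lru_cache only helps when arguments repeat exactly and results are safe to reuse.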

Cycle Count Optimization — Impact of Tool Design

In Part 2 of the intro series, we learned the principle of "separating tool responsibilities." But separation has trade-offs. Let's verify with metrics.

Pattern A: Separate Tools (Sequential Calls)

Python (Pattern A: execution)
agent_a = Agent(model=bedrock_model, tools=[get_exchange_rate, convert_currency], callback_handler=None)
result_a = agent_a("Convert 1000 EUR to JPY")
summary_a = result_a.metrics.get_summary()

Pattern B: Combined Tool (Single Call)

Python
@tool
def convert_with_rate(amount: float, base: str, target: str) -> dict:
    """Convert currency amount from base to target currency.
 
    Args:
        amount: The amount to convert
        base: The base currency code
        target: The target currency code
 
    Returns:
        dict: Conversion result with rate
    """
    rates = {("USD", "JPY"): 149.50, ("EUR", "USD"): 1.08, ("EUR", "JPY"): 161.46}
    rate = rates.get((base.upper(), target.upper()))
    if rate is None:
        return {"error": f"Rate not found for {base}/{target}"}
    return {"base": base, "target": target, "rate": rate, "original": amount, "converted": round(amount * rate, 2)}
 
agent_b = Agent(model=bedrock_model, tools=[convert_with_rate], callback_handler=None)
result_b = agent_b("Convert 1000 EUR to JPY")
summary_b = result_b.metrics.get_summary()
03_compare.py full code (copy-paste)
03_compare.py
from strands import Agent, tool
from strands.models import BedrockModel
 
bedrock_model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
    region_name="us-east-1",
)
 
@tool
def get_exchange_rate(base: str, target: str) -> dict:
    """Get the current exchange rate between two currencies.
 
    Args:
        base: The base currency code
        target: The target currency code
 
    Returns:
        dict: Exchange rate information
    """
    rates = {("USD", "JPY"): 149.50, ("EUR", "USD"): 1.08, ("EUR", "JPY"): 161.46}
    rate = rates.get((base.upper(), target.upper()))
    if rate is None:
        return {"error": f"Rate not found for {base}/{target}"}
    return {"base": base.upper(), "target": target.upper(), "rate": rate}
 
@tool
def convert_currency(amount: float, rate: float) -> dict:
    """Convert an amount using a given exchange rate.
 
    Args:
        amount: The amount to convert
        rate: The exchange rate to apply
 
    Returns:
        dict: Conversion result
    """
    return {"original": amount, "rate": rate, "converted": round(amount * rate, 2)}
 
agent_a = Agent(model=bedrock_model, tools=[get_exchange_rate, convert_currency], callback_handler=None)
result_a = agent_a("Convert 1000 EUR to JPY")
summary_a = result_a.metrics.get_summary()
 
@tool
def convert_with_rate(amount: float, base: str, target: str) -> dict:
    """Convert currency amount from base to target currency.
 
    Args:
        amount: The amount to convert
        base: The base currency code
        target: The target currency code
 
    Returns:
        dict: Conversion result with rate
    """
    rates = {("USD", "JPY"): 149.50, ("EUR", "USD"): 1.08, ("EUR", "JPY"): 161.46}
    rate = rates.get((base.upper(), target.upper()))
    if rate is None:
        return {"error": f"Rate not found for {base}/{target}"}
    return {"base": base, "target": target, "rate": rate, "original": amount, "converted": round(amount * rate, 2)}
 
agent_b = Agent(model=bedrock_model, tools=[convert_with_rate], callback_handler=None)
result_b = agent_b("Convert 1000 EUR to JPY")
summary_b = result_b.metrics.get_summary()
 
print("=== Comparison ===")
print(f"Cycles: {summary_a['total_cycles']} vs {summary_b['total_cycles']}")
print(f"Tokens: {summary_a['accumulated_usage']['totalTokens']} vs {summary_b['accumulated_usage']['totalTokens']}")
print(f"Duration: {summary_a['total_duration']:.2f}s vs {summary_b['total_duration']:.2f}s")
Terminal
python -u 03_compare.py

Comparison

Output
=== Comparison ===
Cycles: 3 vs 2
Tokens: 2270 vs 1243
Duration: 7.31s vs 4.33s

| Metric | Pattern A (Separate) | Pattern B (Combined) |
|---|---|---|
| Cycles | 3 | 2 |
| Tokens | 2,270 | 1,243 |
| Duration | 7.31s | 4.33s |

Separate tools take 3 cycles (get rate → convert → respond), while the combined tool completes in 2 cycles (convert → respond). Token usage dropped by about 45% and execution time by about 40%.

This is a trade-off with the "separate responsibilities" principle from Part 2 of the intro series:

  • Separation wins when — Rate lookup and conversion are used independently. Testing is easier.
  • Combination wins when — Tools are always used together. Performance matters.

Verify with actual metrics and decide based on your use case.
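A small sketch for quantifying such a comparison from two summaries (the dicts below hold just the fields being compared, with the numbers from the run above):

```python
# Hypothetical summaries holding just the fields we compare (numbers from the run above)
summary_a = {"total_cycles": 3, "accumulated_usage": {"totalTokens": 2270}, "total_duration": 7.31}
summary_b = {"total_cycles": 2, "accumulated_usage": {"totalTokens": 1243}, "total_duration": 4.33}

def reduction(before: float, after: float) -> float:
    """Percentage saved by going from `before` to `after`."""
    return (before - after) / before * 100

token_savings = reduction(summary_a["accumulated_usage"]["totalTokens"],
                          summary_b["accumulated_usage"]["totalTokens"])
time_savings = reduction(summary_a["total_duration"], summary_b["total_duration"])
print(f"tokens: -{token_savings:.0f}%, duration: -{time_savings:.0f}%")
```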

Series Recap

Across all 5 articles in the practical series, we learned how to turn intro-level agents into "reliable applications."

| Part | Topic | What We Learned |
|---|---|---|
| Part 1 | Structured Output | Type-safe output with Pydantic, automatic validation retries |
| Part 2 | Session Management | Conversation persistence and restoration with FileSessionManager |
| Part 3 | Hooks | Tool call monitoring, limiting, and result modification |
| Part 4 | Guardrails | Input/output filtering and shadow mode |
| Part 5 (this article) | Metrics | Per-tool performance analysis, cycle count optimization |

While the intro series was about "understanding agent fundamentals," the practical series was about "making output predictable, persisting conversations, controlling behavior, ensuring safety, and optimizing performance."

From here, you can explore advanced multi-agent patterns like Swarm and Graph, deployment to AWS Lambda or Bedrock AgentCore, and observability with OpenTelemetry.

Summary

  • result.metrics gives you cycle counts, token usage, and tool execution times at a glance — get_summary() lets you understand agent behavior numerically. It's the starting point for production debugging and optimization.
  • Per-tool execution times identify bottlenecks — tool_usage.<name>.execution_stats shows call counts, execution times, and success rates for each tool, helping you decide what to optimize.
  • Tool design directly impacts cycle counts and token usage — Separate tools (3 cycles, 2,270 tokens) vs combined tool (2 cycles, 1,243 tokens). Measure with metrics and decide based on your use case.
  • Intro series knowledge carries into the practical series — Understanding of agent loops (Part 1), tool design (Part 2), conversation management (Part 4), and multi-agent (Part 5) forms the foundation for every practical series topic.

Shinya Tahara

Solutions Architect @ AWS

I'm a Solutions Architect at AWS, providing technical guidance primarily to financial industry customers. I share learnings about cloud architecture and AI/ML on this site. The views and opinions expressed on this site are my own and do not represent the official positions of my employer.
