Execute Shell Commands in Bedrock AgentCore Runtime Sessions with InvokeAgentRuntimeCommand

Introduction

On March 17, 2026, AWS added the InvokeAgentRuntimeCommand API to Amazon Bedrock AgentCore Runtime. Until now, running shell commands inside an agent session required custom process management logic baked into your container. The new API solves this at the platform level, letting you run deterministic operations like test execution, git commands, and dependency installation separately from LLM reasoning.

This post shares the results of testing the API across seven dimensions using a minimal Python runtime deployed with code configuration (S3 ZIP). See the official documentation for the full API reference.

Why Separate Shell Commands from Agent Reasoning

A typical AI coding agent workflow alternates between LLM-driven code generation and deterministic operations like tests, builds, and git. Previously, InvokeAgentRuntime had to handle all of this, creating three problems: mixed concerns in the reasoning loop, long builds blocking the entire agent, and no built-in streaming for command output.

InvokeAgentRuntimeCommand solves all three at once.

API Design

The API accepts a command string and timeout, returning an HTTP/2 EventStream response with three event types:

Parameter         Type     Description
agentRuntimeArn   string   Deployed runtime ARN
runtimeSessionId  string   Session ID (minimum 33 characters)
body.command      string   Shell command to execute (1 B–64 KB)
body.timeout      integer  Timeout in seconds (1–3600)

Response events: contentStart (execution confirmed), contentDelta (streaming stdout/stderr), and contentStop (exitCode + status: COMPLETED or TIMED_OUT).

Each command runs as a one-shot bash process. The docs say "Stateless between commands" with no shell history or environment variable persistence, but also state that "Commands execute within the same container, filesystem." As verified below, the filesystem is indeed shared within a session, so you only need && chaining when shell state carryover is required.
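When a later step does need that carryover (working directory, exported variables), the && chaining can be wrapped in a small helper. This is a hypothetical convenience function, not part of the API:

```python
import shlex

def chain(*steps: str) -> str:
    """Join steps with && so shell state (cwd, exported variables)
    carries through a single one-shot bash process."""
    return '/bin/bash -c ' + shlex.quote(' && '.join(steps))
```

For example, `chain('cd /workspace', 'export NODE_ENV=test', 'npm test')` returns `/bin/bash -c 'cd /workspace && export NODE_ENV=test && npm test'`, a single command string for one API call.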

Setting Up the Test Environment

I deployed a minimal Python agent using code configuration (S3 ZIP). Here are the full steps to reproduce.

Prerequisites:

  • AWS CLI configured with bedrock-agentcore:*, iam:*, and s3:* permissions
  • boto3 1.42.70 (version used in this verification; supports invoke_agent_runtime_command)
  • bedrock-agentcore:InvokeAgentRuntimeCommand permission on the calling IAM principal

Agent Code

The agent code itself is minimal — InvokeAgentRuntimeCommand operates independently of the agent logic.

# main.py
import json
import sys
 
def handle_invoke(event):
    user_input = event.get("input", {}).get("text", "")
    return {
        "output": {
            "text": f"Received: {user_input}. This is a minimal test agent."
        }
    }
 
def main():
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        try:
            request = json.loads(line)
            response = handle_invoke(request)
            print(json.dumps(response), flush=True)
        except json.JSONDecodeError:
            print(json.dumps({"error": "Invalid JSON"}), flush=True)
 
if __name__ == "__main__":
    main()

Deployment

The following sequence covers IAM role creation, S3 upload, and runtime + endpoint creation. If you just want the results, skip ahead to Verification Results.

# Variables
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
BUCKET_NAME="agentcore-test-shell-cmd-${ACCOUNT_ID}"
REGION="us-west-2"
 
# S3 bucket and code upload
zip agent.zip main.py
aws s3 mb "s3://${BUCKET_NAME}" --region "$REGION"
aws s3 cp agent.zip "s3://${BUCKET_NAME}/agent.zip"
 
# IAM role with trust policy for AgentCore
aws iam create-role \
  --role-name AgentCoreRuntimeTestRole \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Service": "bedrock-agentcore.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }]
  }'
 
# Attach required policies
aws iam attach-role-policy \
  --role-name AgentCoreRuntimeTestRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonBedrockFullAccess
 
aws iam put-role-policy \
  --role-name AgentCoreRuntimeTestRole \
  --policy-name S3Access \
  --policy-document "{
    \"Version\": \"2012-10-17\",
    \"Statement\": [{
      \"Effect\": \"Allow\",
      \"Action\": [\"s3:GetObject\", \"s3:ListBucket\"],
      \"Resource\": [
        \"arn:aws:s3:::${BUCKET_NAME}\",
        \"arn:aws:s3:::${BUCKET_NAME}/*\"
      ]
    }]
  }"
 
# Create runtime
aws bedrock-agentcore-control create-agent-runtime \
  --region "$REGION" \
  --agent-runtime-name shell_cmd_test_agent \
  --role-arn "arn:aws:iam::${ACCOUNT_ID}:role/AgentCoreRuntimeTestRole" \
  --agent-runtime-artifact "{
    \"codeConfiguration\": {
      \"code\": {\"s3\": {\"bucket\": \"${BUCKET_NAME}\", \"prefix\": \"agent.zip\"}},
      \"runtime\": \"PYTHON_3_13\",
      \"entryPoint\": [\"main.py\"]
    }
  }" \
  --network-configuration '{"networkMode": "PUBLIC"}'
# → Note the agentRuntimeId from the response
 
RUNTIME_ID="shell_cmd_test_agent-XXXXXXXXXX"  # Replace with actual ID
 
# Create endpoint and poll until READY
aws bedrock-agentcore-control create-agent-runtime-endpoint \
  --region "$REGION" \
  --agent-runtime-id "$RUNTIME_ID" \
  --name shell_cmd_test_endpoint
 
while true; do
  STATUS=$(aws bedrock-agentcore-control get-agent-runtime-endpoint \
    --region "$REGION" \
    --agent-runtime-id "$RUNTIME_ID" \
    --endpoint-name shell_cmd_test_endpoint \
    --query 'status' --output text)
  echo "Endpoint status: $STATUS"
  [ "$STATUS" = "READY" ] && break
  sleep 10
done

The runtime and endpoint reached READY status within the first polling interval (under 10 seconds). Code configuration does not require a container image build.

Verification Results

I tested the API across seven dimensions:

  1. Basic execution — Three-phase EventStream events
  2. Container environment — Pre-installed tools
  3. In-session file persistence — File visibility across API calls
  4. Cross-session isolation — File invisibility with different session IDs
  5. Error handling — Non-zero exit code behavior
  6. Timeout — TIMED_OUT status and partial output
  7. Concurrent execution — Non-blocking parallel commands

All tests below use this Python base. Session IDs must be at least 33 characters, so I concatenate two UUIDs.

import boto3, sys, uuid
 
client = boto3.client('bedrock-agentcore', region_name='us-west-2')
RUNTIME_ARN = "arn:aws:bedrock-agentcore:us-west-2:ACCOUNT_ID:runtime/RUNTIME_ID"
SESSION_ID = str(uuid.uuid4()) + "-" + str(uuid.uuid4())[:8]  # 45 chars
 
def run_command(command, timeout=30, session_id=None):
    """Execute a command and process the EventStream."""
    response = client.invoke_agent_runtime_command(
        agentRuntimeArn=RUNTIME_ARN,
        runtimeSessionId=session_id or SESSION_ID,
        qualifier='DEFAULT',
        contentType='application/json',
        accept='application/vnd.amazon.eventstream',
        body={'command': command, 'timeout': timeout}
    )
    for event in response.get('stream', []):
        if 'chunk' in event:
            chunk = event['chunk']
            if 'contentStart' in chunk:
                print("[contentStart] Command execution started")
            if 'contentDelta' in chunk:
                delta = chunk['contentDelta']
                if delta.get('stdout'):
                    print(f"[stdout] {delta['stdout']}", end='')
                if delta.get('stderr'):
                    print(f"[stderr] {delta['stderr']}", end='', file=sys.stderr)
            if 'contentStop' in chunk:
                stop = chunk['contentStop']
                print(f"[contentStop] Exit code: {stop.get('exitCode')}, "
                      f"Status: {stop.get('status')}")

1. Basic Execution and Streaming

run_command('/bin/bash -c "echo Hello from AgentCore Runtime"')
[contentStart] Command execution started
[stdout] Hello from AgentCore Runtime
[contentStop] Exit code: 0, Status: COMPLETED

The three-phase event model works exactly as documented, with real-time streaming for long-running commands.

2. Container Environment

I ran uname -a, python3 --version, whoami, and --version checks for common tools. The container runs Amazon Linux 2023 (6.1.158-15.288.amzn2023.aarch64) as root:

Tool     Availability
git      2.50.1 (pre-installed)
curl     8.17.0 (pre-installed)
python3  3.13.9 (matches runtime config)
node     Not installed
pip      Not installed
aws CLI  Not installed

Code configuration includes git and curl but not node, pip, or AWS CLI. Use container configuration with a custom Dockerfile for production agents that need additional tools.

3. In-Session File Persistence

I tested whether files created in one API call are visible to subsequent calls in the same session:

# API call 1: Create file
run_command('/bin/bash -c "mkdir -p /tmp/test_dir && echo \'file content\' > /tmp/test_dir/test.txt"')
 
# API call 2 (same session): Read file
run_command('/bin/bash -c "cat /tmp/test_dir/test.txt"')
# → [stdout] file content

The docs say "Stateless between commands" but also "Commands execute within the same container, filesystem." In this test, the filesystem was shared within the session — the /tmp/ file persisted across separate API calls. You can pass intermediate artifacts through files without && chaining.

4. Cross-Session microVM Isolation

A different session ID cannot access files from another session:

new_session = str(uuid.uuid4()) + "-" + str(uuid.uuid4())[:8]
run_command(
    '/bin/bash -c "cat /tmp/test_dir/test.txt 2>&1 || echo File NOT found"',
    session_id=new_session
)
# → [stdout] File NOT found

Each session gets its own microVM with separate kernel, memory, and filesystem.

5. Error Handling

Failed commands return non-zero exit codes through contentStop, not as API errors:

[stderr] ls: cannot access '/nonexistent_path': No such file or directory
[contentStop] Exit code: 2, Status: COMPLETED

This makes it natural to handle test failures and build errors in normal application flow.

6. Timeout Behavior

A 5-second sleep with a 1-second timeout:

run_command('/bin/bash -c "echo start && sleep 5 && echo end"', timeout=1)
[stdout] start
[contentStop] Exit code: -1, Status: TIMED_OUT

Output produced before the timeout is still streamed, and timeouts are unambiguous: exitCode: -1 with status: TIMED_OUT. The trailing echo end never runs.
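The failure and timeout cases can be folded into one check. A minimal sketch against the contentStop payload shape observed in these tests (key names exitCode and status follow the responses above):

```python
def classify_stop(stop: dict) -> str:
    """Map a contentStop payload to a coarse outcome.

    TIMED_OUT always carries exitCode -1; COMPLETED carries the
    command's own exit code (0 for success, non-zero for failure)."""
    if stop.get('status') == 'TIMED_OUT':
        return 'timeout'
    if stop.get('exitCode') == 0:
        return 'success'
    return f"failure (exit {stop.get('exitCode')})"
```

This keeps the two axes (exit code, status) from being conflated in downstream error handling.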

7. Concurrent Execution

Three 2-second commands running concurrently in the same session via concurrent.futures:

cmd_A: exit=0, time=3.51s, output=A_done
cmd_B: exit=0, time=3.51s, output=B_done
cmd_C: exit=0, time=3.51s, output=C_done
Total wall time: 3.51s

Each command took about 3.5 seconds (2s sleep + API overhead), but all three completed in 3.5 seconds total — versus roughly 10.5 seconds if run sequentially. Commands actually run in parallel within the same session, enabling patterns like running tests, builds, and linting simultaneously.
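The measurement above can be reproduced with a thread pool. A minimal sketch reusing the run_command helper from earlier (the command list and timeout are placeholders; boto3 clients are safe to call concurrently from multiple threads):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_parallel(run_command, commands, timeout=30):
    """Submit each command on its own thread; each invocation holds
    its own EventStream until contentStop arrives, so the calls
    overlap rather than queue."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        futures = [pool.submit(run_command, cmd, timeout) for cmd in commands]
        results = [f.result() for f in futures]
    return results, time.monotonic() - start
```

Threads (rather than asyncio) are the path of least resistance here, since boto3's EventStream iteration is blocking.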

Use Cases

Separating tool execution from reasoning — Use InvokeAgentRuntime for LLM reasoning and InvokeAgentRuntimeCommand for deterministic ops. Test execution, builds, and git commits can be managed independently from the agent's reasoning loop.

File-based step chaining — Leverage in-session filesystem sharing: agent generates code → writes to file → runs tests → reads results → feeds back to reasoning. The shared filesystem makes this flow natural.
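That loop can be sketched as two API calls. This assumes a run_command variant that returns collected stdout instead of printing; the helper name and path are illustrative, and the quoting only holds for sources without double quotes:

```python
import shlex

def write_and_test(run_command, source: str, path: str = '/tmp/gen/solution.py'):
    """Write generated code to the session filesystem in one API call,
    then execute it in a second call; the file persists in-session."""
    run_command(
        f'/bin/bash -c "mkdir -p /tmp/gen && printf %s {shlex.quote(source)} > {path}"'
    )
    return run_command(f'/bin/bash -c "python3 {path}"')
```

The result of the second call can then be fed back into the agent's reasoning loop via InvokeAgentRuntime.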

Parallel execution for faster feedback — Fire off tests, linting, and type checks in parallel to cut per-turn latency. Note that code configuration doesn't include node or pip, so production agents should use container configuration with a custom Dockerfile.

Notes on the Official Best Practices

The documentation's best practices recommend && chaining for state encoding and incremental streaming output processing. Here are supplementary notes based on the verification results.

  • && chaining is only needed for shell state — The best practices recommend patterns like cd /workspace && export NODE_ENV=test && npm test. This is correct when you need environment variable or working directory carryover. But for file-based data passing, separate API calls can reference files without && chaining, as verified in this test.
  • Short timeouts with streaming output work well — The best practices suggest 5 minutes for test suites and 30 seconds for git push. Since partial output streams before timeout (verified), setting shorter timeouts and processing output incrementally to detect failures early is a viable strategy.
  • exitCode checking is essential — pay attention to the values — The best practices recommend checking exitCode. The specific values found in testing: command-specific exit codes on failure (e.g., 2 for ls not found), and -1 for timeout. Build this distinction into your error handling.
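To illustrate the incremental-processing idea, here is a scanner over the event stream that bails out on the first failure marker. The marker string is a hypothetical convention, and the event shapes follow the responses observed in the tests above:

```python
def stream_until_failure(events, failure_marker='FAILED'):
    """Collect stdout chunk by chunk and stop as soon as a failure
    marker appears, rather than waiting for contentStop."""
    collected = []
    for event in events:
        chunk = event.get('chunk', {})
        delta = chunk.get('contentDelta', {})
        out = delta.get('stdout', '')
        if out:
            collected.append(out)
            if failure_marker in out:
                return ''.join(collected), True  # fail fast
        if 'contentStop' in chunk:
            break
    return ''.join(collected), False
```

Combined with a short timeout, this lets the agent abandon a failing test run without consuming the full timeout budget.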

Takeaways

The verification revealed runtime behaviors not obvious from the documentation alone.

  • "Stateless" only applies to shell process state — The docs say "Stateless between commands" but also "same container, filesystem." Testing confirmed that /tmp/ files persisted across API calls within a session. Only shell history and environment variables are reset.
  • Timeout is explicitly signaled with exit code -1 — Command failure (non-zero exit code + COMPLETED) and timeout (exit code -1 + TIMED_OUT) are cleanly separated on two axes, and partial output is still streamed before timeout.
  • Concurrent execution is verified — The docs state that command execution doesn't block agent invocations, but testing confirmed that multiple commands in the same session also run in parallel. Three commands completed in the wall time of one.
  • Code configuration containers lack node, pip, and AWS CLI — git (2.50.1) and curl (8.17.0) are pre-installed, but node, pip, and AWS CLI were not available. Choose container configuration when your agent needs a full toolchain.

Shinya Tahara

Solutions Architect @ AWS

I'm a Solutions Architect at AWS, providing technical guidance primarily to financial industry customers. I share learnings about cloud architecture and AI/ML on this site. The views and opinions expressed on this site are my own and do not represent the official positions of my employer.
