Verifying Lambda Managed Instances — Provisioning Time, Multi-Concurrency, and Scaling Behavior
Introduction
On November 30, 2025, AWS announced Lambda Managed Instances (LMI). This new compute option lets you run Lambda functions on EC2 instances while preserving the Lambda programming model.
LMI is less like "Lambda that scales instantly" and more like "Fargate with a Lambda developer experience." It eliminates cold starts but introduces provisioning wait times at deploy, and scaling is asynchronous based on CPU utilization. This mental model shift is the starting point for using LMI effectively.
This article deploys LMI and measures three core behaviors, then provides a checklist for deciding whether to migrate your workload. See the official docs at Lambda Managed Instances and the AWS Compute Blog post for configuration details.
Key Differences from Standard Lambda
| Aspect | Standard Lambda | Lambda Managed Instances |
|---|---|---|
| Concurrency | 1 execution env = 1 request | 1 execution env = N requests (multi-concurrency) |
| Scaling | Request-driven (immediate) | CPU utilization-based (async) |
| Cold starts | Yes | No (provisioned at publish-version) |
| Pricing | Per-request + duration | EC2 instance + 15% management fee |
| VPC | Optional | Required (specified in Capacity Provider) |
| Min memory | 128 MB | 2 GB |
Prerequisites:
- AWS CLI configured (Lambda, EC2, IAM permissions)
- Test region: us-east-1
Skip to Verification 1 if you only want the results.
Environment Setup
IAM roles, VPC, and Capacity Provider creation steps
LMI requires two IAM roles: a Lambda execution role and a Capacity Provider operator role that allows Lambda to manage EC2 instances.
# Lambda execution role
cat > lambda-trust-policy.json << 'EOF'
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": { "Service": "lambda.amazonaws.com" },
"Action": "sts:AssumeRole"
}]
}
EOF
aws iam create-role \
--role-name LMI-Verification-ExecutionRole \
--assume-role-policy-document file://lambda-trust-policy.json
aws iam attach-role-policy \
--role-name LMI-Verification-ExecutionRole \
--policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
# Capacity Provider operator role
aws iam create-role \
--role-name LMI-Verification-OperatorRole \
--assume-role-policy-document file://lambda-trust-policy.json
aws iam attach-role-policy \
--role-name LMI-Verification-OperatorRole \
--policy-arn arn:aws:iam::aws:policy/AWSLambdaManagedEC2ResourceOperator
LMI requires a VPC with subnets in at least 3 AZs. A NAT Gateway is needed for CloudWatch Logs egress.
REGION=us-east-1
VPC_ID=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 \
--query 'Vpc.VpcId' --output text --region $REGION)
# Private subnets (3 AZs)
PRIV_SUB1=$(aws ec2 create-subnet --vpc-id $VPC_ID \
--cidr-block 10.0.1.0/24 --availability-zone us-east-1a \
--query 'Subnet.SubnetId' --output text --region $REGION)
PRIV_SUB2=$(aws ec2 create-subnet --vpc-id $VPC_ID \
--cidr-block 10.0.2.0/24 --availability-zone us-east-1b \
--query 'Subnet.SubnetId' --output text --region $REGION)
PRIV_SUB3=$(aws ec2 create-subnet --vpc-id $VPC_ID \
--cidr-block 10.0.3.0/24 --availability-zone us-east-1c \
--query 'Subnet.SubnetId' --output text --region $REGION)
# Public subnet + IGW + NAT Gateway
PUB_SUB=$(aws ec2 create-subnet --vpc-id $VPC_ID \
--cidr-block 10.0.100.0/24 --availability-zone us-east-1a \
--query 'Subnet.SubnetId' --output text --region $REGION)
IGW_ID=$(aws ec2 create-internet-gateway \
--query 'InternetGateway.InternetGatewayId' --output text --region $REGION)
aws ec2 attach-internet-gateway \
--internet-gateway-id $IGW_ID --vpc-id $VPC_ID --region $REGION
PUB_RT=$(aws ec2 create-route-table --vpc-id $VPC_ID \
--query 'RouteTable.RouteTableId' --output text --region $REGION)
aws ec2 create-route --route-table-id $PUB_RT \
--destination-cidr-block 0.0.0.0/0 --gateway-id $IGW_ID --region $REGION
aws ec2 associate-route-table \
--route-table-id $PUB_RT --subnet-id $PUB_SUB --region $REGION
EIP_ALLOC=$(aws ec2 allocate-address --domain vpc \
--query 'AllocationId' --output text --region $REGION)
NAT_GW=$(aws ec2 create-nat-gateway --subnet-id $PUB_SUB \
--allocation-id $EIP_ALLOC \
--query 'NatGateway.NatGatewayId' --output text --region $REGION)
aws ec2 wait nat-gateway-available --nat-gateway-ids $NAT_GW --region $REGION
PRIV_RT=$(aws ec2 create-route-table --vpc-id $VPC_ID \
--query 'RouteTable.RouteTableId' --output text --region $REGION)
aws ec2 create-route --route-table-id $PRIV_RT \
--destination-cidr-block 0.0.0.0/0 --nat-gateway-id $NAT_GW --region $REGION
for SUB in $PRIV_SUB1 $PRIV_SUB2 $PRIV_SUB3; do
aws ec2 associate-route-table \
--route-table-id $PRIV_RT --subnet-id $SUB --region $REGION
done
SG_ID=$(aws ec2 create-security-group --group-name lmi-verification-sg \
--description "Security group for LMI verification" \
--vpc-id $VPC_ID --query 'GroupId' --output text --region $REGION)
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
aws lambda create-capacity-provider \
--capacity-provider-name lmi-verification-cp \
--vpc-config "SubnetIds=$PRIV_SUB1,$PRIV_SUB2,$PRIV_SUB3,SecurityGroupIds=$SG_ID" \
--permissions-config "CapacityProviderOperatorRoleArn=arn:aws:iam::${ACCOUNT_ID}:role/LMI-Verification-OperatorRole" \
--instance-requirements "Architectures=x86_64" \
--capacity-provider-scaling-config "MaxVCpuCount=30" \
--region $REGION
The verification function supports three modes via the mode parameter: info (basic info), io_bound (I/O simulation), and cpu_bound (CPU load).
import json
import os
import time
import threading
import math

def lambda_handler(event, context):
    mode = event.get("mode", "info")
    result = {
        "request_id": context.aws_request_id,
        "function_version": context.function_version,
        "pid": os.getpid(),
        "thread_id": threading.current_thread().ident,
        "timestamp": time.time(),
    }
    if mode == "io_bound":
        sleep_sec = event.get("sleep_sec", 2)
        time.sleep(sleep_sec)
        result["mode"] = "io_bound"
        result["sleep_sec"] = sleep_sec
    elif mode == "cpu_bound":
        iterations = event.get("iterations", 5_000_000)
        start = time.time()
        total = 0.0
        for i in range(iterations):
            total += math.sqrt(i) * math.sin(i)
        result["mode"] = "cpu_bound"
        result["compute_time_sec"] = round(time.time() - start, 3)
    else:
        result["mode"] = "info"
    return {"statusCode": 200, "body": json.dumps(result)}
mkdir -p /tmp/lmi-func && cp lambda_function.py /tmp/lmi-func/
cd /tmp/lmi-func && zip -j /tmp/lmi-function.zip lambda_function.py
CP_ARN="arn:aws:lambda:${REGION}:${ACCOUNT_ID}:capacity-provider:lmi-verification-cp"
aws lambda create-function \
--function-name lmi-verification-func \
--runtime python3.13 \
--role "arn:aws:iam::${ACCOUNT_ID}:role/LMI-Verification-ExecutionRole" \
--handler lambda_function.lambda_handler \
--zip-file fileb:///tmp/lmi-function.zip \
--memory-size 4096 \
--timeout 60 \
--capacity-provider-config "{
\"LambdaManagedInstancesCapacityProviderConfig\": {
\"CapacityProviderArn\": \"$CP_ARN\",
\"ExecutionEnvironmentMemoryGiBPerVCpu\": 4.0,
\"PerExecutionEnvironmentMaxConcurrency\": 10
}
}" \
--region $REGION
aws lambda wait function-active-v2 --function-name lmi-verification-func --region $REGION
# Publish version (triggers EC2 provisioning)
aws lambda publish-version --function-name lmi-verification-func --region $REGION
Configuration used for this verification:
| Setting | Value |
|---|---|
| Region | us-east-1 |
| Architecture | x86_64 |
| MaxVCpuCount | 30 |
| Memory | 4096 MB |
| Memory/vCPU ratio | 4:1 (general purpose) |
| PerExecutionEnvironmentMaxConcurrency | 10 (Python default is 16/vCPU; set lower to observe limit behavior) |
| Instance type | Lambda-selected (result: m7i.xlarge) |
Verification 1: How Long Does Capacity Provider Startup Take?
Standard Lambda functions are invocable immediately after deploy. LMI provisions EC2 instances at publish-version time. I measured this lead time.
Results
| Phase | Duration |
|---|---|
| Capacity Provider create API → Active | Instant (seconds) |
| publish-version API response | ~1.5s |
| publish-version → version Active | ~67 seconds |
| First invoke latency | ~1.2s (no cold start) |
After publish-version, Lambda launched one m7i.xlarge instance in each of 3 AZs.
aws ec2 describe-instances \
--filters "Name=tag:aws:lambda:capacity-provider,Values=*" \
--query 'Reservations[*].Instances[*].[InstanceType,Placement.AvailabilityZone]' \
  --output table --region us-east-1
| m7i.xlarge | us-east-1c |
| m7i.xlarge | us-east-1b |
| m7i.xlarge | us-east-1a |
Without specifying instance types, Lambda auto-selected m7i.xlarge (4 vCPU / 16 GB) for the 4 GB memory / 4:1 ratio configuration.
First Invoke Confirmation
After the version became Active, invokes showed no cold starts. Five sequential invokes were stable at ~1.2s (including network round-trip). Since the info mode function body executes in near-zero time, most of this latency is the network round-trip from the local environment (Tokyo) to us-east-1.
invoke #1: 1248ms | pid=18
invoke #2: 1288ms | pid=16
invoke #3: 1192ms | pid=22
invoke #4: 1178ms | pid=20
invoke #5: 1204ms | pid=15
Note the different PIDs — five sequential invokes were routed to five different execution environments. With three m7i.xlarge instances (4 vCPU / 16 GB each) and a function configured at 4 GB memory / 1 vCPU, each instance can host multiple execution environments. In practice, more execution environments were running than the MinExecutionEnvironments=3 floor (more on this below).
Takeaway
The 67-second provisioning time needs consideration for CI/CD pipelines. This is fundamentally different from standard Lambda's "deploy and invoke immediately." However, once provisioned, you get stable latency with no cold starts. LMI suits long-running stable functions rather than frequently deployed ones.
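For pipelines that need to block until the new version is usable, the wait can be sketched as a generic polling loop. This is a minimal sketch, not an official SDK waiter: the `get_state` callable is an assumption you would wire to your SDK's function-configuration lookup yourself.

```python
import time

def wait_for_version_active(get_state, timeout_sec=300, interval_sec=5):
    """Poll a state getter until the published version reports Active.

    get_state is any zero-argument callable returning the version's State
    string (e.g. "Pending", "Active", "Failed"). LMI provisioning took
    ~67 seconds in this verification, so allow a generous timeout.
    """
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        state = get_state()
        if state == "Active":
            return True
        if state == "Failed":
            raise RuntimeError("version provisioning failed")
        time.sleep(interval_sec)
    return False
```

With boto3 this might be wired as `lambda: client.get_function_configuration(FunctionName="lmi-verification-func", Qualifier=version)["State"]` — verify the response shape against your SDK version.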
Verification 2: How Many Requests Can One Execution Environment Handle?
LMI's key differentiator is multi-concurrency. Standard Lambda uses one execution environment per request; LMI can process multiple requests simultaneously in one environment. I tested with I/O-bound workloads (simulated with 3-second sleep).
Note that the Python runtime implements multi-concurrency using separate processes, not threads (Node.js uses an event loop; Java/.NET use threads). This means you should focus on inter-process coordination and file locking for /tmp access rather than thread safety. Because each execution environment runs as a separate process, it has a unique PID. The following tests use PID distribution to identify which execution environment handled each request.
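As one illustration of what inter-process coordination looks like here, a hedged sketch of serializing appends to a shared /tmp file with an advisory `flock` (the helper name and path are illustrative, not part of any LMI API):

```python
import fcntl
import os

def append_with_lock(path, line):
    """Append one line to a file shared across execution-environment
    processes, holding an exclusive advisory lock for the write."""
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until no other process holds the lock
        try:
            f.write(line + "\n")
            f.flush()
            os.fsync(f.fileno())  # make the write durable before releasing
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

A threading.Lock would not help here, since each execution environment is a separate process; the lock has to live in the filesystem.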
10 Concurrent Requests
Concurrent invoke commands
# Run N concurrent invokes and aggregate PID distribution
N=10
for i in $(seq 1 $N); do
aws lambda invoke \
--function-name lmi-verification-func --qualifier 1 \
--payload '{"mode":"io_bound","sleep_sec":3}' \
--cli-binary-format raw-in-base64-out \
--cli-read-timeout 30 \
/tmp/lmi-result-${i}.json \
--region us-east-1 > /dev/null 2>&1 &
done
wait
# Check PID distribution
for i in $(seq 1 $N); do
python3 -c "
import json
d=json.load(open('/tmp/lmi-result-${i}.json'))
b=json.loads(d['body'])
print(b['pid'])" 2>/dev/null
done | sort | uniq -c | sort -rn
Change N to 10, 30, or 40 for each test. For CPU-bound tests, change the payload to '{"mode":"cpu_bound","iterations":10000000}'.
All requests completed: 4551ms (3s sleep + network round-trip)
PID distribution (same PID = same execution environment):
3 requests → PID 17
2 requests → PID 16
1 request → PID 24, 21, 19, 18, 15
PID 17 handled 3 requests simultaneously. Multi-concurrency is working. Total completion time of ~4.5s (3s sleep + ~1.5s overhead) confirms all 10 requests were processed in parallel.
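The shell fan-out above can also be sketched in Python. The PID aggregation is plain stdlib; the invoke helper is an assumption written against boto3's standard `Lambda.Client.invoke` call and is not exercised here against a live function:

```python
import json
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def pid_distribution(bodies):
    """Count how many requests landed on each PID (same PID = same env)."""
    return Counter(b["pid"] for b in bodies)

def invoke_batch(client, function_name, qualifier, payload, n):
    """Fire n concurrent invokes via a boto3 Lambda client and return
    the parsed response bodies."""
    raw = json.dumps(payload).encode()

    def one(_):
        resp = client.invoke(FunctionName=function_name,
                             Qualifier=qualifier, Payload=raw)
        return json.loads(json.loads(resp["Payload"].read())["body"])

    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(one, range(n)))
```

Usage would look like `pid_distribution(invoke_batch(boto3.client("lambda"), "lmi-verification-func", "1", {"mode": "io_bound", "sleep_sec": 3}, 10))`.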
30 Concurrent Requests (Near Theoretical Limit)
With MinExecutionEnvironments=3 and PerExecutionEnvironmentMaxConcurrency=10, the minimum concurrent capacity is 3 × 10 = 30.
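A back-of-envelope model for the other bound — the fully packed upper limit — can be sketched as follows, assuming memory and vCPU are the only binding resources (a simplification; Lambda's actual placement logic is not documented at this granularity):

```python
def theoretical_capacity(instances, instance_vcpu, instance_mem_gib,
                         env_mem_gib, mem_per_vcpu, per_env_concurrency):
    """Upper bound on concurrent requests if every instance is packed
    with as many execution environments as memory and vCPU allow."""
    env_vcpu = env_mem_gib / mem_per_vcpu
    envs_per_instance = int(min(instance_mem_gib // env_mem_gib,
                                instance_vcpu // env_vcpu))
    return instances * envs_per_instance * per_env_concurrency

# This verification's setup: 3 × m7i.xlarge (4 vCPU / 16 GiB),
# 4 GiB envs at a 4:1 memory/vCPU ratio, max concurrency 10 per env.
print(theoretical_capacity(3, 4, 16, 4.0, 4.0, 10))  # → 120
```

The observed behavior landed between this packed bound (12 environments, capacity 120) and the MinExecutionEnvironments floor: 10 environments, i.e. capacity 100.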
Completed: 5899ms | Success: 30/30 | Throttled: 0
PID distribution:
3 requests each → PID 15-24 (10 unique PIDs)
All 30 succeeded with zero throttling. 10 unique PIDs were observed. Three m7i.xlarge instances (4 vCPU / 16 GB each) hosted 10 execution environments (4 GB / 1 vCPU each), exceeding the MinExecutionEnvironments=3 floor. Requests were distributed evenly at 3 per environment.
40 Concurrent Requests (Over Limit)
Completed: 9220ms | Success: 34/40 | Throttled: 6
PID distribution: 10 PIDs (15-24), 3-4 requests each
At 40 concurrent requests, 6 were throttled. Completion time stretched to 9.2s. While the environment-level capacity is 10 environments × 10 concurrency = 100, throttling occurred at just 40 concurrent requests. The exact cause is unclear, but routing and queuing overhead may be a contributing factor.
Takeaway
Multi-concurrency works effectively for I/O-bound workloads. A 3-second sleep operation completes in ~4.5s even with 10 concurrent requests. What would require 10 execution environments in standard Lambda can be handled with fewer resources.
From a cost perspective, note that 10 execution environments were running despite MinExecutionEnvironments=3. The actual number of execution environments is not determined by MinExecutionEnvironments alone. Cost estimates should be based on the number and type of EC2 instances launched.
Verification 3: How Fast Does Scale-Out Respond to CPU Load?
LMI scales asynchronously based on CPU utilization. Scaling operates at two layers: adding execution environments on existing instances, and launching additional EC2 instances when instance resources are exhausted. I tested how this differs from standard Lambda's immediate request-driven scaling using CPU-bound workloads.
CPU-Bound Function
A numerical computation function (sqrt + sin × 10 million iterations) taking ~1 second of CPU time per invocation.
50 Concurrent × 3 Batches (15s Intervals)
First, I applied heavy load under the current configuration (MaxExecutionEnvironments not explicitly set) to observe whether execution environments or instances are added automatically.
Batch 1: 18543ms | Success: 33/50 | Throttled: 17 | Unique PIDs: 10
Batch 2: 15212ms | Success: 20/50 | Throttled: 30 | Unique PIDs: 10
Batch 3: 19075ms | Success: 18/50 | Throttled: 32 | Unique PIDs: 10
CPU-bound workloads produced heavy throttling, and no increase in execution environments was observed even with 30-second intervals between batches. PID count remained at 10 throughout. CloudTrail RunInstances events confirmed that no additional EC2 instances were launched during the CPU load test either — the initial 3 instances remained throughout.
After explicitly setting MaxExecutionEnvironments=20 to give Lambda room to add execution environments, PID 25 appeared (11 unique PIDs). Execution environment addition had begun, but was not immediate.
aws lambda put-function-scaling-config \
--function-name lmi-verification-func --qualifier 1 \
--function-scaling-config "MinExecutionEnvironments=3,MaxExecutionEnvironments=20" \
  --region us-east-1
30 Concurrent × 5 Batches (Stability Test)
Since 50 concurrent requests caused heavy throttling, I reduced to a level the current 10 execution environments could handle stably, and observed behavior under sustained load:
Batch 1: 18102ms | Success: 29/30 | Throttled: 1
Batch 2: 18838ms | Success: 30/30 | Throttled: 0
Batch 3: 17905ms | Success: 30/30 | Throttled: 0
Batch 4: 15101ms | Success: 30/30 | Throttled: 0
Batch 5: 12235ms | Success: 30/30 | Throttled: 0
10 execution environments handled 30 concurrent requests stably, with processing time decreasing per batch (18s → 12s). The exact cause of the improvement is unclear, but it may be due to amortization of initial costs within each execution environment, such as process initialization and module imports.
CloudWatch Metrics
15:57 51
15:58 186 (peak during 50-concurrent CPU test)
15:59 96
16:00 3
16:01 0
Takeaway
CPU-bound workloads can feel "slow to scale" with LMI. The docs note that throttling may occur if traffic more than doubles within 5 minutes, and this was clearly observable in practice. The gap from standard Lambda's request-driven scaling is significant.
For CPU-bound work, multi-concurrency benefits are limited since invocations share CPU resources within the same environment, which can increase individual request latency. The official blog recommends setting concurrency at or below vCPU count for CPU-intensive workloads.
LMI Suitability Analysis
Performance Comparison
| Aspect | Standard Lambda | LMI (Measured) |
|---|---|---|
| Deploy → invocable | Instant | ~67 seconds |
| Cold starts | Yes (100ms-seconds) | None |
| I/O-bound 10 concurrent | 10 execution envs needed | 4.5s completion (multi-concurrency) |
| CPU-bound 50 concurrent | 50 envs, immediate scale | Throttling (async scaling) |
| Scale-out speed | Per-request, immediate | CPU-based, async (two layers: execution envs + instances) |
Cost Structure
LMI uses EC2 instance pricing + 15% management fee. This verification launched m7i.xlarge ($0.2016/hr × 3 instances).
- LMI minimum cost: 3 × $0.2016 × 1.15 ≈ **$0.70/hour** (minimum 3 EC2 instances always running)
- Standard Lambda: $0 with no traffic
LMI incurs costs even when idle. High-utilization workloads may benefit from EC2 Savings Plans / Reserved Instances, but low-utilization workloads are far cheaper on standard Lambda.
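The floor arithmetic can be made explicit. This is a sketch using the on-demand m7i.xlarge price from this verification (rates vary by region) and applies the 15% management fee on top of the instance price, matching the pricing structure described above:

```python
def lmi_hourly_floor(instance_hourly_usd, instance_count, mgmt_fee=0.15):
    """Minimum hourly cost for LMI: always-on instances plus the
    percentage-based management fee."""
    return instance_hourly_usd * instance_count * (1 + mgmt_fee)

# 3 × m7i.xlarge at $0.2016/hr with the 15% fee
print(round(lmi_hourly_floor(0.2016, 3), 4))  # → 0.6955
```

At ~$0.70/hour that is roughly $500/month even with zero traffic, which is the number to compare against your standard Lambda bill.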
Summary — LMI Migration Checklist
| Check | Standard Lambda Fits | LMI Fits |
|---|---|---|
| Traffic pattern | Bursty, spiky | Steady, predictable |
| Cold start tolerance | Can tolerate | Cannot tolerate |
| Execution duration | Short, event-driven | Long-running, steady-state |
| Workload type | CPU-bound (scaling challenges) | I/O-bound (multi-concurrency shines) |
| Utilization rate | Low (per-request pricing wins) | High (EC2 pricing advantage) |
| VPC requirements | No VPC needed | Already in VPC |
| Deploy frequency | Frequent (instant deploy needed) | Infrequent (67s wait acceptable) |
- "Managed EC2 cluster" fit check — LMI provides EC2 compute flexibility with Lambda's developer experience, but its scaling model is closer to EC2. If you need instant response to traffic spikes, standard Lambda is better suited
- Multi-concurrency shines for I/O-bound work — Processing multiple requests in one execution environment dramatically improves resource efficiency for workloads with API calls or DB query wait times. Benefits are limited for CPU-bound work
- Cost optimization requires utilization analysis — With at least 3 EC2 instances always running, cost efficiency at low traffic is worse than standard Lambda. Consider LMI for high-utilization workloads where Savings Plans apply
Cleanup
Resource deletion commands
REGION=us-east-1
# Delete Lambda function and versions
aws lambda delete-function --function-name lmi-verification-func --region $REGION
# Delete Capacity Provider (auto-terminates EC2 instances)
aws lambda delete-capacity-provider \
--capacity-provider-name lmi-verification-cp --region $REGION
# Delete NAT Gateway (takes time)
aws ec2 delete-nat-gateway --nat-gateway-id $NAT_GW --region $REGION
aws ec2 wait nat-gateway-deleted --nat-gateway-ids $NAT_GW --region $REGION
aws ec2 release-address --allocation-id $EIP_ALLOC --region $REGION
# Delete route tables, subnets, IGW, VPC
aws ec2 delete-route-table --route-table-id $PRIV_RT --region $REGION
aws ec2 delete-route-table --route-table-id $PUB_RT --region $REGION
for SUB in $PRIV_SUB1 $PRIV_SUB2 $PRIV_SUB3 $PUB_SUB; do
aws ec2 delete-subnet --subnet-id $SUB --region $REGION
done
aws ec2 detach-internet-gateway \
--internet-gateway-id $IGW_ID --vpc-id $VPC_ID --region $REGION
aws ec2 delete-internet-gateway --internet-gateway-id $IGW_ID --region $REGION
aws ec2 delete-security-group --group-id $SG_ID --region $REGION
aws ec2 delete-vpc --vpc-id $VPC_ID --region $REGION
# Delete IAM roles
aws iam detach-role-policy --role-name LMI-Verification-ExecutionRole \
--policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
aws iam delete-role --role-name LMI-Verification-ExecutionRole
aws iam detach-role-policy --role-name LMI-Verification-OperatorRole \
--policy-arn arn:aws:iam::aws:policy/AWSLambdaManagedEC2ResourceOperator
aws iam delete-role --role-name LMI-Verification-OperatorRole