Verifying Lambda Managed Instances — Provisioning Time, Multi-Concurrency, and Scaling Behavior
Introduction
On November 30, 2025, AWS announced Lambda Managed Instances (LMI). This new compute option lets you run Lambda functions on EC2 instances while preserving the Lambda programming model.
LMI is less like "Lambda that scales instantly" and more like "Fargate with a Lambda developer experience." It eliminates cold starts but introduces provisioning wait times at deploy, and scaling is asynchronous based on CPU utilization. This mental model shift is the starting point for using LMI effectively.
This article deploys LMI and measures three core behaviors, then provides a checklist for deciding whether to migrate your workload. See the official docs at Lambda Managed Instances and the AWS Compute Blog post for configuration details.
Key Differences from Standard Lambda
| Aspect | Standard Lambda | Lambda Managed Instances |
|---|---|---|
| Concurrency | 1 execution env = 1 request | 1 execution env = N requests (multi-concurrency) |
| Scaling | Request-driven (immediate) | CPU utilization-based (async) |
| Cold starts | Yes | No (provisioned at publish-version) |
| Pricing | Per-request + duration | EC2 instance + 15% management fee |
| VPC | Optional | Required (specified in Capacity Provider) |
| Min memory | 128 MB | 2 GB |
Prerequisites:
- AWS CLI configured (Lambda, EC2, IAM permissions)
- Test region: us-east-1
Skip to Verification 1 if you only want the results.
Environment Setup
IAM roles, VPC, and Capacity Provider creation steps
LMI requires two IAM roles: a Lambda execution role and a Capacity Provider operator role that allows Lambda to manage EC2 instances.
# Lambda execution role
cat > lambda-trust-policy.json << 'EOF'
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": { "Service": "lambda.amazonaws.com" },
"Action": "sts:AssumeRole"
}]
}
EOF
aws iam create-role \
--role-name LMI-Verification-ExecutionRole \
--assume-role-policy-document file://lambda-trust-policy.json
aws iam attach-role-policy \
--role-name LMI-Verification-ExecutionRole \
--policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
# Capacity Provider operator role
aws iam create-role \
--role-name LMI-Verification-OperatorRole \
--assume-role-policy-document file://lambda-trust-policy.json
aws iam attach-role-policy \
--role-name LMI-Verification-OperatorRole \
--policy-arn arn:aws:iam::aws:policy/AWSLambdaManagedEC2ResourceOperator
LMI requires a VPC with subnets in at least 3 AZs. A NAT Gateway is needed for CloudWatch Logs egress.
REGION=us-east-1
VPC_ID=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 \
--query 'Vpc.VpcId' --output text --region $REGION)
# Private subnets (3 AZs)
PRIV_SUB1=$(aws ec2 create-subnet --vpc-id $VPC_ID \
--cidr-block 10.0.1.0/24 --availability-zone us-east-1a \
--query 'Subnet.SubnetId' --output text --region $REGION)
PRIV_SUB2=$(aws ec2 create-subnet --vpc-id $VPC_ID \
--cidr-block 10.0.2.0/24 --availability-zone us-east-1b \
--query 'Subnet.SubnetId' --output text --region $REGION)
PRIV_SUB3=$(aws ec2 create-subnet --vpc-id $VPC_ID \
--cidr-block 10.0.3.0/24 --availability-zone us-east-1c \
--query 'Subnet.SubnetId' --output text --region $REGION)
# Public subnet + IGW + NAT Gateway
PUB_SUB=$(aws ec2 create-subnet --vpc-id $VPC_ID \
--cidr-block 10.0.100.0/24 --availability-zone us-east-1a \
--query 'Subnet.SubnetId' --output text --region $REGION)
IGW_ID=$(aws ec2 create-internet-gateway \
--query 'InternetGateway.InternetGatewayId' --output text --region $REGION)
aws ec2 attach-internet-gateway \
--internet-gateway-id $IGW_ID --vpc-id $VPC_ID --region $REGION
PUB_RT=$(aws ec2 create-route-table --vpc-id $VPC_ID \
--query 'RouteTable.RouteTableId' --output text --region $REGION)
aws ec2 create-route --route-table-id $PUB_RT \
--destination-cidr-block 0.0.0.0/0 --gateway-id $IGW_ID --region $REGION
aws ec2 associate-route-table \
--route-table-id $PUB_RT --subnet-id $PUB_SUB --region $REGION
EIP_ALLOC=$(aws ec2 allocate-address --domain vpc \
--query 'AllocationId' --output text --region $REGION)
NAT_GW=$(aws ec2 create-nat-gateway --subnet-id $PUB_SUB \
--allocation-id $EIP_ALLOC \
--query 'NatGateway.NatGatewayId' --output text --region $REGION)
aws ec2 wait nat-gateway-available --nat-gateway-ids $NAT_GW --region $REGION
PRIV_RT=$(aws ec2 create-route-table --vpc-id $VPC_ID \
--query 'RouteTable.RouteTableId' --output text --region $REGION)
aws ec2 create-route --route-table-id $PRIV_RT \
--destination-cidr-block 0.0.0.0/0 --nat-gateway-id $NAT_GW --region $REGION
for SUB in $PRIV_SUB1 $PRIV_SUB2 $PRIV_SUB3; do
aws ec2 associate-route-table \
--route-table-id $PRIV_RT --subnet-id $SUB --region $REGION
done
SG_ID=$(aws ec2 create-security-group --group-name lmi-verification-sg \
--description "Security group for LMI verification" \
--vpc-id $VPC_ID --query 'GroupId' --output text --region $REGION)
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
aws lambda create-capacity-provider \
--capacity-provider-name lmi-verification-cp \
--vpc-config "SubnetIds=$PRIV_SUB1,$PRIV_SUB2,$PRIV_SUB3,SecurityGroupIds=$SG_ID" \
--permissions-config "CapacityProviderOperatorRoleArn=arn:aws:iam::${ACCOUNT_ID}:role/LMI-Verification-OperatorRole" \
--instance-requirements "Architectures=x86_64" \
--capacity-provider-scaling-config "MaxVCpuCount=30" \
--region $REGION
The verification function supports three modes via the mode parameter: info (basic info), io_bound (I/O simulation), and cpu_bound (CPU load).
import json
import os
import time
import threading
import math

def lambda_handler(event, context):
    mode = event.get("mode", "info")
    result = {
        "request_id": context.aws_request_id,
        "function_version": context.function_version,
        "pid": os.getpid(),
        "thread_id": threading.current_thread().ident,
        "timestamp": time.time(),
    }
    if mode == "io_bound":
        sleep_sec = event.get("sleep_sec", 2)
        time.sleep(sleep_sec)
        result["mode"] = "io_bound"
        result["sleep_sec"] = sleep_sec
    elif mode == "cpu_bound":
        iterations = event.get("iterations", 5_000_000)
        start = time.time()
        total = 0.0
        for i in range(iterations):
            total += math.sqrt(i) * math.sin(i)
        result["mode"] = "cpu_bound"
        result["compute_time_sec"] = round(time.time() - start, 3)
    else:
        result["mode"] = "info"
    return {"statusCode": 200, "body": json.dumps(result)}
mkdir -p /tmp/lmi-func && cp lambda_function.py /tmp/lmi-func/
cd /tmp/lmi-func && zip -j /tmp/lmi-function.zip lambda_function.py
CP_ARN="arn:aws:lambda:${REGION}:${ACCOUNT_ID}:capacity-provider:lmi-verification-cp"
aws lambda create-function \
--function-name lmi-verification-func \
--runtime python3.13 \
--role "arn:aws:iam::${ACCOUNT_ID}:role/LMI-Verification-ExecutionRole" \
--handler lambda_function.lambda_handler \
--zip-file fileb:///tmp/lmi-function.zip \
--memory-size 4096 \
--timeout 60 \
--capacity-provider-config "{
\"LambdaManagedInstancesCapacityProviderConfig\": {
\"CapacityProviderArn\": \"$CP_ARN\",
\"ExecutionEnvironmentMemoryGiBPerVCpu\": 4.0,
\"PerExecutionEnvironmentMaxConcurrency\": 10
}
}" \
--region $REGION
aws lambda wait function-active-v2 --function-name lmi-verification-func --region $REGION
# Publish version (triggers EC2 provisioning)
aws lambda publish-version --function-name lmi-verification-func --region $REGION
Configuration used for this verification:
| Setting | Value |
|---|---|
| Region | us-east-1 |
| Architecture | x86_64 |
| MaxVCpuCount | 30 |
| Memory | 4096 MB |
| Memory/vCPU ratio | 4:1 (general purpose) |
| PerExecutionEnvironmentMaxConcurrency | 10 (Python default is 16/vCPU; set lower to observe limit behavior) |
| Instance type | Lambda-selected (result: m7i.xlarge) |
Verification 1: How Long Does Capacity Provider Startup Take?
Standard Lambda functions are invocable immediately after deploy. LMI provisions EC2 instances at publish-version time. I measured this lead time.
Results
| Phase | Duration |
|---|---|
| Capacity Provider create API → Active | Instant (seconds) |
| publish-version API response | ~1.5s |
| publish-version → version Active | ~67 seconds |
| First invoke latency | ~1.2s (no cold start) |
After publish-version, Lambda launched one m7i.xlarge instance in each of 3 AZs.
aws ec2 describe-instances \
--filters "Name=tag:aws:lambda:capacity-provider,Values=*" \
--query 'Reservations[*].Instances[*].[InstanceType,Placement.AvailabilityZone]' \
  --output table --region us-east-1
| m7i.xlarge | us-east-1c |
| m7i.xlarge | us-east-1b |
| m7i.xlarge | us-east-1a |
Without specifying instance types, Lambda auto-selected m7i.xlarge (4 vCPU / 16 GB) for the 4 GB memory / 4:1 ratio configuration.
First Invoke Confirmation
After the version became Active, invokes showed no cold starts. Five sequential invokes were stable at ~1.2s (including network round-trip). Since the info mode function body executes in near-zero time, most of this latency is the network round-trip from the local environment (Tokyo) to us-east-1.
invoke #1: 1248ms | pid=18
invoke #2: 1288ms | pid=16
invoke #3: 1192ms | pid=22
invoke #4: 1178ms | pid=20
invoke #5: 1204ms | pid=15
Note the different PIDs — five sequential invokes were routed to five different execution environments. With three m7i.xlarge instances (4 vCPU / 16 GB each) and a function configured at 4 GB memory / 1 vCPU, each instance can host multiple execution environments. In practice, more execution environments were running than the MinExecutionEnvironments=3 floor (more on this below).
Takeaway
The 67-second provisioning time needs consideration for CI/CD pipelines. This is fundamentally different from standard Lambda's "deploy and invoke immediately." However, once provisioned, you get stable latency with no cold starts. LMI suits long-running stable functions rather than frequently deployed ones.
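For pipelines that need to block until the new version is usable, the wait can be sketched as a generic polling loop. This is a minimal sketch, not an official SDK waiter: the `get_state` callable is an assumption you would wire to your SDK's function-configuration lookup yourself.

```python
import time

def wait_for_version_active(get_state, timeout_sec=300, interval_sec=5):
    """Poll a state getter until the published version reports Active.

    get_state is any zero-argument callable returning the version's State
    string (e.g. "Pending", "Active", "Failed"). LMI provisioning took
    ~67 seconds in this verification, so allow a generous timeout.
    """
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        state = get_state()
        if state == "Active":
            return True
        if state == "Failed":
            raise RuntimeError("version provisioning failed")
        time.sleep(interval_sec)
    return False
```

With boto3 this might be wired as `lambda: client.get_function_configuration(FunctionName="lmi-verification-func", Qualifier=version)["State"]` — verify the response shape against your SDK version.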
Verification 2: How Many Requests Can One Execution Environment Handle?
LMI's key differentiator is multi-concurrency. Standard Lambda uses one execution environment per request; LMI can process multiple requests simultaneously in one environment. I tested with I/O-bound workloads (simulated with 3-second sleep).
Note that the Python runtime implements multi-concurrency using separate processes, not threads (Node.js uses an event loop; Java/.NET use threads). This means you should focus on inter-process coordination and file locking for /tmp access rather than thread safety. Because each execution environment runs as a separate process, it has a unique PID. The following tests use PID distribution to identify which execution environment handled each request.
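As one illustration of what inter-process coordination looks like here, a hedged sketch of serializing appends to a shared /tmp file with an advisory `flock` (the helper name and path are illustrative, not part of any LMI API):

```python
import fcntl
import os

def append_with_lock(path, line):
    """Append one line to a file shared across execution-environment
    processes, holding an exclusive advisory lock for the write."""
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until no other process holds the lock
        try:
            f.write(line + "\n")
            f.flush()
            os.fsync(f.fileno())  # make the write durable before releasing
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

A threading.Lock would not help here, since each execution environment is a separate process; the lock has to live in the filesystem.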
10 Concurrent Requests
Concurrent invoke commands
# Run N concurrent invokes and aggregate PID distribution
N=10
for i in $(seq 1 $N); do
aws lambda invoke \
--function-name lmi-verification-func --qualifier 1 \
--payload '{"mode":"io_bound","sleep_sec":3}' \
--cli-binary-format raw-in-base64-out \
--cli-read-timeout 30 \
/tmp/lmi-result-${i}.json \
--region us-east-1 > /dev/null 2>&1 &
done
wait
# Check PID distribution
for i in $(seq 1 $N); do
python3 -c "
import json
d=json.load(open('/tmp/lmi-result-${i}.json'))
b=json.loads(d['body'])
print(b['pid'])" 2>/dev/null
done | sort | uniq -c | sort -rn
Change N to 10, 30, or 40 for each test. For CPU-bound tests, change the payload to '{"mode":"cpu_bound","iterations":10000000}'.
All requests completed: 4551ms (3s sleep + network round-trip)
PID distribution (same PID = same execution environment):
3 requests → PID 17
2 requests → PID 16
1 request → PID 24, 21, 19, 18, 15
PID 17 handled 3 requests simultaneously. Multi-concurrency is working. Total completion time of ~4.5s (3s sleep + ~1.5s overhead) confirms all 10 requests were processed in parallel.
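The shell fan-out above can also be sketched in Python. The PID aggregation is plain stdlib; the invoke helper is an assumption written against boto3's standard `Lambda.Client.invoke` call and is not exercised here against a live function:

```python
import json
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def pid_distribution(bodies):
    """Count how many requests landed on each PID (same PID = same env)."""
    return Counter(b["pid"] for b in bodies)

def invoke_batch(client, function_name, qualifier, payload, n):
    """Fire n concurrent invokes via a boto3 Lambda client and return
    the parsed response bodies."""
    raw = json.dumps(payload).encode()

    def one(_):
        resp = client.invoke(FunctionName=function_name,
                             Qualifier=qualifier, Payload=raw)
        return json.loads(json.loads(resp["Payload"].read())["body"])

    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(one, range(n)))
```

Usage would look like `pid_distribution(invoke_batch(boto3.client("lambda"), "lmi-verification-func", "1", {"mode": "io_bound", "sleep_sec": 3}, 10))`.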
30 Concurrent Requests (Near Theoretical Limit)
With MinExecutionEnvironments=3 and PerExecutionEnvironmentMaxConcurrency=10, the minimum concurrent capacity is 3 × 10 = 30.
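A back-of-envelope model for the other bound — the fully packed upper limit — can be sketched as follows, assuming memory and vCPU are the only binding resources (a simplification; Lambda's actual placement logic is not documented at this granularity):

```python
def theoretical_capacity(instances, instance_vcpu, instance_mem_gib,
                         env_mem_gib, mem_per_vcpu, per_env_concurrency):
    """Upper bound on concurrent requests if every instance is packed
    with as many execution environments as memory and vCPU allow."""
    env_vcpu = env_mem_gib / mem_per_vcpu
    envs_per_instance = int(min(instance_mem_gib // env_mem_gib,
                                instance_vcpu // env_vcpu))
    return instances * envs_per_instance * per_env_concurrency

# This verification's setup: 3 × m7i.xlarge (4 vCPU / 16 GiB),
# 4 GiB envs at a 4:1 memory/vCPU ratio, max concurrency 10 per env.
print(theoretical_capacity(3, 4, 16, 4.0, 4.0, 10))  # → 120
```

The observed behavior landed between this packed bound (12 environments, capacity 120) and the MinExecutionEnvironments floor: 10 environments, i.e. capacity 100.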
Completed: 5899ms | Success: 30/30 | Throttled: 0
PID distribution:
3 requests each → PID 15-24 (10 unique PIDs)
All 30 succeeded with zero throttling. 10 unique PIDs were observed. Three m7i.xlarge instances (4 vCPU / 16 GB each) hosted 10 execution environments (4 GB / 1 vCPU each), exceeding the MinExecutionEnvironments=3 floor. Requests were distributed evenly at 3 per environment.
40 Concurrent Requests (Over Limit)
Completed: 9220ms | Success: 34/40 | Throttled: 6
PID distribution: 10 PIDs (15-24), 3-4 requests each
At 40 concurrent requests, 6 were throttled. Completion time stretched to 9.2s. While the environment-level capacity is 10 environments × 10 concurrency = 100, throttling occurred at just 40 concurrent requests. The exact cause is unclear, but routing and queuing overhead may be a contributing factor.
Takeaway
Multi-concurrency works effectively for I/O-bound workloads. A 3-second sleep operation completes in ~4.5s even with 10 concurrent requests. What would require 10 execution environments in standard Lambda can be handled with fewer resources.
From a cost perspective, note that 10 execution environments were running despite MinExecutionEnvironments=3. The actual number of execution environments is not determined by MinExecutionEnvironments alone. Cost estimates should be based on the number and type of EC2 instances launched.
Verification 3: How Fast Does Scale-Out Respond to CPU Load?
LMI scales asynchronously based on CPU utilization. Scaling operates at two layers: adding execution environments on existing instances, and launching additional EC2 instances when instance resources are exhausted. I tested how this differs from standard Lambda's immediate request-driven scaling using CPU-bound workloads.
CPU-Bound Function
A numerical computation function (sqrt + sin × 10 million iterations) taking ~1 second of CPU time per invocation.
50 Concurrent × 3 Batches (15s Intervals)
First, I applied heavy load under the current configuration (MaxExecutionEnvironments not explicitly set) to observe whether execution environments or instances are added automatically.
Batch 1: 18543ms | Success: 33/50 | Throttled: 17 | Unique PIDs: 10
Batch 2: 15212ms | Success: 20/50 | Throttled: 30 | Unique PIDs: 10
Batch 3: 19075ms | Success: 18/50 | Throttled: 32 | Unique PIDs: 10
CPU-bound workloads produced heavy throttling, and no increase in execution environments was observed even with 30-second intervals between batches. PID count remained at 10 throughout. CloudTrail RunInstances events confirmed that no additional EC2 instances were launched during the CPU load test either — the initial 3 instances remained throughout.
After explicitly setting MaxExecutionEnvironments=20 to give Lambda room to add execution environments, PID 25 appeared (11 unique PIDs). Execution environment addition had begun, but was not immediate.
aws lambda put-function-scaling-config \
--function-name lmi-verification-func --qualifier 1 \
--function-scaling-config "MinExecutionEnvironments=3,MaxExecutionEnvironments=20" \
  --region us-east-1
30 Concurrent × 5 Batches (Stability Test)
Since 50 concurrent requests caused heavy throttling, I reduced to a level the current 10 execution environments could handle stably, and observed behavior under sustained load:
Batch 1: 18102ms | Success: 29/30 | Throttled: 1
Batch 2: 18838ms | Success: 30/30 | Throttled: 0
Batch 3: 17905ms | Success: 30/30 | Throttled: 0
Batch 4: 15101ms | Success: 30/30 | Throttled: 0
Batch 5: 12235ms | Success: 30/30 | Throttled: 0
10 execution environments handled 30 concurrent requests stably, with processing time decreasing per batch (18s → 12s). The exact cause of the improvement is unclear, but it may be due to amortization of initial costs within each execution environment, such as process initialization and module imports.
CloudWatch Metrics
15:57 51
15:58 186 (peak during 50-concurrent CPU test)
15:59 96
16:00 3
16:01 0
Takeaway
CPU-bound workloads can feel "slow to scale" with LMI. The docs note that throttling may occur if traffic more than doubles within 5 minutes, and this was clearly observable in practice. The gap from standard Lambda's request-driven scaling is significant.
For CPU-bound work, multi-concurrency benefits are limited since invocations share CPU resources within the same environment, which can increase individual request latency. The official blog recommends setting concurrency at or below vCPU count for CPU-intensive workloads.
LMI Suitability Analysis
Performance Comparison
| Aspect | Standard Lambda | LMI (Measured) |
|---|---|---|
| Deploy → invocable | Instant | ~67 seconds |
| Cold starts | Yes (100ms-seconds) | None |
| I/O-bound 10 concurrent | 10 execution envs needed | 4.5s completion (multi-concurrency) |
| CPU-bound 50 concurrent | 50 envs, immediate scale | Throttling (async scaling) |
| Scale-out speed | Per-request, immediate | CPU-based, async (two layers: execution envs + instances) |
Cost Structure
LMI uses EC2 instance pricing + 15% management fee. This verification launched m7i.xlarge ($0.2016/hr × 3 instances).
- LMI minimum cost: 3 × $0.2016 × 1.15 ≈ **$0.70/hour** (minimum 3 EC2 instances always running)
- Standard Lambda: $0 with no traffic
LMI incurs costs even when idle. High-utilization workloads may benefit from EC2 Savings Plans / Reserved Instances, but low-utilization workloads are far cheaper on standard Lambda.
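The floor arithmetic can be made explicit. This is a sketch using the on-demand m7i.xlarge price from this verification (rates vary by region) and applies the 15% management fee on top of the instance price, matching the pricing structure described above:

```python
def lmi_hourly_floor(instance_hourly_usd, instance_count, mgmt_fee=0.15):
    """Minimum hourly cost for LMI: always-on instances plus the
    percentage-based management fee."""
    return instance_hourly_usd * instance_count * (1 + mgmt_fee)

# 3 × m7i.xlarge at $0.2016/hr with the 15% fee
print(round(lmi_hourly_floor(0.2016, 3), 4))  # → 0.6955
```

At ~$0.70/hour that is roughly $500/month even with zero traffic, which is the number to compare against your standard Lambda bill.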
Summary — LMI Migration Checklist
| Check | Standard Lambda Fits | LMI Fits |
|---|---|---|
| Traffic pattern | Bursty, spiky | Steady, predictable |
| Cold start tolerance | Can tolerate | Cannot tolerate |
| Execution duration | Short, event-driven | Long-running, steady-state |
| Workload type | CPU-bound (scaling challenges) | I/O-bound (multi-concurrency shines) |
| Utilization rate | Low (per-request pricing wins) | High (EC2 pricing advantage) |
| VPC requirements | No VPC needed | Already in VPC |
| Deploy frequency | Frequent (instant deploy needed) | Infrequent (67s wait acceptable) |
- "Managed EC2 cluster" fit check — LMI provides EC2 compute flexibility with Lambda's developer experience, but its scaling model is closer to EC2. If you need instant response to traffic spikes, standard Lambda is better suited
- Multi-concurrency shines for I/O-bound work — Processing multiple requests in one execution environment dramatically improves resource efficiency for workloads with API calls or DB query wait times. Benefits are limited for CPU-bound work
- Cost optimization requires utilization analysis — With at least 3 EC2 instances always running, cost efficiency at low traffic is worse than standard Lambda. Consider LMI for high-utilization workloads where Savings Plans apply
Cleanup
Resource deletion commands
REGION=us-east-1
# Delete Lambda function and versions
aws lambda delete-function --function-name lmi-verification-func --region $REGION
# Delete Capacity Provider (auto-terminates EC2 instances)
aws lambda delete-capacity-provider \
--capacity-provider-name lmi-verification-cp --region $REGION
# Delete NAT Gateway (takes time)
aws ec2 delete-nat-gateway --nat-gateway-id $NAT_GW --region $REGION
aws ec2 wait nat-gateway-deleted --nat-gateway-ids $NAT_GW --region $REGION
aws ec2 release-address --allocation-id $EIP_ALLOC --region $REGION
# Delete route tables, subnets, IGW, VPC
aws ec2 delete-route-table --route-table-id $PRIV_RT --region $REGION
aws ec2 delete-route-table --route-table-id $PUB_RT --region $REGION
for SUB in $PRIV_SUB1 $PRIV_SUB2 $PRIV_SUB3 $PUB_SUB; do
aws ec2 delete-subnet --subnet-id $SUB --region $REGION
done
aws ec2 detach-internet-gateway \
--internet-gateway-id $IGW_ID --vpc-id $VPC_ID --region $REGION
aws ec2 delete-internet-gateway --internet-gateway-id $IGW_ID --region $REGION
aws ec2 delete-security-group --group-id $SG_ID --region $REGION
aws ec2 delete-vpc --vpc-id $VPC_ID --region $REGION
# Delete IAM roles
aws iam detach-role-policy --role-name LMI-Verification-ExecutionRole \
--policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
aws iam delete-role --role-name LMI-Verification-ExecutionRole
aws iam detach-role-policy --role-name LMI-Verification-OperatorRole \
--policy-arn arn:aws:iam::aws:policy/AWSLambdaManagedEC2ResourceOperator
aws iam delete-role --role-name LMI-Verification-OperatorRole