Verifying Lambda Managed Instances — Provisioning Time, Multi-Concurrency, and Scaling Behavior

Introduction

On November 30, 2025, AWS announced Lambda Managed Instances (LMI). This new compute option lets you run Lambda functions on EC2 instances while preserving the Lambda programming model.

LMI is less like "Lambda that scales instantly" and more like "Fargate with a Lambda developer experience." It eliminates cold starts but introduces provisioning wait times at deploy, and scaling is asynchronous based on CPU utilization. This mental model shift is the starting point for using LMI effectively.

This article deploys LMI and measures three core behaviors, then provides a checklist for deciding whether to migrate your workload. See the official docs at Lambda Managed Instances and the AWS Compute Blog post for configuration details.

Key Differences from Standard Lambda

| Aspect | Standard Lambda | Lambda Managed Instances |
| --- | --- | --- |
| Concurrency | 1 execution env = 1 request | 1 execution env = N requests (multi-concurrency) |
| Scaling | Request-driven (immediate) | CPU utilization-based (async) |
| Cold starts | Yes | No (provisioned at publish-version) |
| Pricing | Per-request + duration | EC2 instance + 15% management fee |
| VPC | Optional | Required (specified in Capacity Provider) |
| Min memory | 128 MB | 2 GB |

Prerequisites:

  • AWS CLI configured (Lambda, EC2, IAM permissions)
  • Test region: us-east-1

Skip to Verification 1 if you only want the results.

Environment Setup

IAM roles, VPC, and Capacity Provider creation steps

LMI requires two IAM roles: a Lambda execution role and a Capacity Provider operator role that allows Lambda to manage EC2 instances.

Terminal (IAM roles)
# Lambda execution role
cat > lambda-trust-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "lambda.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF
 
aws iam create-role \
  --role-name LMI-Verification-ExecutionRole \
  --assume-role-policy-document file://lambda-trust-policy.json
 
aws iam attach-role-policy \
  --role-name LMI-Verification-ExecutionRole \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
 
# Capacity Provider operator role
aws iam create-role \
  --role-name LMI-Verification-OperatorRole \
  --assume-role-policy-document file://lambda-trust-policy.json
 
aws iam attach-role-policy \
  --role-name LMI-Verification-OperatorRole \
  --policy-arn arn:aws:iam::aws:policy/AWSLambdaManagedEC2ResourceOperator

LMI requires a VPC with subnets in at least 3 AZs. A NAT Gateway is needed for CloudWatch Logs egress.

Terminal (VPC resources)
REGION=us-east-1
 
VPC_ID=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 \
  --query 'Vpc.VpcId' --output text --region $REGION)
 
# Private subnets (3 AZs)
PRIV_SUB1=$(aws ec2 create-subnet --vpc-id $VPC_ID \
  --cidr-block 10.0.1.0/24 --availability-zone us-east-1a \
  --query 'Subnet.SubnetId' --output text --region $REGION)
PRIV_SUB2=$(aws ec2 create-subnet --vpc-id $VPC_ID \
  --cidr-block 10.0.2.0/24 --availability-zone us-east-1b \
  --query 'Subnet.SubnetId' --output text --region $REGION)
PRIV_SUB3=$(aws ec2 create-subnet --vpc-id $VPC_ID \
  --cidr-block 10.0.3.0/24 --availability-zone us-east-1c \
  --query 'Subnet.SubnetId' --output text --region $REGION)
 
# Public subnet + IGW + NAT Gateway
PUB_SUB=$(aws ec2 create-subnet --vpc-id $VPC_ID \
  --cidr-block 10.0.100.0/24 --availability-zone us-east-1a \
  --query 'Subnet.SubnetId' --output text --region $REGION)
 
IGW_ID=$(aws ec2 create-internet-gateway \
  --query 'InternetGateway.InternetGatewayId' --output text --region $REGION)
aws ec2 attach-internet-gateway \
  --internet-gateway-id $IGW_ID --vpc-id $VPC_ID --region $REGION
 
PUB_RT=$(aws ec2 create-route-table --vpc-id $VPC_ID \
  --query 'RouteTable.RouteTableId' --output text --region $REGION)
aws ec2 create-route --route-table-id $PUB_RT \
  --destination-cidr-block 0.0.0.0/0 --gateway-id $IGW_ID --region $REGION
aws ec2 associate-route-table \
  --route-table-id $PUB_RT --subnet-id $PUB_SUB --region $REGION
 
EIP_ALLOC=$(aws ec2 allocate-address --domain vpc \
  --query 'AllocationId' --output text --region $REGION)
NAT_GW=$(aws ec2 create-nat-gateway --subnet-id $PUB_SUB \
  --allocation-id $EIP_ALLOC \
  --query 'NatGateway.NatGatewayId' --output text --region $REGION)
aws ec2 wait nat-gateway-available --nat-gateway-ids $NAT_GW --region $REGION
 
PRIV_RT=$(aws ec2 create-route-table --vpc-id $VPC_ID \
  --query 'RouteTable.RouteTableId' --output text --region $REGION)
aws ec2 create-route --route-table-id $PRIV_RT \
  --destination-cidr-block 0.0.0.0/0 --nat-gateway-id $NAT_GW --region $REGION
for SUB in $PRIV_SUB1 $PRIV_SUB2 $PRIV_SUB3; do
  aws ec2 associate-route-table \
    --route-table-id $PRIV_RT --subnet-id $SUB --region $REGION
done
 
SG_ID=$(aws ec2 create-security-group --group-name lmi-verification-sg \
  --description "Security group for LMI verification" \
  --vpc-id $VPC_ID --query 'GroupId' --output text --region $REGION)
Terminal (Capacity Provider)
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
 
aws lambda create-capacity-provider \
  --capacity-provider-name lmi-verification-cp \
  --vpc-config "SubnetIds=$PRIV_SUB1,$PRIV_SUB2,$PRIV_SUB3,SecurityGroupIds=$SG_ID" \
  --permissions-config "CapacityProviderOperatorRoleArn=arn:aws:iam::${ACCOUNT_ID}:role/LMI-Verification-OperatorRole" \
  --instance-requirements "Architectures=x86_64" \
  --capacity-provider-scaling-config "MaxVCpuCount=30" \
  --region $REGION

The verification function supports three modes via the mode parameter: info (basic info), io_bound (I/O simulation), and cpu_bound (CPU load).

lambda_function.py
import json
import os
import time
import threading
import math
 
def lambda_handler(event, context):
    mode = event.get("mode", "info")
    result = {
        "request_id": context.aws_request_id,
        "function_version": context.function_version,
        "pid": os.getpid(),
        "thread_id": threading.current_thread().ident,
        "timestamp": time.time(),
    }
 
    if mode == "io_bound":
        sleep_sec = event.get("sleep_sec", 2)
        time.sleep(sleep_sec)
        result["mode"] = "io_bound"
        result["sleep_sec"] = sleep_sec
    elif mode == "cpu_bound":
        iterations = event.get("iterations", 5_000_000)
        start = time.time()
        total = 0.0
        for i in range(iterations):
            total += math.sqrt(i) * math.sin(i)
        result["mode"] = "cpu_bound"
        result["compute_time_sec"] = round(time.time() - start, 3)
    else:
        result["mode"] = "info"
 
    return {"statusCode": 200, "body": json.dumps(result)}
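Before zipping and deploying, the handler can be exercised locally by faking the context object. The `FakeContext` and `stub_handler` below are illustrative stand-ins, not part of the deployed code; when run next to `lambda_function.py`, the real `lambda_handler` can be passed to `invoke_local` in place of the stub.

```python
import json
import time
import uuid

class FakeContext:
    """Minimal stand-in exposing only the fields the handler reads."""
    def __init__(self, version="$LATEST"):
        self.aws_request_id = str(uuid.uuid4())
        self.function_version = version

def invoke_local(handler, event):
    """Invoke a handler the way Lambda would and decode the JSON body."""
    response = handler(event, FakeContext())
    return json.loads(response["body"])

# Stub standing in for lambda_function.lambda_handler; swap in the
# real import when running alongside the source file.
def stub_handler(event, context):
    body = {"mode": event.get("mode", "info"),
            "request_id": context.aws_request_id,
            "timestamp": time.time()}
    return {"statusCode": 200, "body": json.dumps(body)}

if __name__ == "__main__":
    print(invoke_local(stub_handler, {"mode": "io_bound"}))
```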
Terminal (Lambda function + publish-version)
mkdir -p /tmp/lmi-func && cp lambda_function.py /tmp/lmi-func/
cd /tmp/lmi-func && zip -j /tmp/lmi-function.zip lambda_function.py
 
CP_ARN="arn:aws:lambda:${REGION}:${ACCOUNT_ID}:capacity-provider:lmi-verification-cp"
 
aws lambda create-function \
  --function-name lmi-verification-func \
  --runtime python3.13 \
  --role "arn:aws:iam::${ACCOUNT_ID}:role/LMI-Verification-ExecutionRole" \
  --handler lambda_function.lambda_handler \
  --zip-file fileb:///tmp/lmi-function.zip \
  --memory-size 4096 \
  --timeout 60 \
  --capacity-provider-config "{
    \"LambdaManagedInstancesCapacityProviderConfig\": {
      \"CapacityProviderArn\": \"$CP_ARN\",
      \"ExecutionEnvironmentMemoryGiBPerVCpu\": 4.0,
      \"PerExecutionEnvironmentMaxConcurrency\": 10
    }
  }" \
  --region $REGION
 
aws lambda wait function-active-v2 --function-name lmi-verification-func --region $REGION
 
# Publish version (triggers EC2 provisioning)
aws lambda publish-version --function-name lmi-verification-func --region $REGION

Configuration used for this verification:

| Setting | Value |
| --- | --- |
| Region | us-east-1 |
| Architecture | x86_64 |
| MaxVCpuCount | 30 |
| Memory | 4096 MB |
| Memory/vCPU ratio | 4:1 (general purpose) |
| PerExecutionEnvironmentMaxConcurrency | 10 (Python default is 16/vCPU; set lower to observe limit behavior) |
| Instance type | Lambda-selected (result: m7i.xlarge) |

Verification 1: How Long Does Capacity Provider Startup Take?

Standard Lambda functions are invocable immediately after deploy. LMI provisions EC2 instances at publish-version time. I measured this lead time.

Results

| Phase | Duration |
| --- | --- |
| Capacity Provider create API → Active | Instant (seconds) |
| publish-version API response | ~1.5s |
| publish-version → version Active | ~67 seconds |
| First invoke latency | ~1.2s (no cold start) |

After publish-version, Lambda launched one m7i.xlarge instance in each of 3 AZs.

Terminal
aws ec2 describe-instances \
  --filters "Name=tag:aws:lambda:capacity-provider,Values=*" \
  --query 'Reservations[*].Instances[*].[InstanceType,Placement.AvailabilityZone]' \
  --output table --region us-east-1
Output
|  m7i.xlarge |  us-east-1c |
|  m7i.xlarge |  us-east-1b |
|  m7i.xlarge |  us-east-1a |

Without specifying instance types, Lambda auto-selected m7i.xlarge (4 vCPU / 16 GB) for the 4GB memory / 4:1 ratio configuration.

First Invoke Confirmation

After the version became Active, invokes showed no cold starts. Five sequential invokes were stable at ~1.2s (including network round-trip). Since the info mode function body executes in near-zero time, most of this latency is the network round-trip from the local environment (Tokyo) to us-east-1.

Output
invoke #1: 1248ms | pid=18
invoke #2: 1288ms | pid=16
invoke #3: 1192ms | pid=22
invoke #4: 1178ms | pid=20
invoke #5: 1204ms | pid=15

Note the different PIDs — five sequential invokes were routed to five different execution environments. With three m7i.xlarge instances (4 vCPU / 16 GB each) and a function configured at 4 GB memory / 1 vCPU, each instance can host multiple execution environments. In practice, more execution environments were running than the MinExecutionEnvironments=3 floor (more on this below).
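A back-of-the-envelope packing calculation makes the "more than the floor" observation plausible. The sketch below assumes environments are sized strictly by the configured 4 GB / 1 vCPU and ignores any per-instance overhead Lambda reserves (an assumption, which would explain seeing 10 rather than the theoretical 12):

```python
# Execution environments that fit on one instance, taking the binding
# resource (vCPU or memory) as the limit. Runtime overhead is ignored.
def envs_per_instance(inst_vcpu, inst_mem_gib, env_vcpu, env_mem_gib):
    return min(inst_vcpu // env_vcpu, int(inst_mem_gib // env_mem_gib))

# One m7i.xlarge (4 vCPU / 16 GiB) fits four 1 vCPU / 4 GiB environments,
# so three instances could host up to 12 in this simplified model.
print(envs_per_instance(4, 16, 1, 4) * 3)  # 12
```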

Takeaway

The 67-second provisioning time needs consideration for CI/CD pipelines. This is fundamentally different from standard Lambda's "deploy and invoke immediately." However, once provisioned, you get stable latency with no cold starts. LMI suits long-running stable functions rather than frequently deployed ones.

Verification 2: How Many Requests Can One Execution Environment Handle?

LMI's key differentiator is multi-concurrency. Standard Lambda uses one execution environment per request; LMI can process multiple requests simultaneously in one environment. I tested with I/O-bound workloads (simulated with 3-second sleep).

Note that the Python runtime implements multi-concurrency using separate processes, not threads (Node.js uses an event loop; Java/.NET use threads). This means you should focus on inter-process coordination and file locking for /tmp access rather than thread safety. Because each execution environment runs as a separate process, it has a unique PID. The following tests use PID distribution to identify which execution environment handled each request.
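The process-per-environment model can be illustrated locally: a process pool reuses a fixed set of worker processes, so `os.getpid()` repeats in exactly the way the PID aggregation in the tests below groups requests. This is a local analogy, not the Lambda runtime itself:

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor
import os

def report_pid(_):
    # Each worker process has its own PID, like each execution environment.
    return os.getpid()

def pid_distribution(n_tasks, n_workers):
    """Spread n_tasks across n_workers processes and count tasks per PID."""
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        pids = list(pool.map(report_pid, range(n_tasks)))
    return Counter(pids)

if __name__ == "__main__":
    # 10 tasks land on at most 3 distinct PIDs (one per worker process)
    print(pid_distribution(10, 3))
```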

10 Concurrent Requests

Concurrent invoke commands
Terminal
# Run N concurrent invokes and aggregate PID distribution
N=10
for i in $(seq 1 $N); do
  aws lambda invoke \
    --function-name lmi-verification-func --qualifier 1 \
    --payload '{"mode":"io_bound","sleep_sec":3}' \
    --cli-binary-format raw-in-base64-out \
    --cli-read-timeout 30 \
    /tmp/lmi-result-${i}.json \
    --region us-east-1 > /dev/null 2>&1 &
done
wait
 
# Check PID distribution
for i in $(seq 1 $N); do
  python3 -c "
import json
d=json.load(open('/tmp/lmi-result-${i}.json'))
b=json.loads(d['body'])
print(b['pid'])" 2>/dev/null
done | sort | uniq -c | sort -rn

Change N to 10, 30, or 40 for each test. For CPU-bound tests, change the payload to '{"mode":"cpu_bound","iterations":10000000}'.

Output
All requests completed: 4551ms (3s sleep + network round-trip)
 
PID distribution (same PID = same execution environment):
  3 requests → PID 17
  2 requests → PID 16
  1 request  → PID 24, 21, 19, 18, 15

PID 17 handled 3 requests simultaneously. Multi-concurrency is working. Total completion time of ~4.5s (3s sleep + ~1.5s overhead) confirms all 10 requests were processed in parallel.

30 Concurrent Requests (Near Theoretical Limit)

With MinExecutionEnvironments=3 and PerExecutionEnvironmentMaxConcurrency=10, the minimum concurrent capacity is 3 × 10 = 30.
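That floor falls straight out of the configuration arithmetic; a one-liner keeps the numbers honest when experimenting with other settings:

```python
# Guaranteed concurrent capacity before any scale-out kicks in.
def min_concurrent_capacity(min_envs: int, per_env_max: int) -> int:
    return min_envs * per_env_max

# MinExecutionEnvironments=3 x PerExecutionEnvironmentMaxConcurrency=10
print(min_concurrent_capacity(3, 10))  # 30
```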

Output
Completed: 5899ms | Success: 30/30 | Throttled: 0
 
PID distribution:
  3 requests each → PID 15-24 (10 unique PIDs)

All 30 succeeded with zero throttling. 10 unique PIDs were observed. Three m7i.xlarge instances (4 vCPU / 16 GB each) hosted 10 execution environments (4 GB / 1 vCPU each), exceeding the MinExecutionEnvironments=3 floor. Requests were distributed evenly at 3 per environment.

40 Concurrent Requests (Over Limit)

Output
Completed: 9220ms | Success: 34/40 | Throttled: 6
PID distribution: 10 PIDs (15-24), 3-4 requests each

At 40 concurrent requests, 6 were throttled. Completion time stretched to 9.2s. While the environment-level capacity is 10 environments × 10 concurrency = 100, throttling occurred at just 40 concurrent requests. The exact cause is unclear, but routing and queuing overhead may be a contributing factor.

Takeaway

Multi-concurrency works effectively for I/O-bound workloads. A 3-second sleep operation completes in ~4.5s even with 10 concurrent requests. What would require 10 execution environments in standard Lambda can be handled with fewer resources.

From a cost perspective, note that 10 execution environments were running despite MinExecutionEnvironments=3. The actual number of execution environments is not determined by MinExecutionEnvironments alone. Cost estimates should be based on the number and type of EC2 instances launched.

Verification 3: How Fast Does Scale-Out Respond to CPU Load?

LMI scales asynchronously based on CPU utilization. Scaling operates at two layers: adding execution environments on existing instances, and launching additional EC2 instances when instance resources are exhausted. I tested how this differs from standard Lambda's immediate request-driven scaling using CPU-bound workloads.

CPU-Bound Function

A numerical computation function (sqrt + sin × 10 million iterations) taking ~1 second of CPU time per invocation.

50 Concurrent × 3 Batches (15s Intervals)

First, I applied heavy load under the current configuration (MaxExecutionEnvironments not explicitly set) to observe whether execution environments or instances are added automatically.

Output
Batch 1: 18543ms | Success: 33/50 | Throttled: 17 | Unique PIDs: 10
Batch 2: 15212ms | Success: 20/50 | Throttled: 30 | Unique PIDs: 10
Batch 3: 19075ms | Success: 18/50 | Throttled: 32 | Unique PIDs: 10

CPU-bound workloads produced heavy throttling, and no increase in execution environments was observed even with 30-second intervals between batches. PID count remained at 10 throughout. CloudTrail RunInstances events confirmed that no additional EC2 instances were launched during the CPU load test either — the initial 3 instances remained throughout.

After explicitly setting MaxExecutionEnvironments=20 to give Lambda room to add execution environments, PID 25 appeared (11 unique PIDs). Execution environment addition had begun, but was not immediate.

Terminal (scaling config update)
aws lambda put-function-scaling-config \
  --function-name lmi-verification-func --qualifier 1 \
  --function-scaling-config "MinExecutionEnvironments=3,MaxExecutionEnvironments=20" \
  --region us-east-1

30 Concurrent × 5 Batches (Stability Test)

Since 50 concurrent requests caused heavy throttling, I reduced to a level the current 10 execution environments could handle stably, and observed behavior under sustained load:

Output
Batch 1: 18102ms | Success: 29/30 | Throttled: 1
Batch 2: 18838ms | Success: 30/30 | Throttled: 0
Batch 3: 17905ms | Success: 30/30 | Throttled: 0
Batch 4: 15101ms | Success: 30/30 | Throttled: 0
Batch 5: 12235ms | Success: 30/30 | Throttled: 0

10 execution environments handled 30 concurrent requests stably, with processing time decreasing per batch (18s → 12s). The exact cause of the improvement is unclear, but it may be due to amortization of initial costs within each execution environment, such as process initialization and module imports.

CloudWatch Metrics

Output (Throttles)
15:57  51
15:58  186 (peak during 50-concurrent CPU test)
15:59  96
16:00  3
16:01  0

Takeaway

CPU-bound workloads can feel "slow to scale" with LMI. The docs note that throttling may occur if traffic more than doubles within 5 minutes, and this was clearly observable in practice. The gap from standard Lambda's request-driven scaling is significant.

For CPU-bound work, multi-concurrency benefits are limited since invocations share CPU resources within the same environment, which can increase individual request latency. The official blog recommends setting concurrency at or below vCPU count for CPU-intensive workloads.
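That guidance can be captured in a small sizing helper. The 16/vCPU multiplier for I/O-bound work mirrors the Python default cited earlier; treat both branches as tunable assumptions rather than official formulas:

```python
def recommended_concurrency(vcpus: int, cpu_bound: bool,
                            io_multiplier: int = 16) -> int:
    """Per-environment concurrency: at or below vCPU count for CPU-bound
    work; a higher multiple (Python default: 16/vCPU) for I/O-bound work."""
    return vcpus if cpu_bound else vcpus * io_multiplier

print(recommended_concurrency(4, cpu_bound=True))   # 4
print(recommended_concurrency(4, cpu_bound=False))  # 64
```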

LMI Suitability Analysis

Performance Comparison

| Aspect | Standard Lambda | LMI (Measured) |
| --- | --- | --- |
| Deploy → invocable | Instant | ~67 seconds |
| Cold starts | Yes (100ms-seconds) | None |
| I/O-bound, 10 concurrent | 10 execution envs needed | 4.5s completion (multi-concurrency) |
| CPU-bound, 50 concurrent | 50 envs, immediate scale | Throttling (async scaling) |
| Scale-out speed | Per-request, immediate | CPU-based, async (two layers: execution envs + instances) |

Cost Structure

LMI uses EC2 instance pricing + 15% management fee. This verification launched m7i.xlarge ($0.2016/hr × 3 instances).

  • LMI minimum cost: 3 × $0.2016 × 1.15 ≈ **$0.70/hour** (minimum 3 EC2 instances always running)
  • Standard Lambda: $0 with no traffic

LMI incurs costs even when idle. High-utilization workloads may benefit from EC2 Savings Plans / Reserved Instances, but low-utilization workloads are far cheaper on standard Lambda.
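A rough break-even sketch makes the trade-off concrete. All rates are assumptions for illustration only: the m7i.xlarge on-demand rate measured above, the 15% management fee from the pricing table, and standard Lambda's public GB-second and per-request rates; verify current pricing before relying on the numbers.

```python
def lmi_hourly_cost(instances: int, hourly_rate: float,
                    mgmt_fee: float = 0.15) -> float:
    """EC2 instance cost plus the LMI management fee, per hour."""
    return instances * hourly_rate * (1 + mgmt_fee)

def standard_lambda_hourly_cost(requests_per_hour: float,
                                avg_duration_sec: float,
                                memory_gb: float,
                                gb_sec_rate: float = 0.0000166667,
                                per_request_rate: float = 0.20 / 1_000_000) -> float:
    """Pay-per-use cost for the same hour of traffic (assumed rates)."""
    compute = requests_per_hour * avg_duration_sec * memory_gb * gb_sec_rate
    return compute + requests_per_hour * per_request_rate

# Idle floor for this verification's setup vs. standard Lambda at zero traffic
print(round(lmi_hourly_cost(3, 0.2016), 2))      # ≈ $0.70/hour
print(standard_lambda_hourly_cost(0, 1.0, 4.0))  # 0.0
```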

Summary — LMI Migration Checklist

| Check | Standard Lambda Fits | LMI Fits |
| --- | --- | --- |
| Traffic pattern | Bursty, spiky | Steady, predictable |
| Cold start tolerance | Can tolerate | Cannot tolerate |
| Execution duration | Short, event-driven | Long-running, steady-state |
| Workload type | CPU-bound (scaling challenges) | I/O-bound (multi-concurrency shines) |
| Utilization rate | Low (per-request pricing wins) | High (EC2 pricing advantage) |
| VPC requirements | No VPC needed | Already in VPC |
| Deploy frequency | Frequent (instant deploy needed) | Infrequent (67s wait acceptable) |
  • "Managed EC2 cluster" fit check — LMI provides EC2 compute flexibility with Lambda's developer experience, but its scaling model is closer to EC2. If you need instant response to traffic spikes, standard Lambda is better suited
  • Multi-concurrency shines for I/O-bound work — Processing multiple requests in one execution environment dramatically improves resource efficiency for workloads with API calls or DB query wait times. Benefits are limited for CPU-bound work
  • Cost optimization requires utilization analysis — With at least 3 EC2 instances always running, cost efficiency at low traffic is worse than standard Lambda. Consider LMI for high-utilization workloads where Savings Plans apply

Cleanup

Resource deletion commands
Terminal
REGION=us-east-1
 
# Delete Lambda function and versions
aws lambda delete-function --function-name lmi-verification-func --region $REGION
 
# Delete Capacity Provider (auto-terminates EC2 instances)
aws lambda delete-capacity-provider \
  --capacity-provider-name lmi-verification-cp --region $REGION
 
# Delete NAT Gateway (takes time)
aws ec2 delete-nat-gateway --nat-gateway-id $NAT_GW --region $REGION
aws ec2 wait nat-gateway-deleted --nat-gateway-ids $NAT_GW --region $REGION
aws ec2 release-address --allocation-id $EIP_ALLOC --region $REGION
 
# Delete route tables, subnets, IGW, VPC
aws ec2 delete-route-table --route-table-id $PRIV_RT --region $REGION
aws ec2 delete-route-table --route-table-id $PUB_RT --region $REGION
for SUB in $PRIV_SUB1 $PRIV_SUB2 $PRIV_SUB3 $PUB_SUB; do
  aws ec2 delete-subnet --subnet-id $SUB --region $REGION
done
aws ec2 detach-internet-gateway \
  --internet-gateway-id $IGW_ID --vpc-id $VPC_ID --region $REGION
aws ec2 delete-internet-gateway --internet-gateway-id $IGW_ID --region $REGION
aws ec2 delete-security-group --group-id $SG_ID --region $REGION
aws ec2 delete-vpc --vpc-id $VPC_ID --region $REGION
 
# Delete IAM roles
aws iam detach-role-policy --role-name LMI-Verification-ExecutionRole \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
aws iam delete-role --role-name LMI-Verification-ExecutionRole
aws iam detach-role-policy --role-name LMI-Verification-OperatorRole \
  --policy-arn arn:aws:iam::aws:policy/AWSLambdaManagedEC2ResourceOperator
aws iam delete-role --role-name LMI-Verification-OperatorRole

Shinya Tahara

Solutions Architect @ AWS

I'm a Solutions Architect at AWS, providing technical guidance primarily to financial industry customers. I share learnings about cloud architecture and AI/ML on this site. The views and opinions expressed on this site are my own and do not represent the official positions of my employer.
