
Verifying EKS Managed Node Group Warm Pools — Scale-Out Times by poolState and the reuseOnScaleIn Reality

Introduction

On April 8, 2026, AWS announced warm pool support for EKS managed node groups. Warm pools maintain pre-initialized EC2 instances—with OS initialization, user data execution, and software configuration already complete—that can join the cluster without a full cold-start sequence during scale-out events.

The effectiveness of warm pools varies significantly based on poolState selection. Running offers the fastest scale-out but incurs continuous instance charges, while Stopped is cost-efficient but slower to transition. This article quantifies the "speed vs. cost" tradeoff with real measurements and provides a decision framework for choosing the right poolState based on workload characteristics.

We measured scale-out times across three patterns and present the data needed to decide whether warm pools are worth adopting and which poolState to choose. For official documentation, see Warm pools with managed node groups.

How Warm Pools Work

When you enable a warm pool, EKS creates an EC2 Auto Scaling warm pool attached to the node group's Auto Scaling group. Instances wait in the warm pool in one of three states:

poolState    | State      | Maintenance Cost (t3.medium/mo) | Scale-Out Behavior
Running      | Running    | ~$32 (instance + EBS)           | Joins cluster directly
Stopped      | Stopped    | ~$2 (EBS only)                  | Starts → joins cluster
Hibernated   | Hibernated | ~$2 (EBS + RAM)                 | Restores RAM → joins cluster

Cost estimates based on t3.medium (gp3 20GB) pricing in ap-northeast-1. Varies by instance type and disk size.

Prerequisites:

  • AWS CLI v2.34+ configured (eks:*, ec2:*, iam:*, autoscaling:* permissions)
  • kubectl v1.35+
  • Test region: ap-northeast-1 (Tokyo)

Skip to Verification 1 if you only want the results.

Environment Setup

EKS cluster, node role, and node group creation steps

We used an existing EKS cluster (v1.35). For Auto Mode clusters, managed node groups require vpc-cni, kube-proxy, and coredns addons.

Terminal (Install addons)
REGION=ap-northeast-1
CLUSTER=eks-sandbox
 
aws eks create-addon --cluster-name $CLUSTER --addon-name vpc-cni --region $REGION
aws eks create-addon --cluster-name $CLUSTER --addon-name kube-proxy --region $REGION
aws eks create-addon --cluster-name $CLUSTER --addon-name coredns --region $REGION

Create a node role with the required policies.

Terminal (Create node role)
cat > node-trust-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "ec2.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF
 
aws iam create-role \
  --role-name eks-warmpool-verify-node-role \
  --assume-role-policy-document file://node-trust-policy.json
 
aws iam attach-role-policy --role-name eks-warmpool-verify-node-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
aws iam attach-role-policy --role-name eks-warmpool-verify-node-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
aws iam attach-role-policy --role-name eks-warmpool-verify-node-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly

Create three node groups: cold start (no warm pool), Stopped warm pool, and Running warm pool.

Terminal (Prepare variables)
# Get account ID
ACCOUNT_ID=$(aws sts get-caller-identity --query 'Account' --output text)
NODE_ROLE="arn:aws:iam::${ACCOUNT_ID}:role/eks-warmpool-verify-node-role"
 
# Get cluster's private subnets
SUBNETS=$(aws eks describe-cluster --name $CLUSTER --region $REGION \
  --query 'cluster.resourcesVpcConfig.subnetIds' --output text)
Terminal (Create node groups)
 
# 1. Cold start (no warm pool)
aws eks create-nodegroup \
  --cluster-name $CLUSTER --nodegroup-name ng-cold-start \
  --node-role $NODE_ROLE --subnets $SUBNETS \
  --instance-types t3.medium \
  --scaling-config minSize=2,maxSize=5,desiredSize=2 \
  --region $REGION
 
# 2. Stopped warm pool
aws eks create-nodegroup \
  --cluster-name $CLUSTER --nodegroup-name ng-warm-stopped \
  --node-role $NODE_ROLE --subnets $SUBNETS \
  --instance-types t3.medium \
  --scaling-config minSize=2,maxSize=5,desiredSize=2 \
  --warm-pool-config enabled=true,maxGroupPreparedCapacity=4,minSize=1,poolState=Stopped,reuseOnScaleIn=true \
  --region $REGION
 
# 3. Running warm pool
aws eks create-nodegroup \
  --cluster-name $CLUSTER --nodegroup-name ng-warm-running \
  --node-role $NODE_ROLE --subnets $SUBNETS \
  --instance-types t3.medium \
  --scaling-config minSize=2,maxSize=5,desiredSize=2 \
  --warm-pool-config enabled=true,maxGroupPreparedCapacity=4,minSize=1,poolState=Running,reuseOnScaleIn=false \
  --region $REGION

Warm pool configuration parameters:

  • enabled=true — Enable warm pool
  • maxGroupPreparedCapacity=4 — Maximum total instances across warm pool and ASG. With desiredSize=2, the warm pool gets 4-2=2 instances
  • minSize=1 — Minimum instances in the warm pool
  • poolState=Stopped|Running — Instance state in the warm pool
  • reuseOnScaleIn=true — Return instances to warm pool on scale-in
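The interaction between maxGroupPreparedCapacity and desiredSize can be sketched as a quick calculation (the floor-at-minSize behavior follows the EC2 Auto Scaling warm pool sizing rules; the values mirror this article's setup):

Terminal (Warm pool sizing sketch)
```shell
# Warm pool size = maxGroupPreparedCapacity - current desired capacity,
# floored at the warm pool's minSize.
max_prepared=4   # maxGroupPreparedCapacity
desired=2        # node group desiredSize
min_size=1       # warm pool minSize

pool=$(( max_prepared - desired ))
if [ "$pool" -lt "$min_size" ]; then pool=$min_size; fi
echo "warm pool instances: $pool"
# With desiredSize=2 this prints: warm pool instances: 2
```

As desiredSize grows toward maxGroupPreparedCapacity, the warm pool shrinks toward minSize.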

After node groups become ACTIVE, wait for warm pool instances to reach the target state (Stopped/Running).

Terminal (Check warm pool status)
ASG=$(aws eks describe-nodegroup --cluster-name $CLUSTER --nodegroup-name ng-warm-stopped \
  --region $REGION --query 'nodegroup.resources.autoScalingGroups[0].name' --output text)
 
aws autoscaling describe-warm-pool --auto-scaling-group-name $ASG --region $REGION \
  --query 'Instances[].{InstanceId:InstanceId,State:LifecycleState}' --output table
Output
-------------------------------------------
|            DescribeWarmPool             |
+----------------------+------------------+
|      InstanceId      |      State       |
+----------------------+------------------+
|  i-0a546e64a73986140 |  Warmed:Stopped  |
|  i-0c0c0f4ad0652338a |  Warmed:Stopped  |
+----------------------+------------------+

Verification 1: Scale-Out Time by poolState

We measured scale-out time across three patterns. For each, we changed desiredSize from 2 to 3 and recorded the time until the new node showed Ready in kubectl get nodes.

We used the following polling script for measurement. Replace NODEGROUP with each node group name.

Measurement polling script
Terminal (Polling script)
NODEGROUP="ng-cold-start"  # Replace with ng-warm-stopped / ng-warm-running
START_TIME=$(date +%s)
 
while true; do
  READY=$(kubectl get nodes --selector="eks.amazonaws.com/nodegroup=$NODEGROUP" \
    --no-headers 2>/dev/null | grep " Ready " | wc -l)
  TOTAL=$(kubectl get nodes --selector="eks.amazonaws.com/nodegroup=$NODEGROUP" \
    --no-headers 2>/dev/null | wc -l)
  ELAPSED=$(( $(date +%s) - START_TIME ))
  echo "$(date +%T) [${ELAPSED}s] Nodes: $TOTAL total, $READY ready"
  [ "$READY" -ge 3 ] && echo "=== New node Ready! ===" && break
  sleep 5
done

(a) Cold Start (No Warm Pool)

Terminal
aws eks update-nodegroup-config \
  --cluster-name eks-sandbox --nodegroup-name ng-cold-start \
  --scaling-config minSize=2,maxSize=5,desiredSize=3 \
  --region ap-northeast-1
Output (polled every 5s)
17:30:27 [3s]  Nodes: 2 total, 2 ready
17:30:54 [30s] Nodes: 2 total, 2 ready
17:31:06 [42s] Nodes: 3 total, 2 ready  ← node appeared
17:31:26 [62s] Nodes: 3 total, 3 ready  ← Ready

Result: 62 seconds. Looking at the log breakdown, the node appeared at 42s (EC2 instance launch complete + kubelet startup began), then took 20 more seconds to reach Ready (CNI configuration, node registration complete).

(b) Stopped Warm Pool

Terminal
aws eks update-nodegroup-config \
  --cluster-name eks-sandbox --nodegroup-name ng-warm-stopped \
  --scaling-config minSize=2,maxSize=5,desiredSize=3 \
  --region ap-northeast-1
Output
17:31:39 [2s]  Nodes: 2 total, 2 ready
17:32:05 [28s] Nodes: 3 total, 2 ready  ← node appeared
17:32:25 [48s] Nodes: 3 total, 3 ready  ← Ready

Result: 48 seconds. 14 seconds (23%) faster than cold start. Since warm pool instances are stopped with OS initialization already complete, the full boot sequence is skipped, which likely accounts for the reduction.

(c) Running Warm Pool

Terminal
aws eks update-nodegroup-config \
  --cluster-name eks-sandbox --nodegroup-name ng-warm-running \
  --scaling-config minSize=2,maxSize=5,desiredSize=3 \
  --region ap-northeast-1
Output
17:32:38 [2s]  Nodes: 2 total, 2 ready
17:32:51 [15s] Nodes: 3 total, 2 ready  ← node appeared
17:33:04 [28s] Nodes: 3 total, 3 ready  ← Ready

Result: 28 seconds. 34 seconds (55%) faster than cold start. Since the Running instance is already up, EC2 launch and OS initialization steps can be skipped. The remaining 28 seconds is likely spent primarily on EKS bootstrap (cluster join + CNI setup).

Verification 2: reuseOnScaleIn Behavior and Re-Scale-Out Speed

Verification 1 established the Stopped warm pool baseline (48s). Verification 2 tests the scale-in → re-scale-out cycle with reuseOnScaleIn=true to see if reused instances are faster.

After Verification 1, ng-warm-stopped has desiredSize=3. We scale in from 3 to 2.

Scale-In: Instance Return to Warm Pool

We changed desiredSize from 3 to 2 to trigger scale-in.

Terminal
aws eks update-nodegroup-config \
  --cluster-name eks-sandbox --nodegroup-name ng-warm-stopped \
  --scaling-config minSize=2,maxSize=5,desiredSize=2 \
  --region ap-northeast-1

The scale-in process progressed through these stages. You can observe the warm pool state transitions with the following command:

Terminal (Observe warm pool state)
# Get ASG name (skip if already obtained during setup)
ASG=$(aws eks describe-nodegroup --cluster-name eks-sandbox --nodegroup-name ng-warm-stopped \
  --region ap-northeast-1 --query 'nodegroup.resources.autoScalingGroups[0].name' --output text)
 
# Periodically check node and warm pool status
while true; do
  echo "--- $(date +%T) ---"
  kubectl get nodes --selector='eks.amazonaws.com/nodegroup=ng-warm-stopped' --no-headers
  aws autoscaling describe-warm-pool --auto-scaling-group-name $ASG --region ap-northeast-1 \
    --query 'Instances[].{Id:InstanceId,State:LifecycleState}' --output table
  sleep 15
done
Output (scale-in progression)
17:33:43 Node: Ready,SchedulingDisabled  ← drain started
17:34:35 Node: NotReady,SchedulingDisabled  ← kubelet stopped
17:49:09 Warm pool: Warmed:Pending:Proceed  ← lifecycle hook completed
17:49:40 Warm pool: Warmed:Stopped  ← return complete

The full scale-in, from drain start to warm pool return, took approximately 16 minutes. EKS automatically configures a Terminate-LC-Hook on the ASG with a heartbeat timeout of 1800 seconds (30 minutes); although the node drain itself completed in about 1 minute, lifecycle hook processing accounted for the remaining wait. The documentation says to configure warm pools only through the EKS API and never to modify warm pool settings directly via the EC2 Auto Scaling API, so adjusting this timeout is not recommended.
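The hook configuration can be confirmed with the following command, reusing the $ASG variable captured earlier (inspecting the hooks is read-only; it is modifying them via the Auto Scaling API that the docs warn against). The JSON that follows shows the relevant fields:

Terminal (Inspect lifecycle hooks)
```shell
# Read-only: list the lifecycle hooks EKS attached to the node group's ASG.
aws autoscaling describe-lifecycle-hooks \
  --auto-scaling-group-name "$ASG" \
  --region ap-northeast-1
```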

Output (lifecycle hooks)
{
  "LifecycleHooks": [
    {
      "LifecycleHookName": "Launch-LC-Hook",
      "LifecycleTransition": "autoscaling:EC2_INSTANCE_LAUNCHING",
      "HeartbeatTimeout": 1800
    },
    {
      "LifecycleHookName": "Terminate-LC-Hook",
      "LifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
      "HeartbeatTimeout": 1800
    }
  ]
}

Re-Scale-Out: Reused Instance Speed

We scaled back out using the instance that was returned to the warm pool.

Terminal
aws eks update-nodegroup-config \
  --cluster-name eks-sandbox --nodegroup-name ng-warm-stopped \
  --scaling-config minSize=2,maxSize=5,desiredSize=3 \
  --region ap-northeast-1
Output
17:49:55 [2s]  Nodes: 2 total, 2 ready
17:50:22 [29s] Nodes: 2 total, 2 ready
17:50:28 [35s] Nodes: 3 total, 3 ready  ← Ready

Result: 35 seconds. 13 seconds faster than the initial Stopped warm pool (48s). On the first launch, kubelet must perform TLS bootstrapping (certificate issuance) and register a new node object with the cluster. Reused instances have these cached, which likely accounts for the faster cluster join.
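One hedged way to observe first-boot TLS bootstrapping (an aside, not part of the article's measurements; CSR names and counts vary by cluster) is to watch kubelet certificate signing requests during scale-out. A freshly launched node submits a new CSR, while a reused instance with a cached certificate should not:

Terminal (Watch kubelet CSRs)
```shell
# List CSRs in creation order; a brand-new node adds an entry here.
kubectl get csr --sort-by=.metadata.creationTimestamp
```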

Verification 3: Scale-Out with Pending Pods — Time to Pod Running

Verifications 1-2 used manual desiredSize changes. Verification 3 simulates a production scenario with pending pods to measure the full cycle from scale-out to pod Running.

After Verification 2's re-scale-out, ng-warm-stopped has desiredSize=3. First, scale back to 2.

Terminal (Reset desiredSize to 2)
aws eks update-nodegroup-config \
  --cluster-name eks-sandbox --nodegroup-name ng-warm-stopped \
  --scaling-config minSize=2,maxSize=5,desiredSize=2 \
  --region ap-northeast-1
# Wait for warm pool instances to return to Warmed:Stopped (~16 min)

Start the verification once desiredSize is back to 2 with 2 Stopped instances in the warm pool.

We filled the existing 2 nodes with resource-heavy pods and created a third pod that went Pending.

Pod manifest (filler-pod × 2 + pending-pod × 1)
Terminal (Create pods)
kubectl apply -f - << 'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: filler-pod-1
spec:
  nodeSelector:
    eks.amazonaws.com/nodegroup: ng-warm-stopped
  containers:
    - name: nginx
      image: nginx:alpine
      resources:
        requests:
          cpu: "1500m"
          memory: "2Gi"
---
apiVersion: v1
kind: Pod
metadata:
  name: filler-pod-2
spec:
  nodeSelector:
    eks.amazonaws.com/nodegroup: ng-warm-stopped
  containers:
    - name: nginx
      image: nginx:alpine
      resources:
        requests:
          cpu: "1500m"
          memory: "2Gi"
---
apiVersion: v1
kind: Pod
metadata:
  name: pending-pod
spec:
  nodeSelector:
    eks.amazonaws.com/nodegroup: ng-warm-stopped
  containers:
    - name: nginx
      image: nginx:alpine
      resources:
        requests:
          cpu: "1500m"
          memory: "2Gi"
EOF

Since t3.medium has 2 vCPU / 4 GiB with approximately 1930m allocatable CPU, and each node runs system pods, a pod requesting 1500m CPU can only fit one per node. As a result, only filler-pod-1 reached Running, while filler-pod-2 and pending-pod went Pending.

Output (Pod status)
NAME           READY   STATUS    AGE
filler-pod-1   1/1     Running   6s
filler-pod-2   0/1     Pending   6s
pending-pod    0/1     Pending   6s
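The one-pod-per-node outcome above can be sketched with rough bin-packing arithmetic. The 1930m allocatable figure is from this article; the ~200m system-pod reservation is an illustrative assumption (actual values come from kubectl describe node):

Terminal (Bin-packing sketch)
```shell
# How many 1500m-request pods fit on one t3.medium node?
allocatable=1930   # approx allocatable millicores (article's figure)
system=200         # assumed CPU requests of system pods (illustrative)
request=1500       # per-pod CPU request from the manifest above

fit=$(( (allocatable - system) / request ))
echo "1500m pods per node: $fit"
# Prints: 1500m pods per node: 1
```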

With 2 pods Pending, we needed 2 additional nodes. We scaled out from the Stopped warm pool by changing desiredSize from 2 to 4.

Terminal
aws eks update-nodegroup-config \
  --cluster-name eks-sandbox --nodegroup-name ng-warm-stopped \
  --scaling-config minSize=2,maxSize=5,desiredSize=4 \
  --region ap-northeast-1
Output
17:53:29 [2s]  Running: 1/3
17:53:55 [28s] filler-pod-2: ContainerCreating  ← node Ready, pod scheduled
17:54:21 [54s] Running: 2/3
17:54:34 [67s] Running: 3/3  ← all pods Running

Result: 67 seconds (all pods Running). Breakdown: ~48s for node Ready + ~19s for pod scheduling and image pull. With Cluster Autoscaler, additional overhead for pending pod detection and ASG desired size update would apply (not measured in this article).

Overall Comparison: poolState × Scaling Method

Pattern                          | Time to Node Ready | Maintenance Cost (1 instance/mo) | Notes
Cold start (no warm pool)        | 62s                | $0                               | Baseline
Stopped warm pool (initial)      | 48s (-23%)         | ~$2 (EBS only)                   | OS restart only
Running warm pool (initial)      | 28s (-55%)         | ~$32 (instance + EBS)            | Bootstrap only
reuseOnScaleIn reuse (Stopped)   | 35s (-44%)         | ~$2 (EBS only)                   | Faster reuse (likely bootstrap cache)
Stopped + Pod Running            | 67s                | ~$2 (EBS only)                   | Includes image pull
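As a quick sanity check on the cost column, the ratio between a Running and a Stopped warm instance works out as follows (using this article's t3.medium estimates):

Terminal (Cost ratio sketch)
```shell
# Monthly cost ratio of a Running vs Stopped warm instance (~$32 vs ~$2).
running_usd=32
stopped_usd=2
echo "Running costs $(( running_usd / stopped_usd ))x a Stopped instance"
# Prints: Running costs 16x a Stopped instance
```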

Decision Framework for poolState Selection

Based on the measurements:

  • Choose Running when scale-out speed is the top priority and you need nodes within 30 seconds. Keep warm pool size minimal since instance charges continue. Best for workloads with frequent bursts where per-event latency cost is high
  • Choose Stopped when cost efficiency matters and ~50 seconds of scale-out time is acceptable. At roughly 1/16 the cost of Running (EBS charges only), this suits low-to-medium burst frequency workloads
  • Enable reuseOnScaleIn when scale-in/out cycles are frequent. Reused instances are 13 seconds faster than initial Stopped. However, the ~16 minute drain wait for scale-in to warm pool return means this is less effective for rapid consecutive bursts
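As a toy encoding of this framework (the thresholds are this article's measurements, not an official guideline; treat it as a sketch):

Terminal (Decision sketch)
```shell
# Pick a poolState from a latency budget (seconds until a node must be
# Ready) and scale-in/out cycle frequency ("frequent" or "rare").
# Thresholds follow the measurements above: Running ~28s, Stopped ~48s.
choose_pool_state() {
  local latency_budget_s=$1 cycle_freq=$2
  if [ "$latency_budget_s" -le 30 ]; then
    echo "Running"
  elif [ "$cycle_freq" = "frequent" ]; then
    echo "Stopped + reuseOnScaleIn=true"
  else
    echo "Stopped"
  fi
}

choose_pool_state 30 rare       # Running
choose_pool_state 60 frequent   # Stopped + reuseOnScaleIn=true
choose_pool_state 60 rare       # Stopped
```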

When NOT to Use Warm Pools

  • For instance types with short boot times like t3.medium, the gap between cold start and warm pool is only 14-34 seconds, which may not justify the operational overhead. The benefit grows with heavy user data initialization (large package installs, data downloads)
  • Warm pools don't support custom AMIs (EKS optimized AMIs only). If you use custom AMIs, warm pools aren't available. Initialization must be done via user data (launch template)
  • Bottlerocket AMIs don't support Hibernated state or reuseOnScaleIn

Summary

  • Running is fast but expensive — 28 seconds to add a node, but instance charges continue. Reserve for workloads with strict latency requirements
  • Stopped offers the best balance — 48 seconds with EBS-only charges. Sufficient for most workloads
  • reuseOnScaleIn has a 16-minute drain bottleneck — Re-scale-out is 35 seconds (fast), but returning to the warm pool took ~16 minutes. EKS's auto-configured lifecycle hook (1800s timeout) is involved, and the documentation does not recommend modifying this value directly
  • Pod Running time = Node Ready + image pull — Even after a node is Ready, pods take an additional ~19 seconds to reach Running. Consider image pre-pulling and image size optimization alongside warm pool configuration
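Image pre-pulling mentioned above can be done with a small DaemonSet sketch (an illustrative pattern, not from the article's verification; the name is a placeholder and the image matches the test manifests; the pause container keeps the pod alive so the kubelet retains the pulled image):

Terminal (Image pre-pull DaemonSet sketch)
```shell
# Pre-pull nginx:alpine onto every node via an init container; the main
# container just idles so the image stays cached on the node.
kubectl apply -f - << 'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: image-prepull
spec:
  selector:
    matchLabels:
      app: image-prepull
  template:
    metadata:
      labels:
        app: image-prepull
    spec:
      initContainers:
        - name: pull-nginx
          image: nginx:alpine
          command: ["true"]   # pull the image, then exit immediately
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "10m"
              memory: "8Mi"
EOF
```

Note that a DaemonSet only pulls after a node joins the cluster; for warm pool instances, pulling images via user data during warm-up is an alternative worth evaluating.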
Cleanup steps
Terminal (Delete resources)
REGION=ap-northeast-1
CLUSTER=eks-sandbox
 
# Delete pods
kubectl delete pod filler-pod-1 filler-pod-2 pending-pod
 
# Delete node groups
for NG in ng-cold-start ng-warm-stopped ng-warm-running; do
  aws eks delete-nodegroup --cluster-name $CLUSTER --nodegroup-name $NG --region $REGION
done
 
# Wait for deletion
for NG in ng-cold-start ng-warm-stopped ng-warm-running; do
  aws eks wait nodegroup-deleted --cluster-name $CLUSTER --nodegroup-name $NG --region $REGION
done
 
# Delete addons (if added for Auto Mode cluster)
aws eks delete-addon --cluster-name $CLUSTER --addon-name vpc-cni --region $REGION
aws eks delete-addon --cluster-name $CLUSTER --addon-name kube-proxy --region $REGION
aws eks delete-addon --cluster-name $CLUSTER --addon-name coredns --region $REGION
 
# Delete IAM role
aws iam detach-role-policy --role-name eks-warmpool-verify-node-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
aws iam detach-role-policy --role-name eks-warmpool-verify-node-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
aws iam detach-role-policy --role-name eks-warmpool-verify-node-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
aws iam delete-role-policy --role-name eks-warmpool-verify-node-role \
  --policy-name ClusterAutoscalerPolicy 2>/dev/null  # Only if added
aws iam delete-role --role-name eks-warmpool-verify-node-role

Shinya Tahara

Solutions Architect @ AWS

I'm a Solutions Architect at AWS, providing technical guidance primarily to financial industry customers. I share learnings about cloud architecture and AI/ML on this site. The views and opinions expressed on this site are my own and do not represent the official positions of my employer.
