@shinyaz

Cognito Auth UI and HPA to Complete the Agent Platform — Agentic AI on EKS Part 3

Introduction

Part 1 covered the Weather Agent and MCP Server, and Part 2 validated the Travel Agent's A2A coordination.

This final post fills in the remaining pieces: a Cognito OAuth-authenticated Web UI and HPA autoscaling. With these in place, all four workshop components are deployed and the agent platform is fully operational.

Architecture Overview

With the Agent UI and HPA added in this post, all four workshop components are in place.

Agent UI Architecture

The Agent UI combines Gradio (a Python-based web UI framework) with FastAPI. Authentication uses the OAuth 2.0 Authorization Code flow against a Cognito user pool.

A notable design feature is agent mode switching. Users select between "Single Agent (Weather)" and "Multi-Agent (Travel)" via radio buttons, connecting to different agents from the same chat interface.

# app.py - Agent selection logic
# Hostnames are in-cluster Service DNS names (<service>.<namespace>)
if agent_mode == "Single Agent(Weather)":
    endpoint_url = "http://weather-agent.agents/prompt"
else:  # Multi-Agent(Travel)
    endpoint_url = "http://travel-agent.agents/prompt"
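A possible variation on the excerpt above, not taken from the workshop code: a lookup table keyed by the radio-button labels. It assumes the labels are exactly the strings app.py compares against, and the helper name `resolve_endpoint` is hypothetical. Unlike the `if`/`else`, an unrecognized label raises an error instead of silently routing to the Travel Agent.

```python
# Hypothetical helper; the endpoint URLs are the ones used in app.py.
AGENT_ENDPOINTS: dict[str, str] = {
    "Single Agent(Weather)": "http://weather-agent.agents/prompt",
    "Multi-Agent(Travel)": "http://travel-agent.agents/prompt",
}


def resolve_endpoint(agent_mode: str) -> str:
    """Map a radio-button label to the selected agent's /prompt URL."""
    try:
        return AGENT_ENDPOINTS[agent_mode]
    except KeyError:
        # Fail loudly on a label typo instead of defaulting to the Travel Agent.
        raise ValueError(f"unknown agent mode: {agent_mode!r}") from None


print(resolve_endpoint("Multi-Agent(Travel)"))  # http://travel-agent.agents/prompt
```

Keeping the labels and URLs in one table also means adding a third agent is a one-line change.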

Requests from the UI carry the user's Cognito JWT in the Authorization header. For testing, agents can run with DISABLE_AUTH=1 to skip authentication; production deployments validate the JWT.
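On the agent side, the DISABLE_AUTH switch might be wired up roughly as follows. This is a sketch, not the workshop's implementation: `authenticate`, `Unauthorized`, and the injected `verify_jwt` callable are hypothetical names, the "Bearer" scheme is an assumption, and real validation (signature against the user pool's JWKS plus expiry, issuer, and client checks) is delegated to `verify_jwt` rather than shown.

```python
import os
from typing import Callable, Mapping, Optional


class Unauthorized(Exception):
    """Raised when a request lacks a usable bearer token."""


def authenticate(
    headers: Mapping[str, str],
    verify_jwt: Callable[[str], dict],
    env: Mapping[str, str] = os.environ,
) -> Optional[dict]:
    """Return verified JWT claims, or None when DISABLE_AUTH=1 (test mode)."""
    # Test mode: skip authentication entirely.
    if env.get("DISABLE_AUTH") == "1":
        return None
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        raise Unauthorized("missing or malformed Authorization header")
    # Hypothetical verifier: checks signature, expiry, issuer, and client ID.
    return verify_jwt(token)
```

Injecting the verifier and the environment keeps the bypass logic testable without a live Cognito user pool.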

Deployment Steps

UI deployment completes in three steps.

1. Set Cognito user passwords

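Set passwords for the Alice and Bob users created by Terraform. The command below handles Alice; run it again with --username Bob. The --permanent flag marks the password as permanent, so users are not forced to change it at first sign-in.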

aws cognito-idp admin-set-user-password \
  --user-pool-id $COGNITO_POOL_ID \
  --username Alice --password "Passw0rd@" --permanent
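To set both users in one go, the same operation is available through boto3's `cognito-idp` client as `admin_set_user_password`. A minimal sketch, assuming the pool ID is in `COGNITO_POOL_ID` as above; the helper name `set_permanent_passwords` is hypothetical, and the client is passed in so the function can be exercised with a stub instead of real AWS credentials.

```python
from typing import Iterable, Protocol


class CognitoAdminClient(Protocol):
    """The single boto3 cognito-idp method this sketch relies on."""

    def admin_set_user_password(self, **kwargs) -> dict: ...


def set_permanent_passwords(
    client: CognitoAdminClient, pool_id: str, users: Iterable[str], password: str
) -> list[str]:
    """Set the same permanent password for each user; return the users updated."""
    updated = []
    for username in users:
        # Permanent=True mirrors the CLI's --permanent flag.
        client.admin_set_user_password(
            UserPoolId=pool_id, Username=username, Password=password, Permanent=True
        )
        updated.append(username)
    return updated


# Usage (requires boto3 and AWS credentials):
#   import os, boto3
#   set_permanent_passwords(boto3.client("cognito-idp"),
#                           os.environ["COGNITO_POOL_ID"], ["Alice", "Bob"], "Passw0rd@")
```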

2. Create OAuth secret

Register the Cognito app client ID and secret as a Kubernetes Secret. The UI Pod loads its keys as environment variables via envFrom.

kubectl create secret generic agent-ui \
  --namespace ui \
  --from-env-file ui/.env
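Because envFrom exposes each key of the Secret as an environment variable, the UI process can read the OAuth settings at startup and fail fast if the Secret was not wired in. A minimal sketch; the variable names COGNITO_CLIENT_ID and COGNITO_CLIENT_SECRET are hypothetical and should match whatever keys ui/.env actually defines.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping

# Hypothetical key names; use the keys defined in ui/.env.
REQUIRED_KEYS = ("COGNITO_CLIENT_ID", "COGNITO_CLIENT_SECRET")


@dataclass(frozen=True)
class OAuthSettings:
    client_id: str
    # repr=False keeps the secret out of log lines that print this object.
    client_secret: str = field(repr=False)


def load_oauth_settings(env: Mapping[str, str] = os.environ) -> OAuthSettings:
    """Read OAuth credentials injected via envFrom; raise if any are missing."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(
            f"missing {', '.join(missing)}; is the agent-ui Secret referenced via envFrom?"
        )
    return OAuthSettings(
        client_id=env["COGNITO_CLIENT_ID"], client_secret=env["COGNITO_CLIENT_SECRET"]
    )
```

Failing at startup surfaces a missing Secret as a crash-looping Pod, rather than as a confusing error at the first login attempt.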

3. Helm deploy

helm upgrade agent-ui manifests/helm/ui \
  --install -n ui --create-namespace \
  -f workshop-ui-values.yaml

After deployment, run kubectl port-forward svc/agent-ui -n ui 8000:80 and open http://localhost:8000. Users are redirected to the Cognito login page, and after signing in, the Gradio chat interface appears.
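The redirect to Cognito and the code exchange that follows are the standard Authorization Code flow against the user pool's OAuth 2.0 endpoints, /oauth2/authorize and /oauth2/token. A stdlib-only sketch of the two requests involved; the domain, redirect URI, and scope are placeholders, and the workshop UI may well build these through an OAuth library rather than by hand.

```python
import base64
import urllib.parse
import urllib.request


def authorize_url(domain: str, client_id: str, redirect_uri: str, state: str,
                  scope: str = "openid email") -> str:
    """URL the browser is redirected to for the Cognito login page."""
    query = urllib.parse.urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,  # random per login; compared again on the callback
    })
    return f"https://{domain}/oauth2/authorize?{query}"


def token_request(domain: str, client_id: str, client_secret: str,
                  code: str, redirect_uri: str) -> urllib.request.Request:
    """POST that exchanges the authorization code for tokens (not sent here)."""
    body = urllib.parse.urlencode({
        "grant_type": "authorization_code",
        "client_id": client_id,
        "code": code,
        "redirect_uri": redirect_uri,
    }).encode()
    # Confidential clients authenticate with HTTP Basic: base64(client_id:client_secret).
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        f"https://{domain}/oauth2/token", data=body, method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded",
                 "Authorization": f"Basic {basic}"},
    )
```

The `state` value should be random per login (for example from `secrets.token_urlsafe()`) and checked on the callback to guard against CSRF.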

HPA Autoscaling

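The Helm chart includes an HPA template, enabled with autoscaling.enabled=true. The upgrade below turns it on for the Weather Agent and also sets resource requests, since HPA computes CPU utilization as a percentage of the Pod's CPU request.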

helm upgrade weather-agent manifests/helm/agent \
  --namespace agents \
  -f workshop-agent-weather-values.yaml \
  --set autoscaling.enabled=true \
  --set autoscaling.minReplicas=1 \
  --set autoscaling.maxReplicas=3 \
  --set autoscaling.targetCPUUtilizationPercentage=50 \
  --set resources.requests.cpu=100m \
  --set resources.requests.memory=256Mi
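The chart's HPA template is not shown here, but with these values it would most likely render an autoscaling/v2 HorizontalPodAutoscaler similar to the object below. This sketch builds that object as a Python dict and prints it as JSON (which kubectl apply -f also accepts); that the chart targets a Deployment with the same name as the release is an assumption, not something confirmed by the workshop.

```python
import json


def hpa_manifest(name: str, namespace: str, min_replicas: int, max_replicas: int,
                 target_cpu_pct: int) -> dict:
    """Build an autoscaling/v2 HPA that scales a Deployment on average CPU utilization."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Assumption: the chart's Deployment is named after the release.
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    # Percentage of the Pod's CPU request (100m in the command above).
                    "target": {"type": "Utilization", "averageUtilization": target_cpu_pct},
                },
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(hpa_manifest("weather-agent", "agents", 1, 3, 50), indent=2))
```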

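After applying the same settings to the Travel Agent, kubectl get hpa -n agents shows both HPAs receiving metrics (REFERENCE and AGE columns omitted):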

NAME            TARGETS       MINPODS   MAXPODS   REPLICAS
weather-agent   cpu: 3%/50%   1         3         1
travel-agent    cpu: 1%/50%   1         3         1

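At idle, both run with one replica. HPA measures CPU utilization against the Pod's CPU request (100m here), so the 50% target corresponds to an average of 50m per Pod; above that, HPA adds replicas in proportion to the load, up to the maximum of 3. Since EKS Auto Mode provisions nodes automatically, configuring Pod-level HPA is all it takes for end-to-end cluster scaling.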
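The Kubernetes documentation gives HPA's core rule as desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), skipped when the ratio is within a tolerance (0.1 by default) and clamped to [minReplicas, maxReplicas]. A simplified sketch of that rule, ignoring stabilization windows, unready Pods, and missing metrics, shows that scaling is proportional rather than a jump straight to the maximum:

```python
import math


def desired_replicas(current_replicas: int, avg_cpu_pct: float, target_pct: float,
                     min_replicas: int, max_replicas: int, tolerance: float = 0.1) -> int:
    """Simplified HPA rule for a single CPU Utilization metric."""
    ratio = avg_cpu_pct / target_pct
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# avg_cpu_pct is usage relative to the 100m request, e.g. 3m -> 3%.
for pct in (3, 60, 120, 400):
    print(pct, desired_replicas(1, pct, 50, 1, 3))
# Expected mapping: 3% -> 1, 60% -> 2, 120% -> 3, 400% -> 3 (capped at maxReplicas)
```

With the settings above, a single Weather Agent Pod averaging 60m (60%) would scale to two replicas, not three.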

Resource Consumption Across All Components

Measured values for all four components at idle:

Component            CPU   Memory   Role
Weather Agent        3m    405Mi    LLM calls + MCP tools
Travel Agent         1m    143Mi    A2A orchestration
Weather MCP Server   1m    56Mi     NWS API wrapper
Agent UI             3m    119Mi    Gradio + OAuth
Total                8m    723Mi

The idle footprint is light: 8m CPU and 723Mi memory in total. However, the Weather Agent's CPU usage spikes during LLM calls, which is why the CPU-based HPA threshold matters. The MCP Server, a pure API proxy, is the lightest component.
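As a quick arithmetic check, the totals follow directly from the per-component measurements in the table; the same few lines also give each component's share of idle memory.

```python
# Idle measurements from the table: (CPU in millicores, memory in MiB).
IDLE_USAGE = {
    "Weather Agent": (3, 405),
    "Travel Agent": (1, 143),
    "Weather MCP Server": (1, 56),
    "Agent UI": (3, 119),
}

total_cpu_m = sum(cpu for cpu, _ in IDLE_USAGE.values())
total_mem_mi = sum(mem for _, mem in IDLE_USAGE.values())
print(f"Total: {total_cpu_m}m CPU, {total_mem_mi}Mi memory")  # Total: 8m CPU, 723Mi memory

# Share of idle memory per component, as a rounded percentage.
for name, (_, mem) in IDLE_USAGE.items():
    print(f"{name}: {mem / total_mem_mi:.0%} of idle memory")
```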

Takeaways

  • Manage OAuth secrets via Kubernetes Secret + envFrom — Keep Cognito Client Secrets out of Helm values by separating them into Secrets. The distinction between ConfigMaps (public config) and Secrets (credentials) is key to agent platform operations.
  • HPA + EKS Auto Mode for complete scaling — Pod-level HPA is all you need; Auto Mode handles node provisioning. Agents have bursty load characteristics from LLM calls, making CPU-based HPA a natural fit.
  • 723Mi total for 4 components — The idle footprint is light. In production, session management (S3) and model invocation (Bedrock) costs dominate rather than compute.

Looking back across the series, the key takeaway is that production AI agents introduce three design axes absent from traditional microservices: protocol design (MCP / A2A), configuration externalization (ConfigMap / Secret), and session state management. The workshop covers all three through its four-component architecture — a well-crafted learning experience.


This is Part 3 (final) of the Agentic AI on EKS workshop validation series.

Shinya Tahara

Solutions Architect @ AWS

I'm a Solutions Architect at AWS, providing technical guidance primarily to financial industry customers. I share learnings about cloud architecture and AI/ML on this site. The views and opinions expressed on this site are my own and do not represent the official positions of my employer.