The Cluster That Learned to Plan Ahead

It started with a postmortem I never wanted to write. A merchant flash sale launched 20 minutes ahead of schedule. Traffic to the payment authorization services doubled in under a minute. The Kubernetes HPA did exactly what it was configured to do: it detected the CPU spike and requested scale-out across more than 150 checkout-path services simultaneously. Most new pods fit on existing nodes, but dozens of services exhausted their node pool headroom and triggered a burst of provisioner requests. The node provisioner stalled under the queue pressure. New capacity came up four minutes later. ...

March 27, 2026 · 6 min · Aashish Sheshadri

Commander and Soldiers: Decomposing the Scaling Problem

Part 3 · Series: Teaching Kubernetes to Think Ahead · By Aashish Sheshadri, Platform Architecture

The design space for RL-based autoscaling has three obvious options. Two of them do not hold up under scrutiny.

Option 1: One Big Agent

The most natural first thought: replace all 1,500 HPAs with a single RL agent that observes the entire cluster and outputs replica counts for every service. ...

April 3, 2026 · 11 min · Aashish Sheshadri