The Cluster That Learned to Plan Ahead

It started with a postmortem I never wanted to write. A merchant flash sale launched 20 minutes ahead of schedule. Traffic to the payment authorization services doubled in under a minute. Kubernetes HPA did exactly what it was configured to do: it detected the CPU spike and requested scale-out across more than 150 checkout-path services simultaneously. Most new pods fit on existing nodes, but dozens of services exhausted their node-pool headroom and triggered provisioner requests in a burst. The node provisioner stalled under the queue pressure. New capacity came up four minutes later. ...

March 27, 2026 · 6 min · Aashish Sheshadri

Why 1,500 HPAs Is Not an Autoscaling Strategy

Three ways that conventional per-service autoscaling breaks down at 100,000 pods, ~1,550 applications, and five shared domains, and why no amount of HPA tuning makes them go away.

Failure Mode 1: The Provisioner Stampede

HPA is designed to be autonomous. Each deployment has its own HPA object, its own target utilization, its own scale-out logic. This is great for isolation: changes to one service's autoscaling config don't affect others. But it breaks down under coordination pressure. ...
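The one-HPA-per-deployment pattern described above looks something like this minimal `autoscaling/v2` manifest; the service name and targets here are illustrative, not taken from the post:

```yaml
# Hypothetical per-service HPA: one such object exists for every
# deployment, each with its own independent CPU target and bounds.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payment-auth          # illustrative service name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-auth
  minReplicas: 10             # illustrative bounds
  maxReplicas: 200
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # each service tunes this alone
```

Multiplied across ~1,500 services, each controller makes its scaling decision in isolation, which is exactly the coordination gap the excerpt describes.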

April 1, 2026 · 9 min · Aashish Sheshadri

Commander and Soldiers: Decomposing the Scaling Problem

Part 3 · Series: Teaching Kubernetes to Think Ahead

The design space for RL-based autoscaling has three obvious options. Two of them do not hold up under scrutiny.

Option 1: One Big Agent

The most natural first thought: replace all 1,500 HPAs with a single RL agent that observes the entire cluster and outputs replica counts for every service. ...

April 3, 2026 · 11 min · Aashish Sheshadri