The Cluster That Learned to Plan Ahead

It started with a postmortem I never wanted to write. A merchant flash sale launched 20 minutes ahead of schedule. Traffic to the payment authorization services doubled in under a minute. The Kubernetes Horizontal Pod Autoscaler (HPA) did exactly what it was configured to do: it detected the CPU spike and requested simultaneous scale-out for more than 150 checkout-path services. Most new pods fit on existing nodes, but dozens of services exhausted their node pool headroom and triggered provisioner requests in a burst. The node provisioner stalled under the queue pressure. New capacity came up four minutes later. ...

March 27, 2026 · 6 min · Aashish Sheshadri

Why 1,500 HPAs Is Not an Autoscaling Strategy

Three ways that conventional per-service autoscaling breaks down at 100,000 pods, ~1,550 applications, and five shared domains, and why no amount of HPA tuning makes them go away.

Failure Mode 1: The Provisioner Stampede

HPA is designed to be autonomous. Each deployment has its own HPA object, its own target utilization, and its own scale-out logic. This is great for isolation: changes to one service's autoscaling config don't affect others. It breaks down under coordination pressure. ...
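To make the per-service autonomy concrete, here is a minimal sketch of one such HPA object (the service name and thresholds are hypothetical, not taken from the posts). At the scale described above, roughly 1,500 objects like this each make scaling decisions independently:

```yaml
# One of many independent HPA objects — each scales its own Deployment
# in isolation, with no knowledge of what the other ~1,500 are doing.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-auth        # hypothetical checkout-path service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-auth
  minReplicas: 10
  maxReplicas: 200
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # per-service target, tuned in isolation
```

Nothing in this object coordinates with its neighbors: when a shared traffic spike hits every checkout-path service at once, each HPA independently requests pods, and the burst lands on the node provisioner as one uncoordinated stampede.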

April 1, 2026 · 9 min · Aashish Sheshadri