HiRL-Scale: Hierarchical RL for Infrastructure Autoscaling

HPA works fine until your services start fighting each other for capacity. This series is about what I designed to replace it — a hierarchical RL autoscaler for a cluster running ~100k pods across ~1,500 services.

It starts with the postmortem that kicked the whole thing off, goes deep on the architecture and the parts that were hard to get right, and ends with an honest look at what I’d do differently.

The Cluster That Learned to Plan Ahead

It started with a postmortem I never wanted to write. A merchant flash sale launched 20 minutes ahead of schedule. Traffic to the payment authorization services doubled in under a minute. Kubernetes HPA did exactly what it was configured to do: it detected the CPU spike and requested scale-out across more than 150 checkout-path services simultaneously. Most new pods fit on existing nodes, but dozens of services exhausted their node pool headroom and triggered provisioner requests in a burst. The node provisioner stalled under the queue pressure. New capacity came up four minutes later. ...

Why 1,500 HPAs Is Not an Autoscaling Strategy

Three ways that conventional per-service autoscaling breaks down at 100,000 pods, ~1,550 applications, and five shared domains, and why no amount of HPA tuning makes them go away.

Failure Mode 1: The Provisioner Stampede

HPA is designed to be autonomous. Each deployment has its own HPA object, its own target utilization, its own scale-out logic. This is great for isolation. Changes to one service’s autoscaling config don’t affect others. It breaks down under coordination pressure. ...
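To make the coordination problem concrete, here is a minimal sketch of what "autonomous" means in practice. It uses the scaling rule from the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), and leaves out HPA's tolerance band and stabilization windows. The service count, replica counts, utilization figures, and headroom value are hypothetical numbers chosen for illustration, not measurements from the incident, and `simulate_burst` is an illustrative helper, not part of any Kubernetes API.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_util_pct: int, target_util_pct: int) -> int:
    """Documented HPA rule: ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_util_pct / target_util_pct)


def simulate_burst(services, headroom_pods):
    """Sum the extra pods requested when every service scales on its own.

    Each service looks only at its own utilization; nothing in the loop
    consults the shared headroom before asking for more pods.
    """
    requested = 0
    for svc in services:
        desired = hpa_desired_replicas(
            svc["replicas"], svc["cpu_util_pct"], svc["target_util_pct"]
        )
        requested += max(0, desired - svc["replicas"])
    # Pods beyond the existing headroom need new nodes from the provisioner.
    over_headroom = max(0, requested - headroom_pods)
    return requested, over_headroom


# Hypothetical scenario: 150 services at 20 replicas each, with CPU
# utilization jumping to twice the 60% target.
services = [
    {"replicas": 20, "cpu_util_pct": 120, "target_util_pct": 60}
    for _ in range(150)
]
requested, over_headroom = simulate_burst(services, headroom_pods=2000)
print(requested, over_headroom)
```

Every individual decision here is locally correct. The trouble only shows up in the sum: the combined request overshoots the shared headroom, and the overflow reaches the provisioner all at once.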

Commander and Soldiers: Decomposing the Scaling Problem

Part 3 · Series: Teaching Kubernetes to Think Ahead
By Aashish Sheshadri, Platform Architecture

The design space for RL-based autoscaling has three obvious options. Two of them do not hold up under scrutiny.

Option 1: One Big Agent

The most natural first thought: replace all 1,500 HPAs with a single RL agent that observes the entire cluster and outputs replica counts for every service. ...