Scalability

Scalability is a system’s ability to absorb growing load without falling over, by adding resources in proportion to the demand. There are two broad ways to scale. Vertical scaling means moving to a bigger machine: more CPU, more memory, faster disks. Horizontal scaling means adding more machines and spreading the work across them. Vertical scaling is simple but hits a hard ceiling, the largest machine money can buy; horizontal scaling has no such ceiling, which is why distributed systems exist.

Horizontal scale is the payoff distributed systems are built to deliver, but it is not free. Once work is spread across many machines, those machines must coordinate, and coordination introduces the consistency and ordering problems that the rest of this part of the library is about. A design that scales the request-handling tier to thousands of nodes can still be bottlenecked by a single shared database or a single coordination service behind it.

Amazon’s Builders’ Library article “Avoiding overload in distributed systems by putting the smaller service in control” illustrates a scaling hazard that only appears at large fleet sizes. Author Joe Magerramov notes that in many systems “the size of the data plane fleet exceeds the size of the control plane fleet, frequently by a factor of 100 or more.” When a vast fleet of clients all call a much smaller backend, an “unexpected shift in the calling pattern of the data plane can increase the load on the control plane to the point of overloading it” and trigger a cascading failure.

The article’s remedies, such as inverting the control flow so the smaller fleet initiates contact, show that scalability is not just about adding hardware. It is about designing the interactions between components so that growth in one part does not overwhelm another. Good scalability means the system’s capacity grows roughly linearly with the resources added, while the coordination overhead is kept from growing faster than the work it coordinates.