Locality of Reference

Locality of reference is the observation that programs do not access memory at random but instead concentrate their accesses in predictable ways. It comes in two forms. Temporal locality is the tendency to reuse the same data or instructions again soon after using them - a loop counter, a frequently called function, a hot data structure. Spatial locality is the tendency to access memory locations near ones just accessed - stepping through an array, executing instructions in sequence. Both are empirical regularities of real software rather than logical necessities, but they hold so reliably that nearly all of computer architecture is built to exploit them.

These two properties are the entire reason caches work. A cache is small, so it can only hold a fraction of memory at once; it pays off only if the data it keeps is likely to be reused (temporal locality) and if loading a whole cache line brings in data that will soon be wanted (spatial locality). The same logic justifies the TLB, virtual memory’s page replacement, prefetching, and even how databases lay out records on disk. Strip away locality and the entire memory hierarchy would collapse to the speed of its slowest layer.

The idea was placed on a rigorous footing by Peter Denning in his 1968 paper “The Working Set Model for Program Behavior,” published in Communications of the ACM. Denning defined a program’s working set as the collection of its most recently used pages over a sliding window of time. He argued that this set provides “knowledge vital to the dynamic management of paged memories,” because a program that is given enough fast memory to hold its working set will run efficiently, while one starved of it will thrash - spending most of its time shuttling pages in and out rather than computing.

The working-set model gave operating systems a principled way to decide how much memory each program needed and which pages to keep resident. Rather than guessing, the system could measure recent behavior and allocate physical memory to match each program’s working set, balancing the demands of many processes sharing one machine. This was a direct, quantitative expression of temporal locality applied to virtual memory.

Hennessy and Patterson’s “Computer Architecture: A Quantitative Approach” treats the principle of locality as a foundational assumption running through the whole book, since the cost-effectiveness of the memory hierarchy depends on it. Designers measure programs’ locality and tune cache sizes, line lengths, and replacement policies to match, which is why the textbook’s quantitative style is possible at all.

Locality is also why the way software is written affects how fast it runs. Two programs that compute the same result can differ wildly in speed depending on whether they touch memory in cache-friendly, locality-rich patterns or scatter their accesses across memory. Understanding locality is therefore not just hardware lore but a practical tool for anyone trying to make code fast.