Translation Lookaside Buffer

A translation lookaside buffer, or TLB, is a small hardware cache inside the processor that stores recently used translations from virtual addresses to physical addresses. Virtual memory gives each program its own address space, so before any memory access can proceed, the program’s virtual address must be translated into the real physical location in RAM. That translation is defined by page tables held in main memory, and walking those tables on every access would be ruinously slow: with multi-level page tables, a single translation can cost several dependent memory reads (four on x86-64 with four-level paging) before the data itself is touched. The TLB caches the results so that the common case avoids the walk entirely.
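To make the cost of a walk concrete, here is a minimal Python sketch of a four-level page-table walk, loosely modeled on x86-64 paging (4 KiB pages, 9 index bits per level). The nested dicts stand in for page-table pages in RAM, and the names `split`, `walk`, and `root` are illustrative only; real hardware reads fixed-format entries carrying permission bits from physical memory at each level.

```python
# Toy four-level page-table walk: an illustrative sketch, not real MMU code.
PAGE_SHIFT = 12   # 4 KiB pages: the low 12 bits are the page offset
INDEX_BITS = 9    # 512 entries per table level
LEVELS = 4        # x86-64 four-level paging uses four table levels

def split(vaddr: int) -> tuple[list[int], int]:
    """Split a virtual address into per-level table indices and a page offset."""
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    vpn = vaddr >> PAGE_SHIFT
    indices = []
    for level in range(LEVELS):
        # The top-level index comes from the highest bits of the page number.
        shift = INDEX_BITS * (LEVELS - 1 - level)
        indices.append((vpn >> shift) & ((1 << INDEX_BITS) - 1))
    return indices, offset

def walk(root: dict, vaddr: int) -> tuple[int, int]:
    """Translate vaddr; return (physical address, memory reads spent on the walk)."""
    indices, offset = split(vaddr)
    entry, reads = root, 0
    for idx in indices:
        reads += 1                    # each level is one dependent memory read
        if idx not in entry:
            raise KeyError(f"page fault at {vaddr:#x}")
        entry = entry[idx]
    # After the last level, the entry is the physical frame number.
    return (entry << PAGE_SHIFT) | offset, reads

# Map the virtual page containing 0x400123 (indices [0, 0, 2, 0]) to frame 0x2a.
root = {0: {0: {2: {0: 0x2a}}}}
```

With this toy table, `walk(root, 0x400123)` returns `(0x2a123, 4)`: four memory reads are spent on translation before the data access itself can begin.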

The TLB is essentially a cache specialized for one job: address translation. When the processor issues a virtual address, it checks the TLB first. On a TLB hit, the physical address is available immediately and the access proceeds. On a TLB miss, the page tables in memory must be walked to find the translation, either by a hardware page walker (as on x86) or by an operating-system trap handler (as on architectures with software-managed TLBs, such as MIPS); the translation is then installed in the TLB so that subsequent accesses to the same page are fast. As the OSTEP text (Operating Systems: Three Easy Pieces) puts it, the TLB is “part of the chip’s memory-management unit (MMU), and is simply a hardware cache of popular virtual-to-physical address translations.”
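The hit/miss logic above can be sketched in Python as a TLB sitting in front of a page table. This is a minimal sketch under simplifying assumptions: 4 KiB pages, a single-level page table represented as a dict from virtual page number (VPN) to physical frame number (PFN), and a fully associative TLB with least-recently-used (LRU) replacement. Real TLBs are usually set-associative, and the `TLB` class and its `translate` method are hypothetical names.

```python
from collections import OrderedDict

PAGE_SHIFT = 12                      # assume 4 KiB pages
OFFSET_MASK = (1 << PAGE_SHIFT) - 1

class TLB:
    """Fully associative TLB with LRU replacement (a simplifying assumption)."""

    def __init__(self, entries: int, page_table: dict[int, int]):
        self.entries = entries
        self.page_table = page_table            # VPN -> PFN; stands in for the in-memory tables
        self.cache: OrderedDict[int, int] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr: int) -> int:
        vpn, offset = vaddr >> PAGE_SHIFT, vaddr & OFFSET_MASK
        if vpn in self.cache:                   # TLB hit: no page-table access
            self.hits += 1
            self.cache.move_to_end(vpn)         # mark as most recently used
            pfn = self.cache[vpn]
        else:                                   # TLB miss: walk, then install
            self.misses += 1
            pfn = self.page_table[vpn]          # a KeyError here models a page fault
            if len(self.cache) >= self.entries:
                self.cache.popitem(last=False)  # evict the least recently used entry
            self.cache[vpn] = pfn
        return (pfn << PAGE_SHIFT) | offset
```

Translating `0x400123` and then `0x400456` costs one miss followed by one hit, because both addresses fall in virtual page `0x400`; the second access never consults the page table.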

The TLB works for the same reason every cache works: locality of reference. A program tends to touch the same pages repeatedly over short spans of time, so a small TLB holding a few dozen to a few thousand entries captures the large majority of translations. Because pages are typically four kilobytes or larger, a single TLB entry covers a wide range of addresses. The TLB’s reach, the total memory it can map at once, is the number of entries multiplied by the page size: 64 entries of 4 KiB pages cover 256 KiB, and larger pages extend that reach proportionally.
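A short Python sketch can make both reach and locality concrete. It assumes 4 KiB pages and a fully associative LRU TLB that starts empty; `tlb_reach` and `scan_misses` are illustrative helpers, not a real API.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assume 4 KiB pages

def tlb_reach(entries: int, page_size: int = PAGE_SIZE) -> int:
    """Bytes of address space a TLB can map at once: entries times page size."""
    return entries * page_size

def scan_misses(array_bytes: int, stride: int, entries: int,
                page_size: int = PAGE_SIZE) -> tuple[int, int]:
    """Return (accesses, TLB misses) for one sequential pass over an array,
    modeled with a fully associative LRU TLB that starts empty."""
    tlb: OrderedDict[int, None] = OrderedDict()
    accesses = misses = 0
    for addr in range(0, array_bytes, stride):
        vpn = addr // page_size           # which virtual page this access touches
        accesses += 1
        if vpn in tlb:
            tlb.move_to_end(vpn)          # hit: refresh its LRU position
        else:
            misses += 1                   # miss: page not currently cached
            if len(tlb) >= entries:
                tlb.popitem(last=False)   # evict the least recently used page
            tlb[vpn] = None
    return accesses, misses
```

For a 1 MiB array scanned in 8-byte steps with a 64-entry TLB, `scan_misses(1 << 20, 8, 64)` returns `(131072, 256)`. The array is four times larger than the 256 KiB reach, yet only the first touch of each page misses, a hit rate above 99.8 percent, because every page serves 512 consecutive accesses.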

Like data caches, TLBs on modern processors are built in multiple levels, with a tiny, very fast first-level TLB backed by a larger, slightly slower second-level TLB, and they are often split between instruction and data accesses. Hennessy and Patterson’s “Computer Architecture: A Quantitative Approach” covers the TLB as an integral part of memory-hierarchy design, since it sits on the critical path of nearly every memory reference.
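A two-level lookup can be sketched the same way. This minimal Python model assumes one common arrangement, split first-level instruction and data TLBs backed by a single shared second-level TLB, and assumes that a translation found in the second level (or by a walk) is copied into the first level. Capacities, the LRU policy, and the class names `LRUCache` and `TwoLevelTLB` are illustrative assumptions; real designs differ in associativity, inclusion policy, and which page sizes each level holds.

```python
from collections import OrderedDict
from typing import Optional

class LRUCache:
    """A fully associative map with least-recently-used eviction."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data: OrderedDict = OrderedDict()

    def get(self, key: int) -> Optional[int]:
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # refresh LRU position on a hit
        return self.data[key]

    def put(self, key: int, value: int) -> None:
        if key in self.data:
            self.data.move_to_end(key)
        elif len(self.data) >= self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry
        self.data[key] = value

class TwoLevelTLB:
    """Split L1 instruction/data TLBs backed by one shared L2 TLB (assumed layout)."""
    def __init__(self, l1_entries: int, l2_entries: int, page_table: dict[int, int]):
        self.l1 = {"instr": LRUCache(l1_entries), "data": LRUCache(l1_entries)}
        self.l2 = LRUCache(l2_entries)
        self.page_table = page_table       # VPN -> PFN; stands in for the real walk
        self.stats = {"l1_hit": 0, "l2_hit": 0, "walk": 0}

    def lookup(self, vpn: int, kind: str = "data") -> int:
        l1 = self.l1[kind]
        pfn = l1.get(vpn)
        if pfn is not None:                # fastest path: first-level hit
            self.stats["l1_hit"] += 1
            return pfn
        pfn = self.l2.get(vpn)
        if pfn is not None:                # second-level hit: slower, but no walk
            self.stats["l2_hit"] += 1
        else:                              # miss in both levels: walk the page table
            self.stats["walk"] += 1
            pfn = self.page_table[vpn]
            self.l2.put(vpn, pfn)
        l1.put(vpn, pfn)                   # fill L1 so a repeat access hits there
        return pfn
```

The `stats` counters show where each lookup was satisfied. An instruction fetch that misses its own first-level TLB can still hit a translation that a data access placed in the shared second level, avoiding the page walk entirely.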

TLB management has real costs. When the operating system changes a page mapping, it must invalidate the now-stale TLB entries, and on a multiprocessor it may have to coordinate that invalidation across cores, typically by interrupting the other cores so that each flushes its own copy; this is the so-called TLB shootdown. Modern designs tag entries with address-space identifiers (ASIDs; Intel calls them process-context identifiers, or PCIDs) so that entries from different address spaces can coexist in the TLB without being flushed on every context switch, keeping the buffer warm as the processor switches between programs.
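The sketch below models ASID tagging and a shootdown in Python. It is a simplification under stated assumptions: each core has its own dict-based TLB keyed by (ASID, VPN), a context switch only changes the current ASID, and the hypothetical `shootdown` helper is a plain loop standing in for what real kernels do with inter-processor interrupts and acknowledgments. All names here are illustrative.

```python
from typing import Optional

class TaggedTLB:
    """One core's TLB; entries are tagged with an address-space ID (ASID)."""
    def __init__(self) -> None:
        self.entries: dict[tuple[int, int], int] = {}   # (asid, vpn) -> pfn
        self.current_asid = 0

    def switch_to(self, asid: int) -> None:
        # No flush: other address spaces' entries stay cached, but their
        # ASID no longer matches, so lookups cannot hit them by mistake.
        self.current_asid = asid

    def lookup(self, vpn: int) -> Optional[int]:
        return self.entries.get((self.current_asid, vpn))

    def fill(self, vpn: int, pfn: int) -> None:
        self.entries[(self.current_asid, vpn)] = pfn

    def invalidate(self, asid: int, vpn: int) -> None:
        # Drop one stale translation, if present.
        self.entries.pop((asid, vpn), None)

def shootdown(cores: list[TaggedTLB], asid: int, vpn: int) -> int:
    """Invalidate (asid, vpn) on every core and return how many cores held it.
    A plain loop here; real kernels typically interrupt the other cores and
    wait for each to acknowledge before reusing the old physical page."""
    held = 0
    for tlb in cores:
        if (asid, vpn) in tlb.entries:
            held += 1
        tlb.invalidate(asid, vpn)
    return held
```

Because entries carry the ASID, switching from address space 1 to 2 and back leaves the first program’s translation cached; a shootdown, by contrast, must reach every core that might hold the stale entry.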

The TLB is one of the quiet pieces of hardware that make the convenient abstraction of virtual memory affordable. Without it, the indirection that isolates and protects programs from one another would impose a heavy tax on every instruction; with it, that tax nearly vanishes for code with good locality.