GPGPU (General-Purpose Computing on GPUs)

GPGPU, general-purpose computing on graphics processing units, is the use of a GPU to perform computation that has nothing to do with drawing images. The GPU was built to shade millions of pixels in parallel, but the same wide, throughput-oriented hardware turns out to be well suited to any problem that can be expressed as the same operation applied to many data elements at once. GPGPU is the discipline of mapping such problems onto that hardware.

The idea became practical once GPUs gained programmable shaders. NVIDIA’s GPU Gems 2 devotes an entire section, “General-Purpose Computation on GPUs: A Primer,” to the technique, and its chapter on mapping computational concepts to GPUs explains that “the GPU is actually made up of several programmable processors plus a selection of fixed-function hardware,” and describes how to make the best use of those resources for non-graphics work. The programmable fragment (pixel) processor, designed to compute a color for each pixel, could instead be made to compute an arbitrary numerical result for each element of a data array.

Early GPGPU was a craft of disguise. With no compute API, programmers encoded their input data as textures, expressed their algorithm as a fragment shader, and triggered the computation by rendering a screen-filling rectangle; the resulting “image” was actually the array of computed numbers, read back from the frame buffer. This worked, and it could be dramatically faster than a CPU, but it forced every algorithm to be rewritten in the vocabulary of graphics, with all the awkwardness that implied.

The payoff was performance. Because the GPU executes the same instruction across many data lanes in single-instruction, multiple-data fashion, problems with abundant data parallelism, dense linear algebra, signal processing, simulation, ran far faster than on a general-purpose CPU of the era. GPU Gems 2’s companion chapters quantified GPUs delivering hundreds of gigaflops at a time when high-end CPUs offered roughly an order of magnitude less, which is what made the contortions of shader-based computing worth the effort.

GPGPU’s history is the story of removing that disguise. NVIDIA’s CUDA (2007) and the cross-vendor OpenCL standard gave the GPU a direct compute programming model, so algorithms no longer had to masquerade as rendering. With that barrier gone, GPGPU moved from a specialist trick to mainstream infrastructure, and it became the hardware foundation of the deep-learning era, where training large neural networks is fundamentally a massively parallel numerical workload.