A graphics processing unit, or GPU, is a processor designed to accelerate the construction of images intended for display. Its lineage runs from early frame buffer controllers and 2D blitters, through fixed-function 3D rasterizers that drew shaded and textured triangles, to the modern programmable, massively parallel processor that renders real-time 3D scenes and runs general-purpose computation. The defining shift was the migration of the graphics pipeline, the sequence of stages that turns geometry into pixels, from software running on the CPU into dedicated silicon.
The term itself was popularized by NVIDIA. NVIDIA’s own corporate timeline describes the company as “Founded on April 5, 1993, by Jensen Huang, Chris Malachowsky, and Curtis Priem, with a vision to bring 3D graphics to the gaming and multimedia markets,” and credits it with inventing, in 1999, “the GPU, the graphics processing unit, which sets the stage to reshape the computing industry.” Before then the parts were generally called graphics accelerators or 3D accelerators; in NVIDIA’s usage the GPU label marked the point at which the chip took over the geometry stage (transform and lighting) as well as rasterization.
The graphics pipeline that GPUs implement follows the one defined by rendering APIs such as Direct3D and OpenGL. NVIDIA’s GPU Gems 2 description of the GeForce 6 architecture gives the canonical sequence: command parsing, vertex fetching, vertex processing, primitive grouping, culling and clipping, rasterization, fragment processing, depth testing and blending, and finally output to the frame buffer. Early GPUs hard-wired most of these stages in fixed-function logic: each stage performed a single task and could be configured only through a fixed set of state switches.
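The stage order can be illustrated with a toy software model. What follows is a minimal sketch in Python, not a description of how any GPU or driver is built. The function names, the 8×8 frame buffer, the constant color per triangle, and the convention that a smaller z is nearer are all assumptions made for illustration. Command parsing, vertex fetching, and blending are omitted, and clipping is reduced to visiting only on-screen pixels.

```python
# Toy software model of the stage order; every name here is illustrative.
WIDTH, HEIGHT = 8, 8


def vertex_processing(vertices, offset):
    """Vertex stage: apply a fixed translation to each (x, y, z) vertex."""
    dx, dy = offset
    return [(x + dx, y + dy, z) for x, y, z in vertices]


def primitive_grouping(vertices, colors):
    """Group every three consecutive vertices into a (triangle, color) pair."""
    tris = [tuple(vertices[i:i + 3]) for i in range(0, len(vertices) - 2, 3)]
    return list(zip(tris, colors))


def edge(a, b, p):
    """Signed-area test: tells which side of the edge a->b the point p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def cull(primitives):
    """Drop zero-area triangles, which would otherwise divide by zero below."""
    return [(t, c) for t, c in primitives if edge(*t) != 0]


def rasterize(tri):
    """Yield an (x, y, z) fragment for each pixel center the triangle covers."""
    a, b, c = tri
    area = edge(a, b, c)
    for y in range(HEIGHT):          # only on-screen pixels are visited,
        for x in range(WIDTH):       # which stands in for clipping
            p = (x + 0.5, y + 0.5)
            w0, w1, w2 = edge(b, c, p), edge(c, a, p), edge(a, b, p)
            # Covered when every edge value has the same sign as the area.
            if all(w * area >= 0 for w in (w0, w1, w2)):
                # Interpolate depth with barycentric weights w / area.
                z = (w0 * a[2] + w1 * b[2] + w2 * c[2]) / area
                yield x, y, z


def draw(vertices, colors, offset=(0.0, 0.0)):
    """Run the stages in order and return the resulting frame buffer."""
    depth = [[float("inf")] * WIDTH for _ in range(HEIGHT)]
    frame = [[0] * WIDTH for _ in range(HEIGHT)]
    moved = vertex_processing(vertices, offset)
    for tri, color in cull(primitive_grouping(moved, colors)):
        for x, y, z in rasterize(tri):
            if z < depth[y][x]:      # depth test: the nearer fragment wins
                depth[y][x] = z
                frame[y][x] = color  # fragment stage: constant color
    return frame
```

Calling `draw` with two overlapping triangles at different depths leaves the nearer triangle's color in the pixels they share, regardless of submission order, which is the job of the depth test. The sketch uses no proper fill rule, so pixels lying exactly on a shared edge may be touched by both triangles.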
The decisive evolution was programmability. As Microsoft’s DirectX documentation puts it, “HLSL was created (starting with DirectX 9) to set up the programmable 3D pipeline. You can program the entire pipeline with HLSL instructions.” Programmable vertex and fragment (pixel) shaders replaced the fixed transform-and-lighting and texture-combiner units, letting developers write small programs that ran per vertex and per pixel directly on the GPU.
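The difference between a state switch and a shader program can be sketched in a few lines. The example below uses Python as a stand-in and is not HLSL. The stage runners, the vertex layout `(x, y, z, brightness)`, and the two example shaders are hypothetical names chosen for illustration.

```python
import math


def fixed_function_lighting(vertex, lighting_enabled):
    """Fixed-function style: behavior is picked by a switch from a closed set."""
    x, y, z, brightness = vertex
    return (x, y, z, brightness if lighting_enabled else 1.0)


def run_vertex_stage(vertex_shader, vertices):
    """Programmable style: invoke the supplied program once per vertex."""
    return [vertex_shader(v) for v in vertices]


def run_fragment_stage(fragment_shader, fragments):
    """Invoke the supplied program once per fragment (candidate pixel)."""
    return [fragment_shader(f) for f in fragments]


def wave_vertex_shader(vertex):
    """Developer-written vertex program: displace y by a sine of x."""
    x, y, z, brightness = vertex
    return (x, y + math.sin(x), z, brightness)


def checker_fragment_shader(fragment):
    """Developer-written fragment program: a procedural checkerboard."""
    x, y = fragment
    return 255 if (x + y) % 2 == 0 else 0
```

The fixed-function routine can only do what its switch allows, whereas the stage runners accept any function with the right shape. Neither the sine displacement nor the checkerboard needs support from the stage itself.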
That programmability turned the GPU into a parallel compute engine. NVIDIA’s GPU Gems 2 chapter on the GeForce 6 Series notes the chip delivered “hundreds of gigaflops of single-precision floating-point computation, as compared to approximately 12 gigaflops for current high-end CPUs” of the time, with the fragment processor working “on groups of hundreds of pixels at a time in single-instruction, multiple-data (SIMD) fashion.” The same wide, throughput-oriented design that rendered pixels in parallel would later be exposed for general-purpose computation, but the GPU’s defining job remained driving the frame buffer that feeds the display.
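The SIMD pattern in that description, one instruction applied across a whole group of pixels before the next instruction begins, can be modeled as follows. This is a sketch under stated assumptions: the three-instruction program, the group size, and the function names are invented for illustration. Python evaluates the lanes one after another, so the code shows only the lockstep ordering, not real parallel hardware.

```python
def simd_execute(program, lanes):
    """Apply each instruction to every lane before moving to the next one."""
    for instruction in program:
        lanes = [instruction(value) for value in lanes]
    return lanes


# Hypothetical fragment program: scale, bias, then clamp to [0, 1].
PROGRAM = [
    lambda v: v * 0.5,
    lambda v: v + 0.25,
    lambda v: min(max(v, 0.0), 1.0),
]


def shade_in_groups(pixels, group_size):
    """Split the pixels into fixed-size groups and run each group in lockstep."""
    shaded = []
    for start in range(0, len(pixels), group_size):
        shaded.extend(simd_execute(PROGRAM, pixels[start:start + group_size]))
    return shaded
```

For example, `shade_in_groups([0.0, 0.5, 1.0, 2.0], 2)` returns `[0.25, 0.5, 0.75, 1.0]`. Because every lane in a group runs the same instruction stream, instruction fetch and decode are shared across the group, which is what lets a wide design favor throughput.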
This entry takes NVIDIA’s stated 1999 date for the invention of the GPU, the year of the GeForce 256 announcement, as the headline date for the naming of the concept.