Groq is an American AI hardware company building a custom inference chip it calls the LPU (Language Processing Unit). It was founded in 2016 by Jonathan Ross, who had earlier worked on Google’s Tensor Processing Unit, along with co-founders from Google. The company states it was “established in 2016 for one thing: inference,” meaning running already-trained models rather than training them.
The LPU’s distinguishing design choice is to integrate hundreds of megabytes of on-chip SRAM as the primary storage for model weights, rather than relying on off-chip high-bandwidth memory the way GPUs do. Groq pairs this with a purpose-built compiler that produces “static scheduling and deterministic execution,” so the chip’s timing is predictable cycle by cycle and many chips can be linked to “act as a single core.” The company argues that this removes the unpredictable delays and wasted operations of general-purpose hardware, yielding fast, low-cost token generation. Groq offers access to this hardware through a cloud API that serves open models.
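To see why linking many chips matters when weights live in SRAM, a rough back-of-envelope sketch can help. The function below estimates the minimum number of chips whose combined SRAM could hold a model’s weights. The function name and all the inputs (200 MB of SRAM per chip, an 8-billion-parameter model, one byte per weight) are illustrative assumptions, not Groq specifications, and the estimate ignores activations, caches, and any other memory a real deployment would need.

```python
import math


def chips_needed(params_billion: float, bytes_per_param: float, sram_mb_per_chip: float) -> int:
    """Minimum number of chips whose combined SRAM can hold the weights alone.

    Hypothetical helper for illustration; it ignores activations, caches,
    and any overhead a real system would add.
    """
    weight_bytes = params_billion * 1e9 * bytes_per_param
    sram_bytes = sram_mb_per_chip * 1e6
    return math.ceil(weight_bytes / sram_bytes)


# Assumed inputs: 8B parameters, 1 byte per weight, 200 MB of SRAM per chip.
print(chips_needed(8, 1, 200))   # 40
# Assumed inputs: 70B parameters, 2 bytes per weight, 200 MB of SRAM per chip.
print(chips_needed(70, 2, 200))  # 700
```

Even under these simplified assumptions, a mid-sized model spans dozens of chips, which suggests why predictable chip-to-chip timing is central to the design.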
Why business readers should care: most attention on AI hardware goes to training and to Nvidia GPUs, but the recurring bill comes from running models in production (inference). Specialized inference chips like Groq’s LPU are part of a broader competition to drive down the cost per token, which directly shapes the economics of any AI product.
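The link between cost per token and product economics can be made concrete with simple arithmetic. The sketch below computes a monthly inference bill from request volume and a per-million-token price. Every figure in it is hypothetical and chosen only for round numbers; none is a quoted price from Groq or any other provider.

```python
def monthly_inference_cost(
    requests_per_day: int,
    tokens_per_request: int,
    price_per_million_tokens: float,
    days: int = 30,
) -> float:
    """Recurring inference bill for one month, in the same currency as the price."""
    total_tokens = requests_per_day * tokens_per_request * days
    return total_tokens / 1_000_000 * price_per_million_tokens


# Hypothetical product: 100,000 requests a day, 1,000 tokens per request.
print(monthly_inference_cost(100_000, 1_000, 0.50))  # 1500.0
# Same traffic at half the (hypothetical) price per million tokens.
print(monthly_inference_cost(100_000, 1_000, 0.25))  # 750.0
```

Because the bill scales linearly with the per-token price, halving that price halves the recurring cost at the same traffic level.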