Albert Gu is an Assistant Professor in the Machine Learning Department at Carnegie Mellon University, where he leads the Goomba Lab, and a co-founder of the audio AI startup Cartesia. His research centers on sequence models and, in particular, on state-space models as an alternative to attention, a thread he has pursued across a series of papers culminating in Mamba.
Gu’s earlier work introduced structured state-space models, notably S4, which showed that the mathematics of continuous-time linear systems could be used to model very long sequences efficiently. The limitation of those early models was that their dynamics were fixed regardless of the input, so they could not selectively focus on or ignore particular tokens the way attention can. The breakthrough came with Mamba, “Linear-Time Sequence Modeling with Selective State Spaces” (arXiv 2312.00752), which he developed with Tri Dao: by making the state-space parameters depend on the input, Mamba gained a selection mechanism while keeping a computational cost that grows only linearly with sequence length. The paper reported up to 5x higher inference throughput than comparable Transformers and strong results across language, audio, and genomics.
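For readers who want to see the distinction concretely, the toy sketch below contrasts a fixed recurrence with an input-dependent one. It is an illustration under heavy simplifying assumptions, not the Mamba implementation: the state is a single number rather than a vector, only the step size depends on the input, and the function names, the gating rule, and the constants `w` and `bias` are hypothetical choices made for this example.

```python
import math

def softplus(z):
    # Numerically stable log(1 + exp(z)); keeps the step size positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def fixed_scan(xs, delta=0.5):
    """Time-invariant recurrence: every input is treated the same way."""
    a = math.exp(-delta)  # how much of the old state survives each step
    b = 1.0 - a           # how much of the new input is written in
    h, out = 0.0, []
    for x in xs:
        h = a * h + b * x
        out.append(h)
    return out

def selective_scan(xs, w=4.0, bias=-2.0):
    """Input-dependent recurrence: the step size is recomputed per input."""
    h, out = 0.0, []
    for x in xs:
        # Hypothetical gating rule: large inputs get a large step size.
        delta = softplus(w * abs(x) + bias)
        a = math.exp(-delta)  # large delta -> old state is forgotten
        b = 1.0 - a           # large delta -> new input is written in
        h = a * h + b * x
        out.append(h)
    return out

signal = [5.0, 0.0, 0.0, 0.0]
print(fixed_scan(signal))
print(selective_scan(signal))
```

Both functions make a single pass over the sequence, so their cost grows linearly with its length. The difference is that the selective version decides at each step how much to overwrite or preserve its state based on what it is reading, which is the kind of behavior the fixed version cannot express.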
Mamba did not displace the Transformer for frontier models, but it kept the search for alternative architectures alive and spawned a wave of state-space and hybrid designs. Gu has been named among Time’s 100 most influential people in AI for this work.
For business readers, Gu represents the most credible bet that the Transformer is not the final word. If sequence modeling ever shifts toward cheaper linear-time architectures for very long contexts, his state-space line of research is where much of that groundwork was laid.