Efficiently Modeling Long Sequences with Structured State Spaces (S4)

S4, short for Structured State Space sequence model, was introduced by Albert Gu, Karan Goel, and Christopher Ré in “Efficiently Modeling Long Sequences with Structured State Spaces,” submitted to arXiv on October 31, 2021. It revived the state space model (SSM), a classical tool from control theory, and made it work efficiently as a deep learning layer for sequences of 10,000 or more steps.

A state space model maintains a hidden state that evolves linearly over time and produces each output as a linear readout of that state. Because the state carries information forward across every step, the model can in principle capture long-range dependencies. However, earlier SSM-based deep learning layers were too costly in computation and memory to be practical, and computing them directly was numerically unstable. S4’s contribution was a structured parameterization of the state matrix: conditioning it with a low-rank correction allows it to be diagonalized stably, which reduces the core computation to a well-studied Cauchy kernel. This made the model both tractable and well-behaved.
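The two views of an SSM that S4 exploits can be sketched in a few lines of Python. A continuous-time SSM maps an input u(t) to an output y(t) through x'(t) = A x(t) + B u(t) and y(t) = C x(t). The feedthrough term D u(t) is left out here because it acts as a simple skip connection. After discretizing with a step size, the same model can be run either as a recurrence over the hidden state or as one convolution of the input with the kernel K[l] = C Ā^l B̄. This is a minimal sketch under stated assumptions, not the authors' implementation:

- The function names are hypothetical.
- A is a plain diagonal matrix rather than S4's diagonal-plus-low-rank structure.
- The kernel is built by direct powering instead of the Cauchy-kernel computation.
- The convolution is a direct O(L²) sum instead of an FFT.

It uses the bilinear discretization, the method the S4 paper adopts.

```python
# Hypothetical helper names; a minimal sketch, not the S4 reference code.
# Simplification (assumption): A is diagonal, one complex number per state
# dimension, whereas S4 uses a structured diagonal-plus-low-rank matrix.


def discretize_bilinear(a, b, step):
    """Bilinear discretization of one diagonal mode.

    Continuous: x'(t) = a x(t) + b u(t)
    Discrete:   x_k = a_bar x_{k-1} + b_bar u_k
    """
    denom = 1 - step / 2 * a
    a_bar = (1 + step / 2 * a) / denom
    b_bar = step * b / denom
    return a_bar, b_bar


def ssm_recurrent(a_diag, b, c, step, u):
    """Recurrent mode: update the hidden state once per input (suits generation)."""
    modes = [discretize_bilinear(a, b_n, step) for a, b_n in zip(a_diag, b)]
    state = [0j] * len(modes)  # hidden state starts at zero
    outputs = []
    for u_k in u:
        state = [a_bar * x + b_bar * u_k for (a_bar, b_bar), x in zip(modes, state)]
        # y_k = C x_k; keep the real part so the output is a real signal.
        outputs.append(sum(c_n * x for c_n, x in zip(c, state)).real)
    return outputs


def ssm_kernel(a_diag, b, c, step, length):
    """Convolution kernel K[lag] = C A_bar^lag B_bar for lag = 0, ..., length - 1."""
    modes = [discretize_bilinear(a, b_n, step) for a, b_n in zip(a_diag, b)]
    return [
        sum(c_n * b_bar * a_bar**lag for c_n, (a_bar, b_bar) in zip(c, modes)).real
        for lag in range(length)
    ]


def ssm_convolutional(a_diag, b, c, step, u):
    """Convolutional mode: y = K * u as a causal convolution (suits training).

    Direct O(L^2) sum for clarity; S4 itself computes this with an FFT.
    """
    k = ssm_kernel(a_diag, b, c, step, len(u))
    return [sum(k[j] * u[i - j] for j in range(i + 1)) for i in range(len(u))]


if __name__ == "__main__":
    # Toy parameters: one conjugate pair of stable modes (negative real part).
    a_diag = [-0.5 + 1.0j, -0.5 - 1.0j]
    b = [1.0, 1.0]
    c = [0.5 - 0.2j, 0.5 + 0.2j]
    u = [1.0, 0.0, 0.0, 2.0, -1.0]
    y_rec = ssm_recurrent(a_diag, b, c, 0.1, u)
    y_conv = ssm_convolutional(a_diag, b, c, 0.1, u)
    print(max(abs(p - q) for p, q in zip(y_rec, y_conv)) < 1e-9)  # True
```

Both modes return the same outputs up to floating-point error. The recurrent form updates a fixed-size state in constant time per step, which is what makes autoregressive generation fast. The convolutional form processes the whole sequence at once and parallelizes well during training.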

The results were striking. S4 reached 91% accuracy on sequential CIFAR-10 image classification without data augmentation or auxiliary losses. It substantially narrowed the gap to Transformers on image and language modeling while generating about 60 times faster. It also set state-of-the-art results on every task in the Long Range Arena benchmark, including Path-X, a pathfinding task on sequences of length 16k that all prior methods had failed to solve. The work received an Outstanding Paper Honorable Mention at ICLR 2022.

S4 mattered because it offered a serious alternative to attention for long sequences, where the Transformer’s quadratic cost becomes prohibitive. It opened the line of research that led to Mamba and other state space models now used for long documents, audio, and genomics.
