Long Short-Term Memory networks introduced

In November 1997, Sepp Hochreiter and Jürgen Schmidhuber published “Long Short-Term Memory” in the journal Neural Computation (volume 9, issue 8, pages 1735 to 1780). The paper introduced a new kind of recurrent neural network designed to retain information across long stretches of time.

Ordinary recurrent networks struggled to learn long-range patterns because the training signal would shrink toward zero as it was passed back through many steps, a problem known as the vanishing gradient. The source paper describes how LSTM enforces “constant error flow” through specially designed memory cells, allowing the network to bridge gaps of more than 1000 time steps. The publisher record confirms the title, both authors, the journal, and the 1997 date.
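
The arithmetic behind the vanishing gradient can be sketched with a toy calculation. This is an illustration only, not the architecture from the paper: it assumes each backward step multiplies the error signal by one fixed number, and the value 0.9 and the function name are hypothetical choices made for the example.

```python
def backpropagated_signal(per_step_factor: float, steps: int) -> float:
    """Size of an error signal after being passed back through `steps` time steps,
    under the simplifying assumption that every step scales it by the same factor."""
    signal = 1.0
    for _ in range(steps):
        signal *= per_step_factor
    return signal


# Hypothetical per-step factor below 1, standing in for an ordinary recurrent network.
shrinking = backpropagated_signal(0.9, 1000)

# A factor of exactly 1, standing in for the "constant error flow" idea.
constant = backpropagated_signal(1.0, 1000)

print(f"factor 0.9 over 1000 steps: {shrinking:.3e}")
print(f"factor 1.0 over 1000 steps: {constant:.3e}")
```

With a factor even slightly below 1, the signal after 1000 steps is far too small to drive learning, while a factor of exactly 1 leaves it unchanged. Real networks are more complicated, since the per-step factor depends on weights and activations and varies from step to step, but the compounding effect is the same.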

LSTM mattered because it made sequence learning practical. Over roughly the next two decades it became the workhorse behind speech recognition, machine translation, and text prediction, and it remained dominant until the Transformer arrived in 2017. It is one of the clearest examples of an idea that quietly proved itself for years before powering mainstream AI products.

Sources

Last verified June 6, 2026