Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

“Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting” by Haoyi Zhou and colleagues won a best-paper award at AAAI 2021 after appearing on arXiv in late 2020. It addresses a specific weakness of applying transformers to time series: the standard self-attention mechanism scales quadratically with sequence length, so forecasting far into the future or from long histories becomes prohibitively slow and memory-hungry.

Informer introduces three fixes. First, a ProbSparse self-attention mechanism focuses computation on the small number of query-key pairs that actually carry strong attention, cutting time and memory cost from quadratic, O(L^2), to roughly O(L log L) for a sequence of length L. Second, a self-attention distilling step progressively shortens the sequence between encoder layers, keeping only the dominant features. Third, a generative-style decoder produces the entire long forecast in a single forward pass instead of predicting one step at a time, which avoids the error accumulation of step-by-step generation. On several large datasets the authors showed that Informer significantly outperformed prior methods on long-horizon forecasting.
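To make the first two ideas concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a single attention head with no masking, scores each query against a random sample of keys, keeps the queries whose sampled scores are most peaked (maximum minus mean), and gives every other query the mean of the values. The function names, the `factor` constant, and the pairwise max-pool used as a stand-in for distilling are illustrative assumptions; the paper's distilling step also applies a convolution and activation, which are omitted here.

```python
import math
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def n_active(L, factor=5):
    # Roughly factor * ln(L) samples / active queries (factor is an assumed constant).
    return min(L, max(1, factor * math.ceil(math.log(L))))


def probsparse_attention(Q, K, V, factor=5, rng=None):
    """Simplified ProbSparse-style self-attention. Q, K, V have shape (L, d)."""
    rng = np.random.default_rng(0) if rng is None else rng
    L, d = Q.shape
    n = n_active(L, factor)

    # 1. Score every query against a random sample of n keys: about L * ln(L) dot products.
    sample_idx = rng.choice(L, size=n, replace=False)
    sampled = Q @ K[sample_idx].T / math.sqrt(d)          # shape (L, n)

    # 2. Sparsity measure: a peaked score distribution gives a large max-minus-mean.
    sparsity = sampled.max(axis=1) - sampled.mean(axis=1)

    # 3. Keep the n queries with the largest measure.
    top = np.argsort(sparsity)[-n:]

    # 4. Inactive queries receive the mean of V; active queries get full attention.
    out = np.tile(V.mean(axis=0), (L, 1))
    out[top] = softmax(Q[top] @ K.T / math.sqrt(d)) @ V
    return out, top


def distill(X):
    """Stand-in for distilling: halve the sequence by max-pooling adjacent time steps."""
    L, d = X.shape
    half = L // 2
    return X[: 2 * half].reshape(half, 2, d).max(axis=1)


def pair_counts(L, factor=5):
    # Dot products for full attention vs. this sketch (sampling pass + active queries).
    return L * L, 2 * L * n_active(L, factor)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Q, K, V = (rng.normal(size=(96, 8)) for _ in range(3))
    out, top = probsparse_attention(Q, K, V)
    print(out.shape, len(top), distill(out).shape)
    print(pair_counts(720))
```

Under these assumptions, a 720-step input needs about a tenth of the dot products that full attention would, and the gap widens as the sequence grows. Each distilling step then halves the length that the next layer has to process.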

Informer was an early and influential entry in the line of efficient transformers built specifically for long time series.

Why business readers should care: many planning problems require looking far ahead from long histories, and Informer made transformer-quality forecasts feasible at that range.

Last verified June 7, 2026