Decision Transformer was introduced in “Decision Transformer: Reinforcement Learning via Sequence Modeling,” posted to arXiv on June 2, 2021 by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, and Igor Mordatch. It proposes a sharp departure from the usual machinery of reinforcement learning, which typically fits value functions or computes policy gradients.
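Instead, it treats a trajectory of states, actions, and rewards as just another sequence to model, much as a language model predicts the next token. The model is a causally masked Transformer trained on logged (offline) data and conditioned on a desired return: at each step it sees the return still to be collected (the return-to-go), the states and actions so far, and is trained to predict the action that was taken. At run time you tell the model what return you want and it autoregressively generates actions intended to achieve it, with the target reduced by each reward received along the way. Because no value function is learned by bootstrapping, the method avoids the instability associated with that kind of training. In the paper's experiments it was competitive with strong offline RL baselines on Atari and OpenAI Gym tasks.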
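To make the conditioning concrete, here is a minimal sketch of how a logged trajectory might be prepared for this kind of model. The paper conditions on the undiscounted "return-to-go" (the sum of rewards from the current step to the end of the episode) and orders each timestep as return-to-go, state, action. The function names, the tuple-based token format, and the toy data below are illustrative assumptions, not the authors' code, and the Transformer itself is omitted.

```python
from typing import Any, List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted return-to-go: rtg[t] = rewards[t] + rewards[t+1] + ..."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step adds its own reward to the future total.
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def interleave(
    rtg: Sequence[float], states: Sequence[Any], actions: Sequence[Any]
) -> List[Tuple[str, Any]]:
    """Lay the trajectory out as (R_1, s_1, a_1, R_2, s_2, a_2, ...)."""
    tokens: List[Tuple[str, Any]] = []
    for r, s, a in zip(rtg, states, actions):
        tokens.extend([("rtg", r), ("state", s), ("action", a)])
    return tokens


def next_target(current_target: float, reward: float) -> float:
    """At run time, the remaining target shrinks by each reward received."""
    return current_target - reward


# Toy trajectory (hypothetical placeholder values).
rewards = [1.0, 0.0, 2.0]
rtg = returns_to_go(rewards)  # [3.0, 2.0, 2.0]
tokens = interleave(rtg, ["s1", "s2", "s3"], ["a1", "a2", "a3"])
```

During training, a causally masked model would be asked to predict each action token from everything before it in this sequence. At run time the first return-to-go is replaced by the return the user asks for, and `next_target` updates it after every environment step.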
Decision Transformer connected the rapidly advancing world of Transformer sequence models to control problems. For a general reader, it is a striking unification: the same architecture behind chatbots can also learn to act, simply by viewing decisions as a sequence to be completed.