Trajectory Transformer

The Trajectory Transformer was introduced in “Offline Reinforcement Learning as One Big Sequence Modeling Problem,” posted to arXiv on June 3, 2021 by Michael Janner, Qiyang Li, and Sergey Levine at UC Berkeley, and presented as a spotlight at NeurIPS 2021. Appearing within a day of the closely related Decision Transformer, it independently pursued the idea of solving reinforcement learning with the tools of sequence modeling.

The approach uses a Transformer to model the joint distribution over entire trajectories of states, actions, and rewards. Where Decision Transformer conditions on a target return and reads off actions directly, the Trajectory Transformer repurposes beam search, the decoding strategy familiar from machine translation, as a planning algorithm: it searches for continuations of the trajectory that are both probable under the model and high in predicted reward. The same trained model can be used for dynamics prediction, imitation learning, and offline RL.
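To make the idea of beam search as planning concrete, here is a minimal sketch in Python. It is not the paper's implementation. The trained Transformer is replaced by a hypothetical `toy_model` function over a one-dimensional world, and the names `beam_plan`, `toy_model`, and `GOAL` are invented for illustration. Candidates are ranked by cumulative predicted reward, with model log-probability as a tie-breaker, which is a simplification assumed for this example.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical stand-in for a trained sequence model: given a state and an
# action, return (log-probability of the action, predicted reward, next state).
StepModel = Callable[[int, int], Tuple[float, float, int]]

GOAL = 3  # assumed goal position in the toy world


def toy_model(state: int, action: int) -> Tuple[float, float, int]:
    """Toy 1-D world: actions -1, 0, +1 move along a line toward GOAL."""
    next_state = state + action
    reward = -float(abs(next_state - GOAL))  # dense reward: closer is better
    log_prob = math.log(1.0 / 3.0)           # all three actions equally likely
    return log_prob, reward, next_state


def beam_plan(model: StepModel, start: int, actions: List[int],
              horizon: int, beam_width: int) -> List[int]:
    """Return the best action sequence found by beam search over the model."""
    # Each beam entry: (cumulative reward, cumulative log-prob, plan, state)
    beams = [(0.0, 0.0, [], start)]
    for _ in range(horizon):
        candidates = []
        for total_r, total_lp, plan, state in beams:
            for a in actions:
                lp, r, nxt = model(state, a)
                candidates.append((total_r + r, total_lp + lp, plan + [a], nxt))
        # Keep the highest-reward continuations; log-probability breaks ties.
        candidates.sort(key=lambda c: (c[0], c[1]), reverse=True)
        beams = candidates[:beam_width]
    return beams[0][2]


if __name__ == "__main__":
    print(beam_plan(toy_model, start=0, actions=[-1, 0, 1],
                    horizon=4, beam_width=2))
```

With these toy settings, the returned plan walks toward the goal and then stays there. The sketch uses a dense reward because a narrow beam can discard the right path early when rewards are sparse.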

Together with Decision Transformer, it helped establish sequence modeling as a serious alternative paradigm for control. For a general reader, it shows how a technique invented for translating sentences can be turned into a way to plan a sequence of actions.
