Richard Sutton: Father of RL Thinks LLMs Are a Dead End

In this interview on the Dwarkesh Patel podcast, Richard Sutton, a founder of modern reinforcement learning and a 2024 Turing Award recipient, makes a pointed argument against the prevailing belief that scaling up large language models leads toward general intelligence. His view is that learning from experience, by acting in the world and observing the consequences, is fundamental to intelligence in a way that imitating human text is not.

Sutton draws on the philosophy behind his well-known essay “The Bitter Lesson,” which holds that general methods leveraging computation tend to win out over approaches that build in human knowledge. He extends that thinking to argue that today’s language models, trained primarily to predict human-written text, are working from the wrong objective, and that the field will eventually return to agents that learn from their own interaction with an environment.

This is a valuable counterpoint to the dominant narrative around LLMs. Whether or not one agrees, hearing a deeply credentialed researcher articulate why the current paradigm might be a detour sharpens one’s understanding of the open questions in AI. For a reader who wants more than the consensus story, this interview is an unusually substantive challenge to it.
