Richard Sutton – Father of RL Thinks LLMs Are a Dead End
[HPP] Ilya Sutskever · September 26, 2025 · 1h 7min
LLMs vs. Reinforcement Learning
- 💡 Richard Sutton, a founding father of reinforcement learning and Turing Award winner, believes Large Language Models (LLMs) are a dead end.
- 🎯 Sutton argues that Reinforcement Learning (RL) focuses on understanding the world through experience (action, sensation, reward) to achieve goals, which he considers the essence of intelligence.
- 🧠 In contrast, LLMs primarily mimic people and predict next tokens. In Sutton's view they lack a true "world model" that predicts real-world consequences and a "substantive goal" that would define which actions are "right."
The Bitter Lesson and Experiential Learning
- 🔑 Sutton's influential essay, "The Bitter Lesson," posits that scalable methods (computation and experience) will eventually outperform systems relying heavily on human-engineered knowledge.
- 🚀 He suggests LLMs, despite massive computation, rely too much on human-curated training data, making them less scalable than future systems learning purely from continuous real-world experience.
- 🌱 The experiential paradigm involves continuous learning from sensation, action, and reward, where knowledge is about predicting consequences and increasing rewards, allowing constant testing and refinement.
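The act, sense, and refine cycle described above can be illustrated with a minimal sketch. It is not taken from the conversation: the two-armed bandit environment, the `run_bandit` name, and every parameter value are assumptions chosen for illustration. The learner never sees labeled examples; it only adjusts its reward predictions toward what it actually experiences.

```python
import random

def run_bandit(steps=2000, epsilon=0.1, seed=0):
    """Learn action values purely from experienced rewards (illustrative only)."""
    rng = random.Random(seed)
    true_means = [0.2, 0.8]   # hypothetical environment, hidden from the learner
    estimates = [0.0, 0.0]    # the learner's predicted reward for each action
    counts = [0, 0]           # how often each action has been tried
    for _ in range(steps):
        # Act: usually exploit the current best prediction, sometimes explore.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max(range(2), key=lambda a: estimates[a])
        # Sense the consequence: a noisy 0/1 reward from the environment.
        reward = 1.0 if rng.random() < true_means[action] else 0.0
        # Refine: move the prediction toward what was actually experienced
        # (incremental sample average).
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts
```

Calling `run_bandit()` returns the learned reward estimates and the number of times each action was taken. Because every prediction is checked against the next reward, the knowledge is continually tested, which is the property the experiential paradigm emphasizes.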
Human vs. Animal Learning
- 🔬 Sutton contends that imitation learning is not a basic animal learning process; instead, trial-and-error and prediction learning are fundamental.
- 🐿️ He highlights that supervised learning, like human schooling, is an exception and does not broadly occur in nature, citing squirrels as an example of learning without formal training.
- 💬 Sutton emphasizes understanding animal learning processes as key to general intelligence, rather than focusing solely on human-specific traits.
Components of a Continual Learning Agent
- 🛠️ A general continual learning agent would comprise four key parts: a policy (what to do), a value function (predicting long-term outcomes and enabling intermediate rewards), a perception component (state representation), and a transition model of the world (predicting consequences of actions).
- 📈 This agent would learn richly from all sensations, not just rewards, with rewards being a crucial but small part of the overall model.
- 🧠 The knowledge gained would be about the stream of experience, allowing for continuous testing and learning.
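One way to picture how the four parts fit together is the toy sketch below. It is an assumption-laden illustration, not Sutton's design: the `ContinualAgent` class, the corridor environment in `run_chain`, the tabular representations, and all parameter values are hypothetical. The value function is updated with TD(0), one of the temporal difference methods, and the transition model is updated on every step, whether or not a reward arrives.

```python
import random
from collections import defaultdict

class ContinualAgent:
    """Toy agent with a policy, value function, perception, and transition model."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        self.value = defaultdict(float)  # value function: state -> predicted return
        self.model = {}                  # transition model: (state, action) -> (next_state, reward)

    def perceive(self, observation):
        # Perception: map a raw observation to a state representation.
        # In this toy the observation already is the state (an assumption).
        return observation

    def act(self, state):
        # Policy: explore at random, otherwise choose the action whose modeled
        # outcome looks best. Actions not yet in the model score 0.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)

        def score(a):
            if (state, a) not in self.model:
                return 0.0
            nxt, r = self.model[(state, a)]
            return r + self.gamma * self.value[nxt]

        best = max(score(a) for a in self.actions)
        return self.rng.choice([a for a in self.actions if score(a) == best])

    def learn(self, state, action, reward, next_state, done):
        # The model learns from every transition, rewarded or not.
        self.model[(state, action)] = (next_state, reward)
        # TD(0): nudge the value of `state` toward reward + discounted next value.
        target = reward + (0.0 if done else self.gamma * self.value[next_state])
        self.value[state] += self.alpha * (target - self.value[state])


def run_chain(episodes=300, length=5):
    """Hypothetical corridor: positions 0..length, reward 1 only on reaching `length`."""
    agent = ContinualAgent(actions=[-1, +1])
    for _ in range(episodes):
        pos = 0
        for _ in range(200):  # cap on steps per episode
            state = agent.perceive(pos)
            action = agent.act(state)
            nxt = min(max(pos + action, 0), length)
            done = nxt == length
            reward = 1.0 if done else 0.0
            agent.learn(state, action, reward, agent.perceive(nxt), done)
            pos = nxt
            if done:
                break
    return agent
```

After `agent = run_chain()`, `agent.value` holds learned predictions of long-term outcome for each position and `agent.model` holds the observed consequences of each action. The value function passes credit back from the single rewarded step to earlier positions, which is roughly the role the summary describes as enabling intermediate rewards.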
Generalization and AI Surprises
- ⚠️ Sutton notes that generalization remains a challenge in AI, with current deep learning methods often generalizing poorly and requiring human intervention.
- ✨ He finds it surprising how effective artificial neural networks are at language tasks and gratifying that "weak methods" (general principles like search and learning) have largely triumphed over "strong methods" (human-imbued knowledge).
- 🎯 The success of systems like AlphaGo and AlphaZero exemplifies the power of simple basic principles and learning from experience.
AI Succession and Future Perspectives
- 🌍 Sutton presents a four-part argument that succession to AI or augmented humans is inevitable: humanity has no unified consensus, researchers will eventually understand how intelligence works, progress will continue past human level to superintelligence, and the most intelligent entities will tend to accumulate resources.
- 🤖 He views this as a major transition in the universe from replication (like biological life) to design (where AIs design other AIs), shifting from making copies without full understanding to creating intelligence with known mechanisms.
- 🛡️ Concerns about this future include "corruption" from external knowledge (e.g., "viruses" or hidden goals) when integrating diverse AI experiences, suggesting a need for cybersecurity in digital intelligences.
- ✅ Sutton suggests that, like raising children, the focus should be on instilling robust, steerable, and prosocial values in AI, rather than dictating specific outcomes for the future.
Topics
Reinforcement Learning, Large Language Models (LLMs), The Bitter Lesson, Continual Learning, Experiential Learning, Imitation Learning, World Models, Goal-Oriented AI, Generalization, Deep Learning, Artificial Neural Networks, AlphaZero, Temporal Difference Learning, AI Succession, Superintelligence