Richard Sutton – Father of RL Thinks LLMs Are a Dead End

[HPP] Ilya Sutskever · September 26, 2025 · 1h 7min

LLMs vs. Reinforcement Learning

  • 💡 Richard Sutton, a founding father of reinforcement learning and Turing Award winner, believes Large Language Models (LLMs) are a dead end.
  • 🎯 Sutton argues that Reinforcement Learning (RL) focuses on understanding the world through experience (action, sensation, reward) to achieve goals, which he considers the essence of intelligence.
  • 🧠 In contrast, he argues, LLMs primarily mimic people by predicting the next token. They lack a true "world model" and a "substantive goal," so they cannot learn from real-world consequences or define what a "right" action is.

The Bitter Lesson and Experiential Learning

  • 🔑 Sutton's influential essay, "The Bitter Lesson," argues that general methods that scale with computation and experience eventually outperform systems that rely heavily on human-engineered knowledge.
  • 🚀 He suggests LLMs, despite massive computation, rely too much on human-curated training data, making them less scalable than future systems learning purely from continuous real-world experience.
  • 🌱 The experiential paradigm involves continuous learning from sensation, action, and reward, where knowledge is about predicting consequences and increasing rewards, allowing constant testing and refinement.
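
The act, sense, and refine loop described above can be illustrated with a minimal sketch. This is not Sutton's own formulation: the two-armed bandit environment, the payoff probabilities, and the name `run_bandit` are hypothetical choices made only for illustration. The agent's "knowledge" is a prediction of reward per action, tested against each new reward and nudged toward it.

```python
import random

def run_bandit(steps=2000, epsilon=0.1, seed=0):
    """Learn action values purely from the agent's own stream of actions and rewards."""
    rng = random.Random(seed)
    reward_prob = [0.2, 0.8]   # hypothetical environment: action 1 pays off more often
    estimates = [0.0, 0.0]     # the agent's predicted reward for each action
    counts = [0, 0]            # how often each action has been tried
    for _ in range(steps):
        # Act: mostly exploit the current predictions, sometimes explore.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max(range(2), key=lambda a: estimates[a])
        # Sense the consequence: a reward of 1.0 or 0.0.
        reward = 1.0 if rng.random() < reward_prob[action] else 0.0
        # Test the prediction against experience and refine it (incremental mean).
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

if __name__ == "__main__":
    estimates, counts = run_bandit()
    print("predicted reward per action:", estimates)
    print("times each action was taken:", counts)
```

No labeled examples are supplied here: every update comes from a consequence the agent itself produced, which is the property the experiential paradigm emphasizes.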

Human vs. Animal Learning

  • 🔬 Sutton contends that imitation learning is not a basic animal learning process; instead, trial-and-error and prediction learning are fundamental.
  • 🐿️ He highlights that supervised learning, like human schooling, is an exception and does not broadly occur in nature, citing squirrels as an example of learning without formal training.
  • 💬 Sutton emphasizes understanding animal learning processes as key to general intelligence, rather than focusing solely on human-specific traits.

Components of a Continual Learning Agent

  • 🛠️ A general continual learning agent would comprise four key parts: a policy (what to do), a value function (a prediction of long-term outcomes, whose changes can act as intermediate feedback between rewards), a perception component (state representation), and a transition model of the world (predicting consequences of actions).
  • 📈 This agent would learn richly from all sensations, not just rewards, with rewards being a crucial but small part of the overall model.
  • 🧠 The knowledge gained would be about the stream of experience, allowing for continuous testing and learning.
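
As a rough illustration of how the four parts listed above might fit together, here is a small tabular sketch. Everything specific in it is an assumption made for the example, not something from the conversation: the four-cell corridor world, the `Agent` class and its method names, the use of TD(0) for the value function, and one-step lookahead through the learned model as the policy.

```python
import random

N_STATES = 4                 # hypothetical corridor: positions 0..3
LEFT, RIGHT = 0, 1
ACTIONS = (LEFT, RIGHT)

def env_step(position, action):
    """Toy world: stepping right from the last cell pays 1.0 and wraps to cell 0."""
    if action == RIGHT:
        if position == N_STATES - 1:
            return 0, 1.0
        return position + 1, 0.0
    return max(position - 1, 0), 0.0

class Agent:
    def __init__(self, gamma=0.9, alpha=0.1, epsilon=0.1, seed=0):
        self.gamma, self.alpha, self.epsilon = gamma, alpha, epsilon
        self.rng = random.Random(seed)
        self.value = [0.0] * N_STATES  # value function: predicted long-term reward
        self.model = {}                # transition model: (state, action) -> (next_state, reward)

    def perceive(self, observation):
        """Perception: reduce a raw observation to the state the agent reasons about."""
        return observation["position"]

    def score(self, state, action):
        """One-step lookahead through the learned model; untried actions score highest."""
        if (state, action) not in self.model:
            return float("inf")
        next_state, reward = self.model[(state, action)]
        return reward + self.gamma * self.value[next_state]

    def greedy_action(self, state):
        return max(ACTIONS, key=lambda a: self.score(state, a))

    def act(self, state):
        """Policy: greedy on model + value, with random exploration and tie-breaking."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        best = max(self.score(state, a) for a in ACTIONS)
        return self.rng.choice([a for a in ACTIONS if self.score(state, a) == best])

    def learn(self, state, action, reward, next_state):
        # The model is updated on every transition, rewarded or not.
        self.model[(state, action)] = (next_state, reward)
        # TD(0): move the prediction toward reward plus the discounted next prediction.
        td_error = reward + self.gamma * self.value[next_state] - self.value[state]
        self.value[state] += self.alpha * td_error

def train(steps=5000, seed=0):
    """Run one unbroken stream of experience; there are no episodes or resets."""
    agent = Agent(seed=seed)
    position = 0
    state = agent.perceive({"position": position})
    for _ in range(steps):
        action = agent.act(state)
        position, reward = env_step(position, action)
        next_state = agent.perceive({"position": position})
        agent.learn(state, action, reward, next_state)
        state = next_state
    return agent
```

In this sketch most transitions carry zero reward, yet each one still updates the transition model, and the TD error gives the value function a learning signal between rewards. That is one way to read the point that reward is a crucial but small part of what the agent learns.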

Generalization and AI Surprises

  • ⚠️ Sutton notes that generalization remains a challenge in AI, with current deep learning methods often generalizing poorly and requiring human intervention.
  • ✨ He finds it surprising how effective artificial neural networks are at language tasks and gratifying that "weak methods" (general principles like search and learning) have largely triumphed over "strong methods" (human-imbued knowledge).
  • 🎯 The success of systems like AlphaGo and AlphaZero exemplifies the power of simple basic principles and learning from experience.

AI Succession and Future Perspectives

  • 🌍 Sutton presents a four-part argument that succession to AI or augmented humans is inevitable: humanity has no unified consensus about its direction, researchers will eventually understand how intelligence works, progress will continue past human level to superintelligence, and the most intelligent entities will tend to accumulate resources.
  • 🤖 He views this as a major transition in the universe from replication (like biological life) to design (where AIs design other AIs), shifting from making copies without full understanding to creating intelligence with known mechanisms.
  • 🛡️ Concerns about this future include "corruption" from external knowledge (e.g., "viruses" or hidden goals) when integrating diverse AI experiences, suggesting a need for cybersecurity in digital intelligences.
  • ✅ Sutton suggests that, like raising children, the focus should be on instilling robust, steerable, and prosocial values in AI, rather than dictating specific outcomes for the future.

Topics · 15 themes

What’s Discussed

Reinforcement Learning · Large Language Models (LLMs) · The Bitter Lesson · Continual Learning · Experiential Learning · Imitation Learning · World Models · Goal-Oriented AI · Generalization · Deep Learning · Artificial Neural Networks · AlphaZero · Temporal Difference Learning · AI Succession · Superintelligence