Richard Sutton – Father of RL Thinks LLMs Are a Dead End

[HPP] Ilya Sutskever · September 26, 2025 · 1h 7min

LLMs vs. Reinforcement Learning

💡 Richard Sutton, a founding father of reinforcement learning and Turing Award winner, believes Large Language Models (LLMs) are a dead end.
🎯 Sutton argues that Reinforcement Learning (RL) focuses on understanding the world through experience (action, sensation, reward) to achieve goals, which he considers the essence of intelligence.
🧠 In contrast, LLMs primarily mimic people and predict next tokens, lacking a true "world model" or "substantive goal" to learn from real-world consequences or define "right" actions.

The Bitter Lesson and Experiential Learning

🔑 Sutton's influential essay, "The Bitter Lesson," posits that scalable methods (computation and experience) will eventually outperform systems relying heavily on human-engineered knowledge.
🚀 He suggests LLMs, despite massive computation, rely too much on human-curated training data, making them less scalable than future systems learning purely from continuous real-world experience.
🌱 The experiential paradigm involves continuous learning from sensation, action, and reward, where knowledge is about predicting consequences and increasing rewards, allowing constant testing and refinement.
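
The act, observe, refine cycle described above can be illustrated with a minimal sketch. It is not taken from the video: the two-action toy world, its reward probabilities, and the function name `run_experience_loop` are assumptions made up for illustration, and the epsilon-greedy rule is just one simple way to mix trying things out with using what has been learned.

```python
import random


def run_experience_loop(steps=2000, epsilon=0.1, seed=0):
    """Trial-and-error learning in a hypothetical two-action world."""
    rng = random.Random(seed)
    reward_prob = [0.2, 0.8]   # assumed toy world; hidden from the agent
    estimates = [0.0, 0.0]     # the agent's predicted reward for each action
    counts = [0, 0]            # how often each action has been tried

    for _ in range(steps):
        # Act: usually follow the current prediction, occasionally explore.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max(range(2), key=lambda a: estimates[a])

        # Experience the consequence of the action.
        reward = 1.0 if rng.random() < reward_prob[action] else 0.0

        # Test the prediction against what happened and refine it
        # (incremental running average of observed rewards).
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts
```

Nothing here is labeled by a human: the only training signal is the consequence of the agent's own actions, which is the contrast with curated training data that the summary draws.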

Human vs. Animal Learning

🔬 Sutton contends that imitation learning is not a basic animal learning process; instead, trial-and-error and prediction learning are fundamental.
🐿️ He highlights that supervised learning, like human schooling, is an exception and does not broadly occur in nature, citing squirrels as an example of learning without formal training.
💬 Sutton emphasizes understanding animal learning processes as key to general intelligence, rather than focusing solely on human-specific traits.

Components of a Continual Learning Agent

🛠️ A general continual learning agent would comprise four key parts: a policy (what to do), a value function (predicting long-term outcomes and enabling intermediate rewards), a perception component (state representation), and a transition model of the world (predicting consequences of actions).
📈 This agent would learn richly from all sensations, not just rewards, with rewards being a crucial but small part of the overall model.
🧠 The knowledge gained would be about the stream of experience, allowing for continuous testing and learning.
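
As a rough illustration of how the four parts could fit together, here is a toy tabular sketch. It is an assumption-laden example rather than Sutton's design: the class `ContinualAgent`, the four-state corridor in `step`, the epsilon-greedy exploration, and the choice of a TD(0) update for the value function are all hypothetical choices made for this example.

```python
import random
from collections import defaultdict


class ContinualAgent:
    """Toy agent with a policy, value function, perception, and transition model."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        self.values = defaultdict(float)  # value function: state -> predicted return
        self.model = {}                   # transition model: (state, action) -> (next_state, reward)

    def perceive(self, observation):
        # Perception: turn raw sensation into a state (identity in this toy).
        return observation

    def greedy(self, state):
        # Score each action by its modelled outcome; untried actions score
        # infinity so they get tried. Ties are broken at random.
        def score(action):
            if (state, action) not in self.model:
                return float("inf")
            next_state, reward = self.model[(state, action)]
            return reward + self.gamma * self.values[next_state]

        scores = {a: score(a) for a in self.actions}
        best = max(scores.values())
        return self.rng.choice([a for a in self.actions if scores[a] == best])

    def policy(self, state):
        # Policy: mostly greedy, with occasional random exploration.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.greedy(state)

    def learn(self, state, action, reward, next_state):
        # The model learns from the whole observed transition, not only the reward.
        self.model[(state, action)] = (next_state, reward)
        # TD(0): move the prediction toward reward + discounted next prediction.
        td_error = reward + self.gamma * self.values[next_state] - self.values[state]
        self.values[state] += self.alpha * td_error


def step(state, action):
    # Hypothetical corridor with states 0..2; stepping right from 2 pays 1 and restarts at 0.
    nxt = state + 1 if action == "right" else max(state - 1, 0)
    return (0, 1.0) if nxt == 3 else (nxt, 0.0)


def run(steps=3000):
    agent = ContinualAgent(["left", "right"])
    state = agent.perceive(0)
    for _ in range(steps):  # one unbroken stream of experience, no separate training phase
        action = agent.policy(state)
        observation, reward = step(state, action)
        next_state = agent.perceive(observation)
        agent.learn(state, action, reward, next_state)
        state = next_state
    return agent
```

In this sketch the reward appears in only one transition, yet every step updates the model and the value estimates, which mirrors the point that reward is a small part of what the agent learns from.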

Generalization and AI Surprises

⚠️ Sutton notes that generalization remains a challenge in AI, with current deep learning methods often generalizing poorly and requiring human intervention.
✨ He finds it surprising how effective artificial neural networks are at language tasks and gratifying that "weak methods" (general principles like search and learning) have largely triumphed over "strong methods" (human-imbued knowledge).
🎯 The success of systems like AlphaGo and AlphaZero exemplifies the power of simple basic principles and learning from experience.

AI Succession and Future Perspectives

🌍 Sutton presents a four-part argument that succession to AI or augmented humans is inevitable: humanity lacks a unified consensus, researchers will eventually understand how intelligence works, progress will continue to superintelligence, and the most intelligent entities will tend to accumulate resources.
🤖 He views this as a major transition in the universe from replication (like biological life) to design (where AIs design other AIs), shifting from making copies without full understanding to creating intelligence with known mechanisms.
🛡️ Concerns about this future include "corruption" from external knowledge (e.g., "viruses" or hidden goals) when integrating diverse AI experiences, suggesting a need for cybersecurity in digital intelligences.
✅ Sutton suggests that, like raising children, the focus should be on instilling robust, steerable, and prosocial values in AI, rather than dictating specific outcomes for the future.

What’s Discussed

Reinforcement Learning, Large Language Models (LLMs), The Bitter Lesson, Continual Learning, Experiential Learning, Imitation Learning, World Models, Goal-Oriented AI, Generalization, Deep Learning, Artificial Neural Networks, AlphaZero, Temporal Difference Learning, AI Succession, Superintelligence
