ANDREJ KARPATHY 2025 LLM Review: RLVR, Jagged Intelligence, & The Vibe Coding Revolution
[HPP] Andrej Karpathy · December 21, 2025 · 35 min
The Rise of RLVR and Algorithmic Reasoning
- 💡 Reinforcement Learning from Verifiable Rewards (RLVR) emerged as the de facto new major stage of LLM training, added on top of pretraining, supervised fine-tuning, and RLHF, with rewards coming from objective, automatically checkable environments.
- 🎯 This method allows for deep, intensive optimization in domains like math and code, leading to the spontaneous emergence of algorithmic reasoning strategies.
- 🧠 Models learn to break down complex problems into intermediate steps and perform error recovery, managing internal operations like a working memory.
- 📈 Compute allocation shifted from pre-training toward these long RL runs, and a new scaling knob appeared: generating longer reasoning traces at test time raises capability on demand.
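To make the idea of a verifiable reward concrete, here is a minimal sketch in Python. It is not taken from the video: the `Answer:` marker convention and the function names `parse_final_answer` and `math_reward` are assumptions chosen for illustration. The point is only that the reward is computed by a program checking the result, not by a human or a learned preference model.

```python
from fractions import Fraction
from typing import Optional


def parse_final_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if absent.

    The marker convention is an assumption made for this sketch.
    """
    marker = "Answer:"
    idx = completion.rfind(marker)
    if idx == -1:
        return None
    return completion[idx + len(marker):].strip()


def math_reward(completion: str, expected: str) -> float:
    """Binary reward: 1.0 if the final answer equals `expected` numerically."""
    answer = parse_final_answer(completion)
    if answer is None:
        return 0.0
    try:
        # Fraction accepts "4", "0.5" and "1/2", so equivalent forms match.
        return 1.0 if Fraction(answer) == Fraction(expected) else 0.0
    except (ValueError, ZeroDivisionError):
        # Unparseable or malformed answers earn no reward.
        return 0.0


if __name__ == "__main__":
    print(math_reward("2 + 2 = 4\nAnswer: 4", "4"))
    print(math_reward("I think it is five.\nAnswer: five", "5"))
```

Because the check is mechanical, it can be run across a very large number of sampled completions, which is what makes long optimization runs in math and code practical. The same property explains why domains without such a checker are harder to optimize this way.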
Jagged Intelligence and Benchmark Challenges
- 👻 LLMs are described as "summoned ghosts" with wildly uneven capabilities, excelling in verifiable domains but struggling with subjective areas like common sense.
- ⚠️ This jagged performance stems from the hyper-specific optimization of RLVR, which lacks strong objective reward signals for non-verifiable domains.
- 📊 The industry faced a benchmark crisis as labs "benchmaxed" by optimizing models specifically for verifiable tests, leading to a decoupling of benchmark scores from generalizable real-world capability.
The Thick LLM App Layer and Local Agents
- 🛠️ A thick LLM app layer is essential for reliability, handling complex tasks like context engineering and orchestrating multiple LLM calls into directed acyclic graphs (DAGs).
- 🚀 This layer verticalizes generalist models into specialists by integrating private data, sensors, actuators, and real-world feedback loops.
- 💻 Localhost LLM agents such as Claude Code run on the user's own machine, giving low-latency, high-fidelity access to the user's environment; this architecture is better suited to the brittle, jagged agent capabilities that development work currently has to manage.
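The orchestration of multiple LLM calls into a DAG mentioned above can be sketched as follows. This is an illustrative assumption, not the design of any particular app: `fake_llm` is a hypothetical stand-in that only echoes its inputs, and the step names are invented. The ordering uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


def fake_llm(step: str, inputs: List[str]) -> str:
    """Hypothetical stand-in for a real model call; it only echoes its inputs."""
    return f"{step}({', '.join(inputs)})"


def run_dag(
    deps: Dict[str, List[str]],
    call: Callable[[str, List[str]], str],
) -> Dict[str, str]:
    """Run each step after its dependencies, feeding it their outputs."""
    results: Dict[str, str] = {}
    # static_order() yields every node after all of its predecessors.
    for step in TopologicalSorter(deps).static_order():
        upstream = [results[d] for d in deps.get(step, [])]
        results[step] = call(step, upstream)
    return results


# Each key depends on the steps listed in its value (names are illustrative).
pipeline = {
    "retrieve": [],
    "summarize": ["retrieve"],
    "critique": ["summarize"],
    "answer": ["summarize", "critique"],
}

if __name__ == "__main__":
    outputs = run_dag(pipeline, fake_llm)
    print(outputs["answer"])
```

In a real app layer, `call` would wrap an actual model request and each step would also assemble its own context (private data, tool results, earlier outputs). The dependency graph is what keeps that assembly explicit and repeatable.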
Vibe Coding and UI Evolution
- ✨ Vibe coding emerged as a revolution, allowing users to create functional programs from high-level English instructions, making code ephemeral, malleable, and discardable.
- 💰 This democratizes programming and enables professionals to build specialized, temporary tools with near-zero creation cost, shifting focus from boilerplate to high-level architecture.
- 🖼️ The future of LLM interfaces, hinted at by Nano Banana, moves beyond text-based chat to a unified multimodal generative experience with deeply integrated text, image, and world knowledge.
- 🌐 This aims to create intuitive, dynamic, spatially organized LLM GUIs that better align with human visual and spatial preferences.
What’s Discussed
Reinforcement Learning from Verifiable Rewards (RLVR) · LLM Training · Algorithmic Reasoning · Test-Time Compute · Jagged Intelligence · Benchmark Crisis · Benchmaxing · LLM App Layer · Context Engineering · Orchestration · Feedback Loops · LLM Agents · Local Host Paradigm · Vibe Coding · Multimodal Generative Experience