Andrej Karpathy 2025 LLM Review: RLVR, Jagged Intelligence, & The Vibe Coding Revolution

[HPP] Andrej Karpathy · December 21, 2025 · 35 min

The Rise of RLVR and Algorithmic Reasoning

💡 Reinforcement Learning from Verifiable Rewards (RLVR) replaced RLHF, becoming the new standard for LLM training by using objective computational environments for rewards.
🎯 This method allows for deep, intensive optimization in domains like math and code, leading to the spontaneous emergence of algorithmic reasoning strategies.
🧠 Models learn to break down complex problems into intermediate steps and perform error recovery, managing internal operations like a working memory.
📈 Compute allocation shifted from pre-training to these deep RL runs, enabling a new scaling law where test-time compute can increase intelligence on demand.
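
To make the idea of a verifiable reward concrete, here is a minimal sketch in Python. It is not taken from the video: the function names, the "last integer in the text" answer convention, and the sample completions are all assumptions chosen for illustration. The point is only that the reward comes from an objective check rather than from a human preference label.

```python
import re
from typing import List, Optional


def extract_final_answer(completion: str) -> Optional[int]:
    """Return the last integer that appears in the completion, or None.

    Assumption for this sketch: the model's final answer is the last
    integer in its output. Real setups use stricter answer formats.
    """
    matches = re.findall(r"-?\d+", completion)
    return int(matches[-1]) if matches else None


def verifiable_reward(completion: str, expected: int) -> float:
    """Binary reward: 1.0 if the extracted answer equals `expected`, else 0.0."""
    return 1.0 if extract_final_answer(completion) == expected else 0.0


def score_samples(completions: List[str], expected: int) -> List[float]:
    """Score several sampled completions for the same problem."""
    return [verifiable_reward(c, expected) for c in completions]


# Hypothetical completions for the prompt "What is 17 + 25?"
samples = [
    "17 + 20 = 37, then 37 + 5 = 42",  # intermediate steps, correct
    "The answer is 41",                # wrong final answer
    "I am not sure",                   # no answer at all
]
rewards = score_samples(samples, expected=42)
```

In an RL run, rewards like these would be fed to a policy-gradient style update; that part is omitted here. Because the check is mechanical, it can be run over many samples without human raters, which is what makes long optimization runs in math and code practical.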

Jagged Intelligence and Benchmark Challenges

👻 LLMs are described as "summoned ghosts" with wildly uneven capabilities, excelling in verifiable domains but struggling with subjective areas like common sense.
⚠️ This jagged performance stems from the hyper-specific optimization of RLVR, which lacks strong objective reward signals for non-verifiable domains.
📊 The industry faced a benchmark crisis as labs "benchmaxed", optimizing models specifically for verifiable tests, so benchmark scores decoupled from generalizable real-world capability.

The Thick LLM App Layer and Local Agents

🛠️ A thick LLM app layer is essential for reliability, handling complex tasks like context engineering and orchestrating multiple LLM calls into directed acyclic graphs (DAGs).
🚀 This layer verticalizes generalist models into specialists by integrating private data, sensors, actuators, and real-world feedback loops.
💻 Local host LLM agents, like Claude Code, provide low-latency, high-fidelity access to a user's environment, proving architecturally superior for managing brittle, jagged agency in development.
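
The orchestration idea above can be sketched as a small DAG runner. This is an illustrative assumption, not a description of any particular product: `fake_llm` is a hypothetical stand-in for a real model call, and the step names (`retrieve`, `summarize`, `critique`, `answer`) are invented for the example. Ordering uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; wraps the prompt in <...>."""
    return f"<{prompt}>"


Step = Callable[[Dict[str, str]], str]


def run_dag(steps: Dict[str, Step], deps: Dict[str, List[str]]) -> Dict[str, str]:
    """Run every step after its dependencies, in topological order.

    Each step receives only the outputs of the steps it depends on.
    """
    results: Dict[str, str] = {}
    for name in TopologicalSorter(deps).static_order():
        inputs = {dep: results[dep] for dep in deps[name]}
        results[name] = steps[name](inputs)
    return results


# Maps each step to the steps whose output it needs.
deps: Dict[str, List[str]] = {
    "retrieve": [],
    "summarize": ["retrieve"],
    "critique": ["retrieve"],
    "answer": ["summarize", "critique"],
}

steps: Dict[str, Step] = {
    "retrieve": lambda inp: fake_llm("find relevant docs"),
    "summarize": lambda inp: fake_llm("summarize " + inp["retrieve"]),
    "critique": lambda inp: fake_llm("critique " + inp["retrieve"]),
    "answer": lambda inp: fake_llm(
        "answer using " + inp["summarize"] + " and " + inp["critique"]
    ),
}

results = run_dag(steps, deps)
```

Declaring dependencies explicitly keeps each call's context small (only the upstream outputs it needs), which is one simple form of the context engineering the app layer is said to handle. A cycle in `deps` raises `graphlib.CycleError`, so the graph stays acyclic by construction.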

Vibe Coding and UI Evolution

✨ Vibe coding emerged as a revolution, allowing users to create functional programs from high-level English instructions, making code ephemeral, malleable, and discardable.
💰 This democratizes programming and enables professionals to build specialized, temporary tools with near-zero creation cost, shifting focus from boilerplate to high-level architecture.
🖼️ The future of LLM interfaces, hinted at by Nano Banana, moves beyond text-based chat to a unified multimodal generative experience with deeply integrated text, image, and world knowledge.
🌐 This aims to create intuitive, dynamic, spatially organized LLM GUIs that better align with human visual and spatial preferences.

What’s Discussed

Reinforcement Learning from Verifiable Rewards (RLVR), LLM Training, Algorithmic Reasoning, Test-Time Compute, Jagged Intelligence, Benchmark Crisis, Benchmaxing, LLM App Layer, Context Engineering, Orchestration, Feedback Loops, LLM Agents, Local Host Paradigm, Vibe Coding, Multimodal Generative Experience
