Koray Kavukcuoglu: Building AGI Through Products

[HPP] Logan Kilpatrick · November 28, 2025 · 11 min

Redefining AGI Progress

💡 The development of Artificial General Intelligence (AGI) is framed as a joint effort with the world, not a theoretical research project confined to a lab.
🎯 Real-world utility and user feedback are considered the ultimate measures of progress, surpassing traditional benchmarks.
📈 Benchmarks provide early validation but are fleeting; models like Gemini 3 quickly make them irrelevant through large jumps in scores (e.g., HLE and ARC-AGI-2 rising from 1-2% to over 40%).
🧠 The GPQA Diamond benchmark remains a challenge, requiring expert-level scientific reasoning and multi-step thinking, indicating that hard problems in reasoning are still unsolved.

Key Development Focus Areas

✅ Instruction following is paramount, ensuring models precisely understand and execute nuanced requests rather than just inferring intent.
🌍 Internationalization is a huge strategic focus, expanding model utility and feedback loops to diverse languages like Hindi, Portuguese, and Swahili.
🚀 Agentic actions and code are seen as a major intelligence multiplier, enabling models to perform tasks through function and tool calls.
🛠️ The concept of "vibe coding" allows non-programmers to generate working applications from high-level ideas, democratizing creation.

Engineering Mindset for AGI

⚙️ Building AGI requires an engineering mindset, integrating AI technology into every Google product to force robustness and gather real-world user signals.
🤝 Platforms like Antigravity facilitate the deployment of AI agents, providing crucial data on model performance, weaknesses, and areas for architectural improvement.
🔒 Safety and security are first principles, built into the entire development process from pre-training through post-training, rather than being added as an afterthought.
🌐 This effort involves massive global coordination across Google teams, co-designing software and hardware (chips, data centers, networking) for global scale robustness.

DeepMind's Legacy and Future

🔬 DeepMind's history with large, specialized teams on projects like DQN, AlphaGo, and AlphaFold forms the cultural backbone for current AGI development.
✨ Multimodality (understanding text, images, audio) is naturally emerging as underlying architectures converge, allowing world knowledge to transfer between domains.
🖼️ The Nano Banana Pro image generation model, built on Gemini 3 Pro, exemplifies this by leveraging the text model's deep world understanding for complex tasks like creating infographics from dense documents.

The Biggest Risk to Innovation

⚠️ A humbling insight: the biggest risk to AGI development is running out of innovation, not a lack of money or compute resources.
💡 DeepMind's aggressive innovation was partly driven by feeling like an underdog two and a half years ago, pushing them to join the "leadership group" in LLM development.

What’s Discussed

Artificial General Intelligence (AGI) · Gemini models · Real-world utility · Benchmarking · Instruction following · Internationalization · Agentic capabilities · Function calling · Tool calling · Code generation · Engineering mindset · Product integration · Multimodality · Innovation · DeepMind
