
Spatial Intelligence and World Models: The Next Frontier After LLMs

[HPP] Fei-Fei Li · November 25, 2025 · 1h 0min

The Rise of Spatial Intelligence

  • 💡 Fei-Fei Li and Justin Johnson, co-founders of World Labs, discuss the evolution of AI beyond Large Language Models (LLMs) towards spatial intelligence and world models.
  • 🧠 Spatial intelligence is defined as the capability to reason, understand, move, and interact in space, contrasting with linguistic intelligence.
  • 🚀 The speakers emphasize that language is a lossy, low-bandwidth channel for describing the rich 3D/4D world, making spatial understanding crucial for true intelligence.

Introducing Marble: A Generative 3D World Model

  • 🎯 Marble is World Labs' first spatial intelligence model, a generative model of 3D worlds that creates editable environments from text, images, and other spatial inputs.
  • 🛠️ It natively outputs Gaussian splats: tiny, semi-transparent particles that render efficiently in real time on a range of devices, including phones and VR headsets.
  • ✅ Marble allows for precise camera control and interactive editing, enabling users to modify scenes and generate new worlds based on these edits.
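
To make the idea of rendering semi-transparent particles concrete, here is a minimal sketch of front-to-back alpha compositing for a single pixel. It is a generic, simplified illustration and not Marble's renderer: it assumes isotropic 2D Gaussians that have already been projected to screen space, and the dictionary keys (`x`, `y`, `depth`, `sigma`, `opacity`, `rgb`) are hypothetical names chosen for this example.

```python
import math

def gaussian_alpha(opacity, dx, dy, sigma):
    """Alpha of an isotropic 2D Gaussian splat at pixel offset (dx, dy)."""
    return opacity * math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))

def composite_pixel(splats, px, py):
    """Blend splats front to back at pixel (px, py).

    Returns the accumulated RGB colour and the remaining transmittance,
    i.e. the fraction of light from behind that still gets through.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    # Nearest splats first, so closer particles occlude farther ones.
    for s in sorted(splats, key=lambda s: s["depth"]):
        a = gaussian_alpha(s["opacity"], px - s["x"], py - s["y"], s["sigma"])
        for c in range(3):
            color[c] += transmittance * a * s["rgb"][c]
        transmittance *= 1.0 - a
        if transmittance < 1e-4:
            break  # pixel is effectively opaque; skip the rest
    return color, transmittance

# Hypothetical scene: a half-opaque red splat in front of an opaque blue one.
scene = [
    {"x": 0.0, "y": 0.0, "depth": 2.0, "sigma": 1.0, "opacity": 1.0, "rgb": (0.0, 0.0, 1.0)},
    {"x": 0.0, "y": 0.0, "depth": 1.0, "sigma": 1.0, "opacity": 0.5, "rgb": (1.0, 0.0, 0.0)},
]
center_color, center_transmittance = composite_pixel(scene, 0.0, 0.0)
```

At the centre pixel the red splat contributes half the colour and lets half the light through to the blue splat behind it, so the result is an even mix of red and blue. Real splat renderers use anisotropic 3D Gaussians and sort and blend on the GPU, but the per-pixel blending follows the same pattern.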

Physics, Causality, and Model Understanding

  • 🔬 A key discussion point is whether current models truly "understand" physics or merely fit patterns, using the example of predicting planetary orbits versus discovering F=ma.
  • 💡 The challenge lies in moving beyond pattern fitting to genuine causal reasoning, potentially by attaching physical properties to splats or distilling physics engines into neural networks.
  • ⚠️ The speakers acknowledge that while models can generate plausible scenes, the "understanding" of underlying physical structures (e.g., how an arch works) remains a philosophical and technical challenge.
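
The gap between fitting patterns and capturing a physical law can be illustrated with a toy example. The sketch below is not the experiment the speakers refer to. It generates synthetic orbital periods from Newtonian gravity for one star, fits a power law to them, and then compares the fit against the Newtonian formula for a star of a different mass. The planet radii are rough values, and all function names are assumptions made for this illustration.

```python
import math

G = 6.674e-11     # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30  # approximate solar mass, kg

def newton_period(radius, central_mass):
    """Period of a circular orbit derived from Newtonian gravity."""
    return 2.0 * math.pi * math.sqrt(radius ** 3 / (G * central_mass))

def fit_power_law(radii, periods):
    """Least-squares fit of log T = log k + p * log a (a pure pattern fit)."""
    xs = [math.log(a) for a in radii]
    ys = [math.log(t) for t in periods]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    p = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    k = math.exp(my - p * mx)
    return k, p

# Synthetic "observations": four inner planets around one star (radii in metres).
radii = [5.79e10, 1.082e11, 1.496e11, 2.279e11]
periods = [newton_period(a, M_SUN) for a in radii]
k, p = fit_power_law(radii, periods)

def fitted_period(radius):
    """Prediction from the fitted pattern; it has no notion of stellar mass."""
    return k * radius ** p

# Same orbital radius, but around a star twice as massive.
law_prediction = newton_period(1.496e11, 2 * M_SUN)
pattern_prediction = fitted_period(1.496e11)
mismatch = pattern_prediction / law_prediction
```

The fit recovers an exponent close to 1.5 and predicts well for other orbits around the same star, but `mismatch` comes out near the square root of 2 for the heavier star, because the fitted curve never represented mass as a cause. A model that encodes the law itself transfers to the new setting, whereas the fitted pattern only describes the data it saw.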

Applications and Future Vision

  • 🎮 Marble is designed to be immediately useful, with emerging use cases in gaming, VFX, film, virtual production, and architectural design (e.g., kitchen remodels).
  • 🤖 A significant future application is generating synthetic simulation worlds for training embodied agents and robotics, addressing the data starvation problem in robotic learning.
  • 🌱 The long-term vision is for spatial intelligence and language intelligence to work together in multimodal systems, complementing LLMs with rich, embodied models of the world for broader applications in science, medicine, and decision-making.

The Evolving Landscape of AI and Academia

  • 📈 The history of deep learning is tied to scaling compute: the amount of compute that can be marshalled has grown about a million-fold since the AlexNet era, which drives the need for models that can process vast amounts of visual and spatial data.
  • 🎓 The role of academia is shifting from training the biggest models to exploring "wacky ideas," new algorithms, architectures, and theoretical underpinnings of large models.
  • 🤝 The speakers raise concerns that academia is under-resourced compared to industry, and they advocate for initiatives such as national AI compute clouds and open benchmarks to foster a healthy ecosystem.

Topics Discussed

Spatial Intelligence, World Models, Large Language Models (LLMs), Generative AI, 3D Environments, Gaussian Splats, Computer Vision, ImageNet, Deep Learning, Physics Engines, Robotics Simulation, Multimodal Systems, Transformers, Hardware Lottery, Open Science