
Sherry Yang - Learning World Models and Physical Agents

[HPP] Percy Liang · October 21, 2025 · 57 min

The Challenge of Physical Agents

  • ⚠️ Training physical agents is difficult because real-world robot interaction is expensive in time, money, and safety risk.
  • 💡 In contrast, agents that operate in low-cost virtual environments (e.g., Go-playing systems or LLMs for coding) can reach superhuman performance because they can learn from extensive interaction.

Advances in World Model Learning

  • 🧠 A world model is a dynamics model that predicts future frames based on current observations and actions, acting as a learned simulator.
  • 🚀 Recent progress is driven by internet-scale video data and scalable video generation architectures (e.g., transformers, latent diffusion models).
  • ✅ These models can integrate diverse data (simulated, real robot, human egocentric) and enable controllable video generation for various tasks and actions.
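
The "learned simulator" idea above can be sketched as a function from (current frame, action) to next frame, unrolled autoregressively. This is a minimal illustration, not the speaker's model: `ToyWorldModel`, `rollout`, and the list-of-floats `Frame` type are hypothetical names, and the additive dynamics stand in for a learned video generator.

```python
from typing import List, Sequence

Frame = List[float]  # assumption: stand-in for an image; a real model uses pixel tensors


class ToyWorldModel:
    """Hypothetical stand-in for a learned, action-conditioned video model."""

    def predict(self, frame: Frame, action: float) -> Frame:
        # A learned model would generate the next frame. Here the action
        # simply shifts every value so the example stays deterministic.
        return [x + action for x in frame]


def rollout(model: ToyWorldModel, first_frame: Frame,
            actions: Sequence[float]) -> List[Frame]:
    """Unroll the model: each predicted frame becomes the next input."""
    frames = [list(first_frame)]
    for action in actions:
        frames.append(model.predict(frames[-1], action))
    return frames


video = rollout(ToyWorldModel(), [0.0, 1.0], [1.0, 1.0, -0.5])
print(len(video), video[-1])  # 4 [1.5, 2.5]
```

The returned list plays the role of the generated video: one initial observation followed by one predicted frame per action.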

Evaluating Policies with World Models

  • 📊 World models offer a cheap, safe, and reproducible way to evaluate robot policies, overcoming limitations of real-world and traditional simulated evaluations.
  • 🤖 Policies are rolled out in the world model, and a Vision-Language Model (VLM) acts as a reward model to assess task success.
  • 🔬 This approach allows for out-of-distribution testing using image editing tools to introduce novel objects or distractors, revealing policy robustness and generalization gaps (e.g., issues with shapes vs. colors).
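
The evaluation loop described above (roll the policy out in the world model, then let a VLM judge success) might look roughly like the sketch below. Everything here is an assumption made for illustration: `world_model_step` is a toy dynamics function, `vlm_success` is a threshold check standing in for a VLM, and the two policies are invented. Varying the list of start states loosely mirrors testing on edited, out-of-distribution initial frames.

```python
from typing import Callable, List

State = float  # assumption: stand-in for a video frame


def world_model_step(state: State, action: float) -> State:
    # Hypothetical learned dynamics: the action moves the state directly.
    return state + action


def vlm_success(final_state: State, goal: State, tol: float = 0.1) -> bool:
    # Stand-in for a VLM judging task success from the generated video.
    return abs(final_state - goal) <= tol


def evaluate(policy: Callable[[State, State], float], starts: List[State],
             goal: State, horizon: int = 10) -> float:
    """Return the fraction of start states from which the policy succeeds."""
    successes = 0
    for start in starts:
        state = start
        for _ in range(horizon):
            state = world_model_step(state, policy(state, goal))
        successes += vlm_success(state, goal)
    return successes / len(starts)


def proportional_policy(state: State, goal: State) -> float:
    return 0.5 * (goal - state)  # closes half the remaining gap each step


def clipped_policy(state: State, goal: State) -> float:
    return max(-0.05, min(0.05, goal - state))  # can only move slowly


starts = [0.0, 0.5, 2.0]
good_rate = evaluate(proportional_policy, starts, goal=1.0)
slow_rate = evaluate(clipped_policy, starts, goal=1.0)
print(good_rate, slow_rate)
```

Because no physical robot is involved, the same start states can be replayed exactly for every policy, which is the reproducibility benefit the summary points to.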

Improving Policies through RL and Planning

  • 📈 World models facilitate reinforcement learning (RL) by providing a low-cost environment for policy optimization using VLM-derived rewards.
  • 💡 They also enable hierarchical planning, where complex tasks are broken down into language-guided sub-steps, with the world model generating videos for each step.
  • 🎯 This approach leverages internet-scale supervision for high-level planning, allowing for more effective sim-to-real transfer and knowledge sharing across different robot morphologies.
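
As a rough illustration of optimizing a policy inside a world model with VLM-derived rewards, the sketch below tunes a single policy parameter by random-search hill climbing. This is a deliberately simple substitute for a real RL algorithm, not the method used in the talk. `world_model_step`, `vlm_reward`, `episode_return`, and `improve` are hypothetical names, and the reward is a distance-to-goal proxy for a VLM score.

```python
import random


def world_model_step(state: float, action: float) -> float:
    return state + action  # assumption: toy stand-in for learned dynamics


def vlm_reward(state: float, goal: float) -> float:
    return -abs(goal - state)  # assumption: proxy for a VLM-derived reward


def episode_return(gain: float, start: float = 0.0, goal: float = 1.0,
                   horizon: int = 5) -> float:
    """Roll a proportional policy out in the world model and sum rewards."""
    state, total = start, 0.0
    for _ in range(horizon):
        state = world_model_step(state, gain * (goal - state))
        total += vlm_reward(state, goal)
    return total


def improve(gain: float, iters: int = 200, step: float = 0.1, seed: int = 0):
    """Hill climbing: keep a random perturbation only if the return rises."""
    rng = random.Random(seed)
    best = episode_return(gain)
    for _ in range(iters):
        candidate = gain + rng.uniform(-step, step)
        score = episode_return(candidate)
        if score > best:
            gain, best = candidate, score
    return gain, best


initial_return = episode_return(0.1)
tuned_gain, tuned_return = improve(0.1)
print(initial_return, tuned_gain, tuned_return)
```

Every rollout here happens inside the simulated dynamics, so the number of optimization steps is limited by compute rather than by robot time.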

Future Directions and Challenges

  • 🔍 Key challenges include reducing hallucinations and determining the optimal temporal and spatial resolution for world models to be useful for downstream tasks.
  • 🌱 Further work is needed to effectively utilize imperfect world models and scale up data collection to enhance their realism and robustness.

What’s Discussed

World Models, Physical Agents, Reinforcement Learning, Generative Modeling, Robotics, Internet-Scale Video Data, Video Generation Architectures, Conditional Video Generation, Policy Evaluation, Vision-Language Models (VLM), Sim-to-Real Transfer, Hierarchical Planning, Low-Level Robot Controls, Out-of-Distribution Testing, Behavioral Cloning