
Robot Learning: Data Scaling and Policy Improvement

[HPP] Pieter Abbeel · October 17, 2025 · 54 min

The Challenge of Robot Deployment

  • 🤖 Traditional robot learning pipelines focus on data collection and policy training, but often treat deployment merely as an evaluation phase.
  • ⚠️ Current deployment methods rely on manual supervision by PhD students, which does not scale to industrial applications.
  • 💡 The speaker emphasizes that deployment is a crucial data generation process, where every robot trajectory, successful or not, provides valuable information for policy improvement.

Amada: Human-in-the-Loop Policy Improvement

  • 👨‍💻 The Amada framework introduces a human-in-the-loop system to enable scalable real-world deployment and adaptation.
  • 🚨 It features Float, an autonomous online failure detection module that uses an optimal transport (OT) cost to identify errors and raise early warnings.
  • 🔄 An adaptive rewinding mechanism resets the robot to the state before a failure, allowing operators to provide corrective human interventions and collect high-quality data.
  • 📈 Experiments with a multi-robot factory setup show Amada consistently improves policy performance and generalization, while significantly reducing human intervention rates over time.
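
The failure detection and rewinding ideas above can be illustrated with a small sketch. This is not the Amada or Float implementation: the feature vectors, the sliding-window comparison against a reference demonstration, the two thresholds, and the "last safe step" rewind rule are all assumptions made for illustration. The OT cost here is computed exactly by brute force over permutations, which is valid for equal-size point sets with uniform weights but only practical for tiny windows.

```python
from itertools import permutations
from math import dist


def ot_cost(xs, ys):
    """Exact OT cost between two equal-size point sets with uniform weights.

    With uniform weights and equal sizes, an optimal transport plan is a
    permutation, so minimizing over permutations is exact (tiny sets only).
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal-size point sets")
    n = len(xs)
    return min(
        sum(dist(xs[i], ys[j]) for i, j in enumerate(perm)) / n
        for perm in permutations(range(n))
    )


def monitor(rollout, reference, window, threshold, safe_cost):
    """Return (alarm_step, rewind_step) for one rollout.

    alarm_step: first step whose recent window drifts past `threshold`
                (None if no alarm is raised).
    rewind_step: latest earlier step whose window cost was <= `safe_cost`
                 (hypothetical rewind rule; None if no such step exists).
    """
    last_safe = None
    for t in range(window - 1, len(rollout)):
        lo = t - window + 1
        cost = ot_cost(rollout[lo:t + 1], reference[lo:t + 1])
        if cost > threshold:
            return t, last_safe
        if cost <= safe_cost:
            last_safe = t
    return None, last_safe


# Hypothetical 2-D features: the reference moves along x, the rollout drifts in y.
reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
drifting = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (3.0, 2.0), (4.0, 3.0)]

alarm, rewind = monitor(drifting, reference, window=2, threshold=0.4, safe_cost=0.1)
print(alarm, rewind)  # 2 1
```

In this toy run the alarm fires at step 2, the first step where the window cost (0.5) exceeds the threshold, and the suggested rewind target is step 1, the last step that still matched the reference. A real system would likely use a dedicated OT solver and learned features instead of raw coordinates.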

SOUL: Autonomous Policy Self-Improvement

  • 🚀 To achieve greater automation, the SOUL framework focuses on robot policy self-improvement through efficient exploration in the real world.
  • 📉 Traditional diffusion policies often suffer from mode collapse, generating repetitive failures, and standard action-level exploration can lead to jerky, unsafe motions.
  • 🧭 SOUL proposes manifold exploration, constraining exploration to the task manifold to generate diverse yet smooth and valid actions.
  • 🧠 An information bottleneck creates a well-shaped latent space, ensuring exploration focuses on task-relevant factors and improves sample efficiency.
  • ✅ The system achieves higher success rates and smoother motions than previous methods, and it supports human-guided exploration without teleoperation.
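
The contrast between action-level exploration and manifold exploration can be made concrete with a toy sketch. It is not the SOUL method: the hand-written `decode` function stands in for a learned decoder, the one-dimensional latent `z` stands in for a learned latent space, and the `roughness` metric and all constants are assumptions chosen for illustration. The point is only that perturbing a latent and then decoding keeps the action chunk smooth, whereas adding independent noise to every action step does not.

```python
import math
import random

T = 20  # steps per action chunk (hypothetical value)


def decode(z):
    """Toy decoder: latent amplitude z -> smooth 1-D action chunk."""
    return [z * math.sin(math.pi * t / (T - 1)) for t in range(T)]


def explore_in_action_space(actions, sigma, rng):
    """Baseline: add independent Gaussian noise to every action step."""
    return [a + rng.gauss(0.0, sigma) for a in actions]


def explore_on_manifold(z, sigma, rng):
    """Perturb the latent, then decode, so the result stays in the decoder's range."""
    return decode(z + rng.gauss(0.0, sigma))


def roughness(actions):
    """Sum of squared second differences; larger means jerkier motion."""
    return sum(
        (actions[t + 1] - 2 * actions[t] + actions[t - 1]) ** 2
        for t in range(1, len(actions) - 1)
    )


rng = random.Random(0)
nominal_z = 1.0

noisy = explore_in_action_space(decode(nominal_z), sigma=0.3, rng=rng)
smooth = explore_on_manifold(nominal_z, sigma=0.3, rng=rng)

print("action-space roughness:", roughness(noisy))
print("latent-space roughness:", roughness(smooth))
```

Both variants explore with the same noise scale, but the latent perturbation changes the whole chunk coherently (here, its amplitude), so the decoded motion differs from the nominal one while remaining smooth.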
Topics Discussed

Robot learning · Data scaling · Policy improvement · Imitation learning · Policy deployment · Human-in-the-loop systems · Autonomous failure detection · Optimal transport · Multi-robot systems · Policy generalization · Online exploration · Manifold exploration · Diffusion policy · Latent space · Sample efficiency