Robot Learning: Data Scaling and Policy Improvement

[HPP] Pieter Abbeel · October 17, 2025 · 54 min

The Challenge of Robot Deployment

🤖 Traditional robot learning pipelines focus on data collection and policy training, but often treat deployment merely as an evaluation phase.
⚠️ Current deployment methods, relying on manual supervision by PhD students, are not scalable for industrial applications.
💡 The speaker emphasizes that deployment is a crucial data generation process, where every robot trajectory, successful or not, provides valuable information for policy improvement.

Amada: Human-in-the-Loop Policy Improvement

👨‍💻 The Amada framework introduces a human-in-the-loop system to enable scalable real-world deployment and adaptation.
🚨 It features Float, an autonomous online failure detection module that uses an optimal transport (OT) cost to identify errors and raise early warnings.
🔄 An adaptive rewinding mechanism resets the robot to the state before a failure, allowing operators to provide corrective human interventions and collect high-quality data.
📈 Experiments with a multi-robot factory setup show Amada consistently improves policy performance and generalization, while significantly reducing human intervention rates over time.
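As a rough illustration of OT-cost-based failure detection, the sketch below compares a sliding window of rollout features against the matching window of a reference demonstration and flags the first step where the entropic OT cost exceeds a threshold. This is not the Float implementation: the function names, the Sinkhorn solver, the time-aligned windowing, the feature representation, and the threshold value are all assumptions made for illustration.

```python
import numpy as np


def sinkhorn_ot_cost(x, y, reg=0.2, n_iters=200):
    """Entropic-regularized OT cost between two point sets with uniform weights."""
    # Pairwise squared Euclidean distances, shape (len(x), len(y)).
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    # Scale the regularizer by the largest cost so the kernel stays well conditioned.
    scale = cost.max() + 1e-12
    kernel = np.exp(-cost / (reg * scale))
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    u, v = np.ones_like(a), np.ones_like(b)
    # Sinkhorn iterations: alternately rescale rows and columns to match the marginals.
    for _ in range(n_iters):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    plan = u[:, None] * kernel * v[None, :]
    return float((plan * cost).sum())


def first_failure_step(rollout, reference, threshold, window=5):
    """Hypothetical detector: index of the first step whose window cost exceeds threshold."""
    steps = min(len(rollout), len(reference))
    for end in range(window, steps + 1):
        cost = sinkhorn_ot_cost(rollout[end - window:end], reference[end - window:end])
        if cost > threshold:
            return end - 1  # last step inside the offending window
    return None  # no failure flagged


# Toy data (assumed): a straight-line reference and a rollout that drifts from step 10.
reference = np.linspace(0.0, 1.0, 20)[:, None] * np.ones((1, 2))
drifting = reference.copy()
drifting[10:, 1] += 1.0

print(first_failure_step(reference, reference, threshold=0.05))
print(first_failure_step(drifting, reference, threshold=0.05))
```

In a setup like the one described, the flagged step could serve as the anchor for rewinding the robot to a state shortly before the failure, so the operator's correction starts from a recoverable configuration.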

SOUL: Autonomous Policy Self-Improvement

🚀 To achieve greater automation, the SOUL framework focuses on robot policy self-improvement through efficient exploration in the real world.
📉 Traditional diffusion policies often suffer from mode collapse, generating repetitive failures, and standard action-level exploration can lead to jerky, unsafe motions.
🧭 SOUL proposes manifold exploration, constraining exploration to the task manifold to generate diverse yet smooth and valid actions.
🧠 An information bottleneck creates a well-shaped latent space, ensuring exploration focuses on task-relevant factors and improves sample efficiency.
✅ The system achieves higher success rates and smoother motions than previous methods, and it supports human-guided exploration without teleoperation.
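The contrast between action-level and on-manifold exploration can be sketched with a toy decoder. Here a fixed set of smooth basis functions stands in for a learned latent-to-action decoder, which is an assumption and not the SOUL architecture; `decode`, `roughness`, and both exploration helpers are hypothetical names. Noise added directly to actions produces jerky trajectories, while the same amount of noise added to the latent code yields varied but smooth ones.

```python
import numpy as np

HORIZON = 32      # number of actions in one predicted chunk (assumed)
LATENT_DIM = 4    # size of the toy latent code (assumed)

_t = np.linspace(0.0, 1.0, HORIZON)
# Stand-in for a learned decoder: each latent dimension weights one smooth sinusoid.
BASIS = np.stack([np.sin(np.pi * (k + 1) * _t) for k in range(LATENT_DIM)])


def decode(z):
    """Map a latent code to a 1-D action trajectory of length HORIZON."""
    return z @ BASIS


def roughness(traj):
    """Mean squared second difference, a simple proxy for jerky motion."""
    return float(np.mean(np.diff(traj, n=2) ** 2))


def explore_in_action_space(z, sigma, rng):
    """Add independent Gaussian noise to every action in the trajectory."""
    return decode(z) + sigma * rng.standard_normal(HORIZON)


def explore_in_latent_space(z, sigma, rng):
    """Perturb the latent code, then decode, so samples stay on the decoder's manifold."""
    return decode(z + sigma * rng.standard_normal(LATENT_DIM))


rng = np.random.default_rng(0)
z_nominal = np.array([1.0, 0.5, 0.0, 0.0])
action_rough = np.mean([roughness(explore_in_action_space(z_nominal, 0.2, rng)) for _ in range(20)])
latent_rough = np.mean([roughness(explore_in_latent_space(z_nominal, 0.2, rng)) for _ in range(20)])
print(f"action-space roughness: {action_rough:.4f}")
print(f"latent-space roughness: {latent_rough:.4f}")
```

The design point is that the decoder constrains every sample to the set of trajectories it can represent, so exploration diversity comes from moving along that set instead of from per-step noise.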

Topics · 15 themes

What’s Discussed

Robot learning, Data scaling, Policy improvement, Imitation learning, Policy deployment, Human-in-the-loop systems, Autonomous failure detection, Optimal transport, Multi-robot systems, Policy generalization, Online exploration, Manifold exploration, Diffusion policy, Latent space, Sample efficiency
