Yoshua Bengio - Disentangling Agency & Predictive Power Without Solving ELK [Alignment Workshop]

Yoshua Bengio · February 18, 2026 · 30 min

Building Trustworthy AI

💡 The Scientist AI research program aims to create highly trustworthy machines that deeply understand the world.
🎯 This approach seeks to disentangle an AI's understanding of the world from its goals and desires, which, in human intelligence, are entangled.
🧠 The goal is to develop AI predictors that approximate Bayesian posteriors without developing agency or preferences for specific outcomes.
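As a rough illustration of what "a predictor that approximates a Bayesian posterior" means, the sketch below computes an exact posterior over two hypotheses with Bayes' rule. The hypothesis names, prior, and likelihood values are invented for illustration and do not come from the talk; a real system would approximate this computation rather than enumerate hypotheses. Note that the function only returns beliefs: it selects no action and encodes no preference over outcomes.

```python
def posterior(prior, likelihood, observation):
    """Return P(hypothesis | observation) via Bayes' rule over a discrete set."""
    # Unnormalized posterior: prior times likelihood of the observation.
    unnormalized = {h: prior[h] * likelihood[h][observation] for h in prior}
    evidence = sum(unnormalized.values())
    return {h: p / evidence for h, p in unnormalized.items()}


# Hypothetical toy problem: is a coin fair or biased toward heads?
prior = {"fair": 0.5, "biased": 0.5}
likelihood = {
    "fair": {"H": 0.5, "T": 0.5},
    "biased": {"H": 0.9, "T": 0.1},
}

# Beliefs after observing a single head.
post = posterior(prior, likelihood, "H")
```

After one observed head, the posterior shifts toward the "biased" hypothesis while still summing to one.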

Addressing Implicit Goals & Agency

⚠️ Current AI systems, particularly those using Reinforcement Learning or pre-training, can develop implicit, unchosen goals that might be harmful.
🔑 Agency is defined as the robust achievement of goals despite randomness or adversaries, and dangerous agency involves intentions to harm.
📊 The speaker claims that agentic predictors occupy an exponentially small volume in the space of all possible predictors, making their accidental emergence during training unlikely.

The Truthification Pipeline

🛠️ A "truthification pipeline" transforms training data by using distinct syntax for verified facts (e.g., "X is true") versus claims made by people (e.g., "someone wrote X").
✅ This separation allows the AI to understand what is trustworthy and enables users to query what the AI genuinely believes, rather than what a human might say.
🔍 This mechanism is crucial for ensuring the AI is not agentic and for distinguishing between factual reality and communication acts.
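The data transformation described above can be sketched as follows. This is a minimal illustration, not the pipeline from the talk: the `Statement` class, the `truthify` function, and the example sentences are hypothetical, and only the two output templates ("X is true" and "someone wrote X") are taken from the summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    text: str
    verified: bool           # assumed to be set by some upstream verification step
    source: str = "someone"  # who made the claim, when it is not verified


def truthify(stmt: Statement) -> str:
    """Rewrite a statement so verified facts and mere claims use distinct syntax."""
    if stmt.verified:
        return f'"{stmt.text}" is true'
    return f'{stmt.source} wrote "{stmt.text}"'


# Hypothetical two-item corpus: one verified fact, one unverified claim.
corpus = [
    Statement("water boils at 100 C at sea level", verified=True),
    Statement("the moon is made of cheese", verified=False),
]
truthified = [truthify(s) for s in corpus]
```

With this separation, a query phrased in the "is true" form asks what the model believes, while the "wrote" form only asks what a person might say.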

Epistemic Correctness & ELK

💡 When the Scientist AI issues a high-confidence claim, it is designed to be "epistemically correct," meaning it does not lie, even if it might withhold some knowledge.
🎯 This provides sufficient safety guarantees, even if it doesn't fully solve the challenge of Eliciting Latent Knowledge (ELK).
📈 Uncertain statements, such as claims about future harm, can be reformulated as certain statements about probabilities (e.g., "the probability of harm is X"), making them amenable to this trustworthy prediction.
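The reformulation in the last point can be sketched in a few lines. Everything here is a hypothetical illustration: the hypotheses, the numbers, and the function names are assumptions, not values or APIs from the talk. The idea is that "harm will occur" is uncertain, whereas "the probability of harm is 0.10" is a definite claim that is either right or wrong given the model's beliefs.

```python
def harm_probability(posterior, p_harm_given):
    """Marginalize over hypotheses: P(harm) = sum_h P(h) * P(harm | h)."""
    return sum(posterior[h] * p_harm_given[h] for h in posterior)


def as_probability_statement(event, p):
    """Turn an uncertain event into a definite statement about its probability."""
    return f"the probability of {event} is {p:.2f}"


# Hypothetical beliefs about which situation we are in.
posterior = {"benign": 0.75, "risky": 0.25}
# Hypothetical chance of harm under each situation.
p_harm_given = {"benign": 0.0, "risky": 0.4}

p = harm_probability(posterior, p_harm_given)
statement = as_probability_statement("harm", p)
```

A guardrail built on such a predictor could compare `p` against a threshold chosen by humans, which matches the summary's point that humans ultimately define the ethical boundaries.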

Training & Future Implications

🚫 Training must ensure no two-way interaction between the AI and the world (e.g., no online Reinforcement Learning) to prevent the AI from influencing its environment.
🚀 This research program is the foundation for LawZero, a new nonprofit dedicated to developing this type of AI.
🧩 The Scientist AI serves as a critical building block or "guardrail" for constructing broader, safe agentic systems, with humans ultimately defining ethical boundaries.

Topics Discussed

Scientist AI, AI Agency, Predictive Power, Eliciting Latent Knowledge (ELK), Bayesian Posterior, Truthification Pipeline, Latent Variables, Epistemic Correctness, Implicit Goals, Reinforcement Learning, LawZero, Superintelligence, Compositional Generalization
