AI Auditing and Evaluation: Beyond Benchmarking with Inioluwa Deborah Raji

[HPP] Timnit Gebru · January 22, 2026 · 1h 3min

The Evolving Role of AI Evaluation

  • 💡 AI systems are increasingly deployed in critical domains such as criminal justice and healthcare, where failures carry real human costs.
  • 📌 Historically, consumer protection movements drove product safety evaluations, a model that is relevant to today's AI landscape.
  • 🚀 AI evaluation has shifted from merely ranking algorithms to a broader role in product deployment decisions, documentation, and legal evidence.

Auditing for Accountability

  • 🎯 The AI auditing process involves identifying harms, evaluating systems against standards, communicating results, advocating for change, and securing accountability through legal consequences.
  • 🔬 The Gender Shades project showed how an evaluation of bias in facial recognition systems led to significant changes in documentation, procurement, and legal action.
  • 📈 Postmarket surveillance methods, inspired by vaccine adverse event reporting, can statistically identify subgroups that an AI system disproportionately harms.
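
The central move in a Gender Shades-style audit is disaggregated evaluation: rather than reporting one aggregate accuracy figure, the audit reports error rates for each intersectional subgroup (the project crossed skin type with gender). The Python sketch below assumes a labeled audit set in which each record carries a skin-type label, a gender label, the true label, and the system's prediction. The function names and the counts in `TOY_COUNTS` are hypothetical and are not figures from the study.

```python
from collections import defaultdict

# Made-up audit counts: (skin_type, gender, n_examples, n_errors).
# Illustrative only, NOT results from the Gender Shades study.
TOY_COUNTS = [
    ("lighter", "male", 100, 1),
    ("lighter", "female", 100, 7),
    ("darker", "male", 100, 12),
    ("darker", "female", 100, 35),
]


def expand(counts):
    """Turn summary counts into (skin_type, gender, y_true, y_pred) records."""
    records = []
    for skin, gender, n, n_err in counts:
        wrong = "female" if gender == "male" else "male"
        for i in range(n):
            # The first n_err examples of each subgroup are misclassified.
            records.append((skin, gender, gender, wrong if i < n_err else gender))
    return records


def aggregate_error_rate(records):
    """Single overall error rate, as a conventional benchmark would report."""
    return sum(y_true != y_pred for _, _, y_true, y_pred in records) / len(records)


def disaggregated_error_rates(records):
    """Error rate for each intersectional (skin_type, gender) subgroup."""
    totals, errors = defaultdict(int), defaultdict(int)
    for skin, gender, y_true, y_pred in records:
        totals[(skin, gender)] += 1
        errors[(skin, gender)] += y_true != y_pred
    return {key: errors[key] / totals[key] for key in totals}


records = expand(TOY_COUNTS)
rates = disaggregated_error_rates(records)
worst = max(rates, key=rates.get)
best = min(rates, key=rates.get)
print(f"aggregate error: {aggregate_error_rate(records):.1%}")
for group, rate in sorted(rates.items()):
    print(group, f"{rate:.1%}")
print(f"gap between {worst} and {best}: {rates[worst] - rates[best]:.1%}")
```

With these toy counts the aggregate is 55 errors in 400 examples (under 14%), while the darker-skinned female subgroup's error rate is 35%: the kind of gap a single aggregate score hides.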

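Pharmacovigilance systems, including vaccine adverse event reporting, often screen spontaneous reports with disproportionality statistics such as the proportional reporting ratio (PRR). The sketch below shows one assumed way to carry that idea over to AI incident reports: for each subgroup, it compares the share of that subgroup's reports describing a given harm with the same share among all other subgroups. This is not the specific method presented in the talk. The report format, the harm label `wrongful_denial`, the toy log, and the flagging rule (at least three reports and a 95% lower confidence bound above 1) are all hypothetical.

```python
import math


def prr(a, b, c, d):
    """Proportional reporting ratio with an approximate 95% confidence interval.

    a: reports of the harm of interest involving subgroup G
    b: all other reports involving subgroup G
    c: reports of the harm of interest involving other subgroups
    d: all other reports involving other subgroups
    """
    ratio = (a / (a + b)) / (c / (c + d))
    # Standard error of log(PRR).
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(ratio) - 1.96 * se_log)
    upper = math.exp(math.log(ratio) + 1.96 * se_log)
    return ratio, lower, upper


def flag_subgroups(reports, harm, min_reports=3):
    """Return {subgroup: (prr, lower, upper)} for subgroups whose reports
    describe `harm` disproportionately often. `reports` is a list of
    (subgroup, harm_type) pairs, e.g. from a hypothetical incident log."""
    flagged = {}
    for group in {g for g, _ in reports}:
        a = sum(1 for g, h in reports if g == group and h == harm)
        b = sum(1 for g, h in reports if g == group and h != harm)
        c = sum(1 for g, h in reports if g != group and h == harm)
        d = sum(1 for g, h in reports if g != group and h != harm)
        if a < min_reports or c == 0:
            continue  # not enough reports to estimate a ratio
        ratio, lower, upper = prr(a, b, c, d)
        if lower > 1:  # assumed screening rule
            flagged[group] = (ratio, lower, upper)
    return flagged


# Hypothetical incident log: one (subgroup, harm_type) pair per report.
reports = (
    [("group_A", "wrongful_denial")] * 12
    + [("group_A", "other")] * 8
    + [("group_B", "wrongful_denial")] * 10
    + [("group_B", "other")] * 70
)
for group, (ratio, lower, upper) in flag_subgroups(reports, "wrongful_denial").items():
    print(f"{group}: PRR={ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

In this toy log, 12 of group_A's 20 reports describe the harm versus 10 of the other 80, so group_A's PRR is (12/20)/(10/80) = 4.8 and it is flagged; group_B's PRR is below 1. Report shares are not incidence rates, so a flag is a prompt for investigation rather than proof of harm.
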
Beyond Traditional Benchmarking

  • ⚠️ Traditional AI benchmarks often face a construct validity problem: they do not accurately measure what they claim to, so their scores fail to reflect real-world performance or generalizability.
  • 🧠 Realistic evaluations, like using patient notes for clinical LLMs, reveal that models perform differently and often worse than in idealized test scenarios.
  • 🧩 AI deployments should be viewed as policy interventions, in which experiment design choices significantly affect how users respond to the system and whether its causal effects can be measured accurately.
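
Treating a deployment as an intervention means asking what changes in outcomes because the tool is used. The minimal sketch below assumes the simplest design: each case is randomly assigned either to a condition where the decision-maker sees the AI recommendation or to one where they do not, and the effect is estimated as a difference in means. The simulation, its rates, and the function names are hypothetical, and the sketch ignores complications of real deployments such as spillovers between conditions.

```python
import math
import random
import statistics


def difference_in_means(treated, control):
    """Estimate the average treatment effect (ATE) and its standard error
    from outcomes of a randomized experiment."""
    ate = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(statistics.variance(treated) / len(treated)
                   + statistics.variance(control) / len(control))
    return ate, se


def simulate(n=2000, base_rate=0.30, effect=-0.05, seed=0):
    """Hypothetical data: each case is assigned by a fair coin flip to
    'recommendation shown' (treated) or 'not shown' (control). The outcome
    is 1 if the decision-maker takes some binary action (e.g. detains)."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        if rng.random() < 0.5:
            treated.append(1 if rng.random() < base_rate + effect else 0)
        else:
            control.append(1 if rng.random() < base_rate else 0)
    return treated, control


treated, control = simulate()
ate, se = difference_in_means(treated, control)
print(f"ATE estimate: {ate:+.3f} "
      f"(95% CI {ate - 1.96 * se:+.3f} to {ate + 1.96 * se:+.3f})")
```

Random assignment is what makes the simple difference in means an unbiased estimate of the average effect. If the recommendation were shown only for cases the decision-maker chose, the two groups would differ in ways the comparison could not separate from the tool's effect.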

Operationalizing AI Audits

  • 🛠️ Policy engagement is crucial for translating new evaluation methods into effective AI audits, including by addressing data access and standard setting.
  • ✅ Model cards and other documentation practices are becoming essential for clinical decision support tools and government use of AI, supporting transparency and accountability.
  • 🌐 Multidisciplinary collaboration and AI Safety Institutes are vital for developing the technical and institutional infrastructure needed for safe, widespread AI adoption.
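
A model card is structured documentation that travels with a model. The sketch below represents one as a data structure, assuming the section headings of the original Model Cards proposal (Mitchell et al., 2019). The class, its methods, and the example values are hypothetical, and the talk does not specify a required format for clinical or government use. The `missing_sections` check illustrates how a review or procurement process could reject a tool whose documentation is incomplete.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ModelCard:
    """Hypothetical model card; fields mirror the sections proposed in
    Mitchell et al. (2019), "Model Cards for Model Reporting"."""
    model_details: str = ""
    intended_use: str = ""
    factors: str = ""  # e.g. demographic or environmental groups evaluated
    metrics: str = ""
    evaluation_data: str = ""
    training_data: str = ""
    # Results broken down by factor, e.g. {"error rate, subgroup X": "..."}
    quantitative_analyses: dict = field(default_factory=dict)
    ethical_considerations: str = ""
    caveats_and_recommendations: str = ""

    def missing_sections(self):
        """Names of sections that are still empty."""
        return [fld.name for fld in fields(self) if not getattr(self, fld.name)]

    def to_text(self):
        """Render the card as plain text, one heading per section."""
        parts = []
        for fld in fields(self):
            value = getattr(self, fld.name)
            if isinstance(value, dict):
                value = "\n".join(f"- {k}: {v}" for k, v in value.items())
            title = fld.name.replace("_", " ").capitalize()
            parts.append(f"{title}\n{value or '(not provided)'}")
        return "\n\n".join(parts)


card = ModelCard(
    model_details="Hypothetical clinical risk model, version 0.1",
    intended_use="Decision support for clinicians; not for autonomous decisions",
    quantitative_analyses={"subgroup results": "to be filled in after evaluation"},
)
print(card.missing_sections())
print(card.to_text())
```

Keeping the card as typed fields rather than free text makes completeness machine-checkable, which is one way documentation requirements could be enforced at scale.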

What’s Discussed

  • Algorithmic auditing
  • Machine learning evaluation
  • Benchmarking paradigm
  • Facial recognition systems
  • Large Language Models (LLMs)
  • Consumer protection
  • Postmarket surveillance
  • AI incidents
  • Policy evaluation
  • Construct validity
  • Experiment design
  • Judge responsiveness
  • Causal inference
  • Model cards
  • AI Safety Institutes