Deborah Raji on AI Audits and Accountability in the Age of AI

[HPP] Joy Buolamwini · November 7, 2025 · 1h 1min


Understanding AI's Societal Impact

⚠️ AI systems can cause significant harm, as demonstrated by cases like Robert Williams' false arrest due to facial recognition, Tammy Dobbs' reduced care hours from algorithmic assessment, and California nurses protesting generative AI tools impacting patient safety.
💡 Deborah Raji's work focuses on these real-world performance failures, challenging both overly optimistic and pessimistic views of AI by emphasizing its practical shortcomings.
🧠 AI is best understood as a sociotechnical system, in which technical design choices and societal outcomes shape each other, much as the design of the bicycle evolved.

AI in Policy and Evaluation Challenges

🎯 Policy discussions often view AI either as a product subject to consumer protection laws (as with cars or food safety) or as a policy intervention that requires statistical evidence of its effectiveness, akin to "Moneyball for Government."
🔬 Evaluating AI models matters well beyond choosing between algorithms: evaluations feed into product safety decisions, litigation, and procurement, which makes assessment a high-stakes process.
🔑 Traditional benchmarking practices often violate key statistical assumptions, namely that the test data is an unbiased sample of the conditions a model will face and that training and test sets are distinct, leading to benchmark bias and data contamination respectively.
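Data contamination can be detected in several ways. The sketch below uses one common heuristic, word-level n-gram overlap between test items and the training corpus; this technique, the function names, and the default n = 8 are illustrative assumptions, not a method described in the talk.

```python
"""Minimal sketch of a train/test overlap (data contamination) check.

Assumption: contamination is approximated by shared word-level n-grams,
one common heuristic among several; names here are hypothetical.
"""
from typing import Iterable


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase, whitespace-tokenized n-grams in `text`."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contaminated(test_items: Iterable[str], train_docs: Iterable[str], n: int = 8) -> list[int]:
    """Return indices of test items sharing at least one n-gram with any training doc."""
    train_grams: set[tuple[str, ...]] = set()
    for doc in train_docs:
        train_grams |= ngrams(doc, n)
    # A non-empty intersection means the item overlaps verbatim with training text.
    return [i for i, item in enumerate(test_items) if ngrams(item, n) & train_grams]


if __name__ == "__main__":
    train = ["the quick brown fox jumps over the lazy dog near the river bank"]
    test = [
        "quick brown fox jumps over the lazy dog near the",  # copied from training text
        "an entirely new question about something else that was never seen",
    ]
    print(contaminated(test, train, n=8))  # flags only the first test item
```

Exact n-gram matching misses paraphrased leakage, so a clean result from a check like this is weak evidence that training and test sets are truly distinct.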

Critiquing AI Benchmarks

🔍 Benchmarks like ImageNet and GLUE, despite claims to measure "general" performance, are shaped by the context in which they were built and carry the resulting biases, often reflecting Western perspectives or a narrow set of tasks.
⚡ This benchmark bias can lead to significant real-world failures, as seen in facial recognition systems drastically underperforming for darker-skinned female subgroups, prompting NIST intervention.
📈 The culture of evaluation for large language models (LLMs) has shifted from pragmatic, ad hoc experiments (GPT-2) to linguistic competence (GPT-3) and increasingly to marketing-driven metrics (GPT-4) like exam scores.
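Benchmark bias of the kind seen in facial recognition only becomes visible when results are disaggregated by subgroup rather than reported as one aggregate score. A minimal sketch, assuming predictions and subgroup labels are available as parallel lists; the function names and the toy data are hypothetical and do not reproduce any real audit's results.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def accuracy_by_group(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[Hashable]
) -> tuple[float, dict[Hashable, float]]:
    """Return overall accuracy and accuracy for each subgroup."""
    if not y_true:
        raise ValueError("need at least one example")
    correct: dict[Hashable, int] = defaultdict(int)
    total: dict[Hashable, int] = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    overall = sum(correct.values()) / sum(total.values())
    per_group = {g: correct[g] / total[g] for g in total}
    return overall, per_group


def worst_group_gap(per_group: dict[Hashable, float]) -> float:
    """Difference between the best- and worst-performing subgroup."""
    return max(per_group.values()) - min(per_group.values())


if __name__ == "__main__":
    # Invented toy data: 8 examples from group_a, 2 from group_b.
    y_true = [1] * 10
    y_pred = [1] * 8 + [0] * 2
    groups = ["group_a"] * 8 + ["group_b"] * 2
    overall, per_group = accuracy_by_group(y_true, y_pred, groups)
    print(overall, per_group, worst_group_gap(per_group))
```

With this toy data the aggregate accuracy is 0.8 while group_b's accuracy is 0.0, exactly the kind of gap a single headline number hides when a subgroup is underrepresented in the test set.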

The Importance of AI Audits

✅ Accountability in AI means that those impacted by AI decisions have the information to judge the quality of those decisions, and their judgment leads to consequential changes in the actor's behavior.
🛠️ AI audits are independent evaluations aimed at ensuring accountability. Auditors fall into two categories: internal (collaborating with the audit target, pre-deployment) and external (independent of the target, post-deployment, and protecting affected groups).
📝 Internal audits often rely on comprehensive documentation, like model cards, to capture and relay key engineering decisions to various stakeholders within an organization.
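Model cards (Mitchell et al., 2019) organize documentation into a fixed set of sections so that engineering decisions are recorded consistently. A minimal sketch, assuming a simple in-memory representation; the `ModelCard` class, its methods, and the example field text are hypothetical, and only the section names come from the model card proposal.

```python
from dataclasses import dataclass, fields


@dataclass
class ModelCard:
    """Hypothetical container; section names follow 'Model Cards for Model Reporting'."""
    model_details: str = ""
    intended_use: str = ""
    factors: str = ""
    metrics: str = ""
    evaluation_data: str = ""
    training_data: str = ""
    quantitative_analyses: str = ""
    ethical_considerations: str = ""
    caveats_and_recommendations: str = ""

    def missing_sections(self) -> list[str]:
        """Names of sections left empty, useful as a review checklist."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def to_markdown(self) -> str:
        """Render every section, marking undocumented ones explicitly."""
        parts = []
        for f in fields(self):
            title = f.name.replace("_", " ").title()
            body = getattr(self, f.name).strip() or "_Not documented._"
            parts.append(f"## {title}\n\n{body}")
        return "\n\n".join(parts)


if __name__ == "__main__":
    card = ModelCard(
        model_details="Hypothetical classifier, version 0.1",
        intended_use="Hypothetical internal triage only",
    )
    print(card.missing_sections())
    print(card.to_markdown())
```

Rendering empty sections as "_Not documented._" rather than omitting them keeps gaps visible to the stakeholders who read the card.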

Overcoming External Audit Barriers

⚠️ External auditing faces significant challenges, including harms discovery (those affected often cannot see that AI is being used on them, a gap that AI inventories help address) and data access barriers (anti-audit clauses and the threat of legal retaliation, now being mitigated by legal protections such as CFAA exemptions).
🤝 Ensuring auditor independence is crucial, as seen in cases where consulting groups' audit results were controlled or manipulated by the audited company, undermining transparency.
🚀 The policy ecosystem for auditing, drawing lessons from fields like finance and medical devices, is crucial for legally protecting auditors and enabling effective external oversight.

Advancing AI Accountability

🌐 Current policy efforts are actively shaping the future of AI auditing, including legislation (e.g., California bills addressing LLM audits), executive orders (e.g., the Biden administration's AI executive order requiring disclosures), and voluntary commitments.
💡 The establishment of AI Safety Institutes globally underscores a growing governmental commitment to AI evaluation research that directly feeds into accountability processes.
👏 Ultimately, the goal is to ensure responsible deployment of AI systems, recognizing that their failures disproportionately affect historically marginalized individuals who have the least recourse.

What’s Discussed

AI systems, AI accountability, AI audits, Machine learning evaluation, Sociotechnical systems, Benchmarking, Large Language Models (LLMs), Facial recognition, Consumer protection laws, Auditor independence, AI policy, AI Safety Institutes, Product safety, Data contamination, Electronic Health Records
