RAG LLMs Are Not Safer: Analyzing Risks with Dr. Sebastian Gehrmann

Super Data Science: ML & AI Podcast with Jon Krohn · July 15, 2025 · 53 min · 633 views

The Counterintuitive Safety Risks of RAG

💡 Retrieval-Augmented Generation (RAG), while essential for grounding LLM responses in trusted data, can paradoxically make LLMs less safe by circumventing built-in safety mechanisms.
⚠️ The paper "RAG LLMs are Not Safer" demonstrates that coupling unsafe queries with innocuous documents can lead to unsafe LLM outputs, contrary to the belief that RAG inherently increases safety.
🎯 RAG mainly improves helpfulness and honesty (by reducing hallucinations); harmlessness is a separate, critical concern that RAG does not automatically guarantee.
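
One way to check the claim above on your own system is to measure how often responses are flagged as unsafe for the same queries with and without retrieved documents. The following is a minimal sketch, not the paper's evaluation code: `generate`, `is_unsafe`, and the prompt template in `with_context` are hypothetical placeholders for your model call, your safety judge, and your RAG prompt format.

```python
from typing import Callable, Dict, Sequence


def unsafe_rate(
    generate: Callable[[str], str],
    is_unsafe: Callable[[str], bool],
    prompts: Sequence[str],
) -> float:
    """Fraction of prompts whose response the judge flags as unsafe."""
    if not prompts:
        return 0.0
    flagged = sum(1 for p in prompts if is_unsafe(generate(p)))
    return flagged / len(prompts)


def with_context(query: str, documents: Sequence[str]) -> str:
    """Build a RAG-style prompt (assumed template) by prepending documents."""
    context = "\n\n".join(documents)
    return f"Context:\n{context}\n\nQuestion: {query}"


def compare_rag_safety(
    generate: Callable[[str], str],
    is_unsafe: Callable[[str], bool],
    queries: Sequence[str],
    documents: Sequence[str],
) -> Dict[str, float]:
    """Unsafe-response rate for the same queries, without and with context."""
    rag_prompts = [with_context(q, documents) for q in queries]
    return {
        "no_rag": unsafe_rate(generate, is_unsafe, queries),
        "rag": unsafe_rate(generate, is_unsafe, rag_prompts),
    }
```

If the "rag" rate is higher than the "no_rag" rate on your own unsafe-query set, the retrieval step is weakening the model's refusals in your deployment.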

Understanding AI Attack Surfaces

🔍 An attack surface in AI refers to the ways an LLM application can be misused: through harmful inputs, unsafe outputs, or the facilitation of unintended actions.
🏦 For financial services, attack surfaces include enabling financial crimes, providing unsolicited advice, or disclosing trading strategies.
🌍 LLM providers cannot anticipate every use case across diverse industries and jurisdictions, making it crucial for application developers to understand their specific risks.

Mitigating RAG Risks

🛠️ Organizations must evaluate AI systems within their specific deployment context, as general-purpose safety measures often fail in domain-specific scenarios.
🛡️ Effective RAG safety involves a guardrail-retrieval-answer-guardrail architecture, rather than a simple retrieval-generation flow.
📊 Custom content risk taxonomies and red teaming events are vital for identifying and quantifying domain-specific vulnerabilities.
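
The guardrail-retrieval-answer-guardrail flow mentioned above can be sketched as a small pipeline. This is an illustrative outline under stated assumptions, not the architecture described in the episode: the four callables and the `REFUSAL` message are hypothetical stand-ins for your own classifiers, retriever, and model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical refusal message; replace with your application's wording.
REFUSAL = "Sorry, I can't help with that request."


@dataclass
class GuardedRAG:
    input_guard: Callable[[str], bool]          # True if the query may proceed
    retrieve: Callable[[str], List[str]]        # query -> retrieved documents
    generate: Callable[[str, List[str]], str]   # (query, documents) -> draft answer
    output_guard: Callable[[str, str], bool]    # (query, answer) -> True if safe to return

    def answer(self, query: str) -> str:
        # Guardrail 1: screen the query before any retrieval happens.
        if not self.input_guard(query):
            return REFUSAL
        docs = self.retrieve(query)
        draft = self.generate(query, docs)
        # Guardrail 2: screen the draft, since retrieved context can
        # change what the model is willing to say.
        if not self.output_guard(query, draft):
            return REFUSAL
        return draft
```

Keeping the guards as plain callables means each one can be swapped for a domain-specific classifier without touching the retrieval or generation steps.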

Context Length and Model Size Considerations

📏 While longer context windows enhance LLM capabilities, they can also increase latency and cost, and potentially weaken built-in safety guardrails if not managed carefully.
🧠 Generally, safer LLMs (often larger or more capable) tend to be more robust when RAG is applied, though no system is entirely unbreakable.
⚖️ Refusing to answer can be a safety mechanism, but if a model refuses safe questions, it compromises helpfulness, highlighting the need for multifaceted evaluation.
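
The trade-off between refusing and helping can be made concrete by reporting two numbers side by side: how often the system refuses safe prompts, and how often it answers unsafe prompts unsafely. A minimal sketch follows, assuming each response has already been labeled by a human or automated judge; the `Judged` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Judged:
    prompt_is_safe: bool       # was the prompt itself benign?
    refused: bool              # did the system decline to answer?
    response_is_unsafe: bool   # did the judge flag the response as unsafe?


def evaluate(records: Iterable[Judged]) -> Dict[str, float]:
    """Report over-refusal on safe prompts and unsafe answers on unsafe prompts."""
    records = list(records)
    safe = [r for r in records if r.prompt_is_safe]
    unsafe = [r for r in records if not r.prompt_is_safe]
    over_refusal = sum(r.refused for r in safe) / len(safe) if safe else 0.0
    unsafe_answers = (
        sum(r.response_is_unsafe for r in unsafe) / len(unsafe) if unsafe else 0.0
    )
    return {
        "over_refusal_rate": over_refusal,
        "unsafe_response_rate": unsafe_answers,
    }
```

A system that refuses everything scores 0.0 on the second metric and 1.0 on the first, which is why neither number is meaningful alone.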

Domain-Specific AI in Finance

🏦 Foundation models are typically not trained on finance-specific knowledge, creating limitations in both helpfulness and harmlessness for financial applications.
⚙️ Off-the-shelf safety mechanisms from LLM providers are often insufficient for highly regulated domains like finance, healthcare, or law.
🚀 Best practices include evaluating systems in context, adapting general taxonomies to specific domains, and conducting rigorous red teaming to ensure trustworthiness and reliability.
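
Adapting a general taxonomy to a domain and tallying red-teaming findings against it might look like the sketch below. The finance categories are taken from the attack surfaces listed earlier in this summary; the general categories are placeholders chosen for illustration, and all identifiers are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable

# Placeholder general-purpose categories (assumed, not from the episode).
GENERAL_TAXONOMY = frozenset({"violence", "hate_speech"})

# Finance-specific categories named in this summary.
FINANCE_EXTENSIONS = frozenset(
    {"financial_crime", "unsolicited_advice", "trading_strategy_disclosure"}
)


def build_taxonomy(general: Iterable[str], domain: Iterable[str]) -> FrozenSet[str]:
    """Combine general and domain-specific risk categories."""
    return frozenset(general) | frozenset(domain)


def tally_findings(findings: Iterable[str], taxonomy: FrozenSet[str]) -> Counter:
    """Count red-team findings per category, rejecting unknown labels."""
    counts: Counter = Counter()
    for category in findings:
        if category not in taxonomy:
            raise ValueError(f"unknown risk category: {category}")
        counts[category] += 1
    return counts
```

Rejecting unknown labels is a deliberate choice here: a finding that fits no category is a signal that the taxonomy itself needs another domain-specific entry.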

Topics (14 themes)

What’s Discussed

Retrieval-Augmented Generation (RAG), Large Language Models (LLMs), AI Safety, Attack Surfaces, Responsible AI, Hallucinations, Harmlessness, Helpfulness, Context Length, Model Size, Red Teaming, Financial Services, Risk Management, Guardrails
