Proving LLM Provenance with Palimpsestic Memory Inference

[HPP] Percy Liang · October 30, 2025 · 40 min

Understanding LLM Provenance

  • 💡 The paper addresses the challenge of proving the statistical lineage of blackbox Large Language Models (LLMs), crucial for accountability and digital forensics.
  • 🎯 The core problem involves determining if a derivative model or generated text from Bob originated from a specific training run by Alice.
  • ✅ A key insight is palimpsestic memorization, where LLMs retain a stronger memory of data seen later in their training sequence, creating a unique statistical fingerprint.

Methodology and Key Requirements

  • 🔑 The approach frames the problem as an independence test, relying on the assumption that Alice's training data was randomly shuffled (Assumption A1).
  • 🔬 Provenance tests must be effective (high statistical power), transparent (not dependent on keeping information secret from Bob), and non-invasive (no alteration to Alice's original training process).
  • 📊 Permutation testing is used to calculate p-values, leveraging the exchangeability property under the null hypothesis of independence.
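
To make the permutation-testing step concrete, here is a minimal sketch, not the paper's implementation. The function names (`permutation_p_value`, `covariance`) are hypothetical, and a plain covariance stands in for whatever test statistic is actually used. It assumes only that, under the null hypothesis, every ordering of Alice's examples is equally likely.

```python
import random

def covariance(xs, ys):
    """Population covariance between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def permutation_p_value(scores, statistic, num_permutations=999, seed=0):
    """One-sided permutation p-value.

    scores[i] is a score for the i-th example in Alice's training order
    (for instance a log likelihood under Bob's model). If the scores are
    independent of the order, shuffling the order leaves the distribution
    of the statistic unchanged (exchangeability).
    """
    rng = random.Random(seed)
    order = list(range(len(scores)))
    observed = statistic(order, scores)
    at_least_as_large = 0
    for _ in range(num_permutations):
        shuffled = order[:]
        rng.shuffle(shuffled)
        if statistic(shuffled, scores) >= observed:
            at_least_as_large += 1
    # The +1 terms count the observed ordering itself, which keeps the
    # p-value valid (never exactly zero) for a finite number of shuffles.
    return (1 + at_least_as_large) / (1 + num_permutations)
```

A small p-value means the observed alignment between scores and training order would be unlikely if Bob's model were independent of Alice's run; with 999 shuffles the smallest attainable value is 0.001.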

Query Setting: Direct Model Interaction

  • 🔍 In the query setting, Alice interacts with Bob's model to estimate log likelihoods of her training examples.
  • 📈 Spearman rank correlation is used to measure the relationship between these likelihoods and the original training order, robustly detecting monotonic trends.
  • ✨ A significant improvement involves using an independent reference model to normalize for inherent text predictability, substantially boosting the signal-to-noise ratio and reducing query costs.
  • 🚀 Experiments showed the method is robust to fine-tuning and can even detect data mislabeling (e.g., Pythia 2.8b deduped), with evasion only possible by severely damaging model quality.
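
The query-setting statistic described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, the log likelihoods are assumed to be already collected as plain lists, and the reference-model normalization is assumed to be a simple per-example difference.

```python
def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(xs, ys):
    """Pearson correlation (assumes neither input is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

def provenance_statistic(train_positions, bob_logliks, ref_logliks=None):
    """Correlate training position with (optionally normalized) log likelihood.

    Subtracting the reference model's log likelihood removes how predictable
    each text is on its own, leaving the part attributable to training order.
    """
    if ref_logliks is None:
        scores = list(bob_logliks)
    else:
        scores = [b - r for b, r in zip(bob_logliks, ref_logliks)]
    return spearman(train_positions, scores)
```

In a toy case where each text's inherent predictability varies much more than the order effect, the raw correlation can be close to zero while the normalized one is strongly positive, which is the signal-to-noise gain the summary refers to.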

Observational Setting: Text-Only Analysis

  • 💬 The tabby_parts approach (Algorithm 2) partitions Alice's transcript and uses n-gram matching to correlate Bob's text with later training partitions.
  • ⚠️ tabby_parts is data-hungry, requiring hundreds of thousands to millions of tokens, making it impractical for short texts and sensitive to low-diversity generation.
  • 🧠 The shuff_obs approach (Algorithm 3) retrains models on shuffled data suffixes and compares likelihoods of Bob's text, demonstrating high token efficiency (hundreds of tokens).
  • 🛠️ A refinement for shuff_obs involves fine-tuning Alice's test models on Bob's text to improve robustness against Bob's own fine-tuning, though extreme fine-tuning can still limit detection.
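
The partition-and-match idea behind the n-gram approach can be illustrated with a simplified sketch. This is not Algorithm 2 itself: the function names, the contiguous equal-width partitioning, and the exact-match counting rule are all assumptions made for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def partition_match_counts(ordered_docs, bob_tokens, num_partitions, n=3):
    """Count how many of Bob's n-grams appear in each training partition.

    ordered_docs is Alice's training data as token lists, in training order.
    Documents are split into num_partitions contiguous groups; index 0 is
    the earliest part of training and the last index is the latest.
    """
    partition_ngrams = [set() for _ in range(num_partitions)]
    for i, doc in enumerate(ordered_docs):
        part = i * num_partitions // len(ordered_docs)
        partition_ngrams[part].update(ngrams(doc, n))
    bob_counts = Counter(ngrams(bob_tokens, n))
    return [
        sum(count for gram, count in bob_counts.items() if gram in grams)
        for grams in partition_ngrams
    ]
```

If Bob's text derives from Alice's run, the counts should tend to rise toward later partitions, and that trend can then be tested against the partition index. Exact n-gram matches are sparse, which is consistent with the large token requirement noted above.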

Practical Implications and Future Outlook

  • 💰 The query setting can be surprisingly affordable for targeted audits ($180 for 8 million sequences), while shuff_obs requires significant GPU resources for retraining.
  • 🚧 The biggest non-technical hurdle is access to Alice's original ordered training transcript, though using publicly available ordered subsets offers a potential compromise.
  • 💡 The research highlights the persistence of memory in LLMs, raising profound questions about privacy risks and the design of future models to better understand and control order-dependent retention.
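
For a sense of scale on the query-setting cost quoted above, the arithmetic can be sketched as below. The linear scaling is an assumption for illustration only; real pricing depends on sequence length and the provider.

```python
REPORTED_COST_USD = 180.0        # figure quoted in the summary
REPORTED_SEQUENCES = 8_000_000   # figure quoted in the summary

def cost_per_sequence():
    """Average cost per queried sequence implied by the quoted figures."""
    return REPORTED_COST_USD / REPORTED_SEQUENCES

def estimate_cost(num_sequences):
    """Hypothetical linear extrapolation to a different audit size."""
    return num_sequences * cost_per_sequence()
```

Under that assumption the quoted figures work out to about $0.0000225 per sequence, so an audit of one million sequences would cost roughly $22.50.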

What’s Discussed

Blackbox models, Model provenance, Palimpsestic memorization, Membership inference, Large Language Models (LLMs), Independence testing, Permutation testing, Spearman rank correlation, Reference models, Fine-tuning, N-gram models, Training data order, Digital forensics, Model accountability, Type I error control