Proving LLM Provenance with Palimpsestic Memory Inference

[HPP] Percy Liang · October 30, 2025 · 40 min

Understanding LLM Provenance

💡 The paper addresses the challenge of proving the statistical lineage of blackbox Large Language Models (LLMs), crucial for accountability and digital forensics.
🎯 The core problem involves determining if a derivative model or generated text from Bob originated from a specific training run by Alice.
✅ A key insight is palimpsestic memorization, where LLMs retain a stronger memory of data seen later in their training sequence, creating a unique statistical fingerprint.

Methodology and Key Requirements

🔑 The approach frames the problem as an independence test, relying on the assumption that Alice's training data was randomly shuffled (Assumption A1).
🔬 Provenance tests must be effective (high statistical power), transparent (no secret info from Bob), and non-invasive (no alteration to Alice's original training process).
📊 Permutation testing is used to calculate p-values, leveraging the exchangeability property under the null hypothesis of independence.
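
The permutation test in the last bullet can be illustrated with a minimal sketch. It assumes a generic test statistic and a Monte Carlo approximation. The names `permutation_p_value` and `covariance` are hypothetical, and this is not the paper's exact procedure. Under the null hypothesis of independence, shuffling the training order leaves the distribution of the statistic unchanged, so the rank of the observed value among the shuffled values gives a valid p-value.

```python
import random


def covariance(xs, ys):
    """Unnormalized covariance, used here as a stand-in test statistic."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


def permutation_p_value(statistic, scores, order, num_permutations=999, seed=0):
    """One-sided Monte Carlo permutation p-value.

    Null hypothesis: `scores` are independent of `order`, so every
    permutation of `order` is equally likely (exchangeability).
    """
    rng = random.Random(seed)
    observed = statistic(scores, order)
    shuffled = list(order)
    at_least_as_large = 0
    for _ in range(num_permutations):
        rng.shuffle(shuffled)
        if statistic(scores, shuffled) >= observed:
            at_least_as_large += 1
    # The +1 terms count the observed ordering itself, keeping the p-value valid.
    return (1 + at_least_as_large) / (1 + num_permutations)


# Toy data: scores that rise with training position should give a small p-value.
positions = list(range(20))
rising_scores = [0.5 * p for p in positions]
p_dependent = permutation_p_value(covariance, rising_scores, positions)
```

With 999 permutations the smallest attainable p-value is 1/1000, so more permutations are needed to report stronger evidence.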

Query Setting: Direct Model Interaction

🔍 In the query setting, Alice interacts with Bob's model to estimate log likelihoods of her training examples.
📈 Spearman rank correlation is used to measure the relationship between these likelihoods and the original training order, robustly detecting monotonic trends.
✨ A significant improvement involves using an independent reference model to normalize for inherent text predictability, substantially boosting the signal-to-noise ratio and reducing query costs.
🚀 Experiments showed the method is robust to fine-tuning and can even detect data mislabeling (e.g., Pythia 2.8b deduped), with evasion only possible by severely damaging model quality.
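
A rough sketch of the query-setting statistic follows. It assumes the reference model is applied by subtracting its log likelihood from Bob's, which is one plausible form of normalization and not confirmed by the summary. The function names and toy numbers are hypothetical. Spearman correlation is computed here as the Pearson correlation of ranks.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def query_statistic(bob_loglik, ref_loglik, train_order):
    """Correlate reference-normalized log likelihoods with training order.

    Assumption: normalization is a per-example difference of log likelihoods.
    """
    normalized = [b - r for b, r in zip(bob_loglik, ref_loglik)]
    return spearman(normalized, train_order)


# Toy data: each example has its own inherent predictability (captured by the
# reference model), plus a small bonus in Bob's model for later positions.
train_order = [0, 1, 2, 3, 4, 5]
ref_loglik = [-50.0, -3.0, -120.0, -7.0, -60.0, -15.0]
bob_loglik = [r + 0.5 * t for r, t in zip(ref_loglik, train_order)]

raw = spearman(bob_loglik, train_order)
normalized = query_statistic(bob_loglik, ref_loglik, train_order)
```

In this toy case the raw correlation is swamped by per-example differences in predictability, while the normalized statistic recovers the monotonic trend. That is the signal-to-noise gain the bullet describes.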

Observational Setting: Text-Only Analysis

💬 The tabby_parts approach (Algorithm 2) partitions Alice's transcript and uses n-gram matching to correlate Bob's text with later training partitions.
⚠️ tabby_parts is data-hungry, requiring hundreds of thousands to millions of tokens, making it impractical for short texts and sensitive to low-diversity generation.
🧠 The shuff_obs approach (Algorithm 3) retrains models on shuffled data suffixes and compares likelihoods of Bob's text, demonstrating high token efficiency (hundreds of tokens).
🛠️ A refinement for shuff_obs involves fine-tuning Alice's test models on Bob's text to improve robustness against Bob's own fine-tuning, though extreme fine-tuning can still limit detection.
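
The partition-and-match idea behind the first observational approach can be sketched as below. This is a simplified illustration and not Algorithm 2 itself: the equal-size contiguous split, the exact-match rule, and the name `partition_match_counts` are all assumptions. The resulting per-partition counts could then be correlated with the partition index, with more matches in later partitions suggesting dependence on Alice's training order.

```python
def ngram_set(tokens, n):
    """All distinct n-grams (as tuples) in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def partition_match_counts(train_tokens, bob_tokens, num_parts, n):
    """Count Bob's n-grams that appear in each contiguous training partition.

    `train_tokens` must be in the order Alice's model saw them. N-grams that
    straddle a partition boundary are ignored in this simplified version.
    """
    size = len(train_tokens) // num_parts
    bob_grams = [tuple(bob_tokens[i:i + n]) for i in range(len(bob_tokens) - n + 1)]
    counts = []
    for part in range(num_parts):
        start = part * size
        # The last partition absorbs any leftover tokens.
        end = len(train_tokens) if part == num_parts - 1 else start + size
        seen = ngram_set(train_tokens[start:end], n)
        counts.append(sum(1 for gram in bob_grams if gram in seen))
    return counts


# Toy data: Bob's text overlaps only with the last third of the training order.
train_tokens = "a b c d e f g h i j k l".split()
bob_tokens = "i j k l x".split()
counts = partition_match_counts(train_tokens, bob_tokens, num_parts=3, n=2)
```

Exact n-gram matches are rare in practice, which is consistent with the bullet's point that this approach needs very large amounts of text from Bob.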

Practical Implications and Future Outlook

💰 The query setting can be surprisingly affordable for targeted audits ($180 for 8 million sequences), while shuff_obs requires significant GPU resources for retraining.
🚧 The biggest non-technical hurdle is access to Alice's original ordered training transcript, though using publicly available ordered subsets offers a potential compromise.
💡 The research highlights the persistence of memory in LLMs, raising profound questions about privacy risks and the design of future models to better understand and control order-dependent retention.

Topics · 15 themes

What’s Discussed

Blackbox models, Model provenance, Palimpsestic memorization, Membership inference, Large Language Models (LLMs), Independence testing, Permutation testing, Spearman rank correlation, Reference models, Fine-tuning, N-gram models, Training data order, Digital forensics, Model accountability, Type I error control
