Jeremy Howard: Why Finetuning is Flawed and the Future of Continued Pre-training in AI

[HPP] Jeremy Howard · October 9, 2025 · 6 min


The Evolution of AI Training

💡 Jeremy Howard, co-founder of fast.ai, helped pioneer finetuning for language models with ULMFiT (Universal Language Model Fine-tuning).
🚀 Introduced in 2018, ULMFiT showed that a pre-trained language model could be adapted to new tasks with far less data and computing power, which helped democratize AI.
✅ ULMFiT works in three stages: general pre-training on a large dataset, finetuning on domain-specific text, and finetuning on the target task.

The Flaw: Catastrophic Forgetting

⚠️ Howard now argues that traditional staged finetuning suffers from catastrophic forgetting, a major problem that was not foreseen at the time.
🧠 When a model is trained only on new data, its weights drift away from what earlier training encoded, so it trades previously learned skills for new ones.
📉 The example given is Code Llama, described as becoming expert at coding while losing general knowledge of history and science after finetuning.
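A toy sketch can make the forgetting effect concrete. It is an illustration under stated assumptions, not an experiment from the video: the two-weight linear model, the tasks `TASK_A` and `TASK_B`, and the helpers `mse` and `train` are all hypothetical. Training on task B alone after task A overwrites a weight that task A depends on, while training on both tasks together finds weights that serve both.

```python
# Toy illustration only: a two-weight linear model, not an experiment from the video.
XS = [-1.0, -0.5, 0.5, 1.0]
TASK_A = [((x, x), 2.0 * x) for x in XS]     # both inputs active, target 2x
TASK_B = [((x, 0.0), -1.0 * x) for x in XS]  # only the first input active, target -x


def mse(w, data):
    """Mean squared error of the linear model w on a list of ((a, b), y) examples."""
    return sum((w[0] * a + w[1] * b - y) ** 2 for (a, b), y in data) / len(data)


def train(w, data, steps=2000, lr=0.1):
    """Full-batch gradient descent on the mean squared error, starting from w."""
    w0, w1 = w
    for _ in range(steps):
        g0 = g1 = 0.0
        for (a, b), y in data:
            err = w0 * a + w1 * b - y
            g0 += 2.0 * err * a / len(data)
            g1 += 2.0 * err * b / len(data)
        w0 -= lr * g0
        w1 -= lr * g1
    return (w0, w1)


# Staged: learn task A first, then train on task B alone.
w_after_a = train((0.0, 0.0), TASK_A)
w_staged = train(w_after_a, TASK_B)

# Mixed: train on both tasks together from the start.
w_mixed = train((0.0, 0.0), TASK_A + TASK_B)

loss_a_before = mse(w_after_a, TASK_A)  # task A loss right after learning task A
loss_a_staged = mse(w_staged, TASK_A)   # task A loss after the task B stage
loss_a_mixed = mse(w_mixed, TASK_A)     # task A loss with mixed training
loss_b_mixed = mse(w_mixed, TASK_B)     # task B loss with mixed training

print(f"task A loss after stage A:   {loss_a_before:.4f}")
print(f"task A loss after stage B:   {loss_a_staged:.4f}")
print(f"task A loss, mixed training: {loss_a_mixed:.4f}")
print(f"task B loss, mixed training: {loss_b_mixed:.4f}")
```

In this toy setup the task A loss rises from near zero to about 2.5 after the task B stage, while mixed training keeps both losses near zero. Real language models are far more complex, so this shows only the mechanism, not its size.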

Introducing Continued Pre-training

🔑 Howard proposes continued pre-training as the way past the limitations of finetuning.
🔄 Instead of separate stages, this method treats the entire training process as a single, continuous flow from start to finish.
🎯 Two key rules: combine all relevant data types (coding, general text, Q&A) from the outset, and never discard data, so that earlier knowledge keeps being reinforced.
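One way to picture the two rules above is a data mixture that shifts gradually over training instead of switching between stages. The sketch below is a minimal illustration under assumptions: the source names, the weights in `START_MIX` and `END_MIX`, and the helpers `mixture_at` and `sample_sources` are hypothetical, since the summary gives no concrete numbers or recipe.

```python
import random

# Hypothetical data sources and mixture weights; the summary gives no numbers.
START_MIX = {"general_text": 0.7, "code": 0.2, "qa": 0.1}
END_MIX = {"general_text": 0.3, "code": 0.5, "qa": 0.2}


def mixture_at(progress):
    """Linearly interpolate mixture weights; progress runs from 0.0 to 1.0."""
    return {
        name: (1.0 - progress) * START_MIX[name] + progress * END_MIX[name]
        for name in START_MIX
    }


def sample_sources(total_steps, seed=0):
    """Pick one data source per training step according to the shifting mixture."""
    rng = random.Random(seed)
    picks = []
    for step in range(total_steps):
        weights = mixture_at(step / max(total_steps - 1, 1))
        names = list(weights)
        probs = [weights[name] for name in names]
        picks.append(rng.choices(names, weights=probs, k=1)[0])
    return picks


picks = sample_sources(1000)
early, late = picks[:200], picks[-200:]
print("code share, first 200 steps:", early.count("code") / len(early))
print("code share, last 200 steps: ", late.count("code") / len(late))
```

Every source is present from the first step and none ever falls to zero weight, so the emphasis moves toward code without removing general text or Q&A from training.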

Impact and Future Implications

📈 Continued pre-training reduces the alignment tax, the cost in general capability that a model pays when it is tuned for a narrower purpose, so models can specialize without losing their general knowledge.
💡 The expected result is more stable AI models that retain broad capabilities while excelling at specific tasks.
🌐 The aim is to keep high-quality AI within reach of developers with limited computing resources, in line with Howard's original goal of AI democratization.


Topics (12 themes)

What’s Discussed

Artificial Intelligence, Finetuning, Jeremy Howard, Fast.ai, ULMFiT, Catastrophic Forgetting, Continued Pre-training, Large Language Models (LLMs), AI Democratization, Alignment Tax, Data Quality, Computing Power
