Adrian Kosowski on BDH: A Transformer-Replacing Architecture Inspired by the Brain

Super Data Science: ML & AI Podcast with Jon Krohn · October 7, 2025 · 1h 12min · 264,319 views

Introducing Baby Dragon Hatchling (BDH)

🚀 Adrian Kosowski introduces BDH, a new architecture from Pathway, potentially a "transformer killer" that blends biological neuroscience with machine learning.
💡 BDH is inspired by the brain's structure and function, aiming to overcome limitations of current transformer models, particularly in reasoning and generalization.
🐉 The name "Baby Dragon Hatchling" (BDH) signifies a new, hatched stage of their "Baby Dragon" family of models, representing a significant conceptual advance.

Hebbian Learning and Neural Inspiration

🧠 Hebbian learning is a fundamental concept where neurons that fire together strengthen their connections, a principle influencing BDH's design.
⏳ The brain operates on various timescales, from rapid synaptic changes to long-term learning, a complexity BDH aims to model more effectively than transformers.
🔌 Historically, recurrent neural networks (RNNs) attempted to bridge neuroscience and ML, but transformers, while powerful, are harder to reconcile with biological processes.
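
To make the "fire together, strengthen together" idea concrete, here is a minimal sketch of a textbook Hebbian update. It is not taken from the episode or from BDH itself; the function name `hebbian_update`, the learning rate `lr`, and the toy two-neuron setup are all assumptions chosen for illustration.

```python
def hebbian_update(weights, pre, post, lr=0.1):
    """Return new weights after one Hebbian step.

    weights[i][j] connects presynaptic neuron j to postsynaptic neuron i.
    The connection grows only when both neurons are active together:
    w[i][j] += lr * post[i] * pre[j]
    """
    return [
        [w_ij + lr * post_i * pre_j for w_ij, pre_j in zip(row, pre)]
        for row, post_i in zip(weights, post)
    ]


# Hypothetical toy example: only neuron 0 fires on each side.
weights = [[0.0, 0.0], [0.0, 0.0]]
pre = [1.0, 0.0]
post = [1.0, 0.0]
weights = hebbian_update(weights, pre, post)
print(weights)
```

Only the connection between the two co-active neurons changes; every other weight stays at zero, which is the locality property the summary attributes to biological learning.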

Attention: Biological vs. Transformer

💬 In biological systems, attention operates at both system-wide (conscious focus) and micro (neuronal connections) levels, with neurons prioritizing neighbors.
🌐 Transformer attention, conversely, is often viewed as a context lookup and search mechanism, optimized for GPUs rather than biological plausibility.
🤝 BDH seeks to reconcile these views by implementing attention in a way that is more aligned with local neuronal communication and state-space modeling.
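
The "context lookup" view of transformer attention can be sketched as standard scaled dot-product attention over a tiny context. This is the generic textbook mechanism, not BDH's attention; the helper names `softmax` and `attend` and the toy keys and values are assumptions for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query.

    Scores the query against every key in the context, then returns
    the weighted average of the values along with the weights.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return output, weights


# Hypothetical context of two tokens; the query matches the first key.
keys = [[10.0, 0.0], [0.0, 10.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
output, weights = attend([10.0, 0.0], keys, values)
print(weights, output)
```

Every query is compared against the whole stored context, which is why this form behaves like a search over history rather than like communication between neighboring neurons.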

BDH: State Space, Sparsity, and Efficiency

🧩 BDH is a state space model, allowing attention to be viewed locally rather than just as a historical lookup, bridging RNN and transformer concepts.
💡 The architecture enables sparse activation, where only a small percentage of artificial neurons are active at any time, mirroring the brain's energy efficiency.
⚡ Sparse activation is presented as the source of significant computational and energy savings over dense-activation models like transformers, even at comparable scales.
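
One way to picture attention as local state rather than a history lookup is the well-known equivalence between unnormalized linear attention and a recurrent state updated by outer products. The sketch below illustrates that general idea with sparse, non-negative keys produced by a ReLU; it is an assumption-laden toy, not the BDH architecture, and all names (`relu`, `active_fraction`, `recurrent_readout`, `lookup_readout`) are hypothetical.

```python
def relu(x):
    """Keep positive entries, zero out the rest (yields sparse, non-negative vectors)."""
    return [max(0.0, xi) for xi in x]


def active_fraction(vectors):
    """Fraction of entries that are non-zero across a list of vectors."""
    total = sum(len(v) for v in vectors)
    active = sum(1 for v in vectors for xi in v if xi != 0.0)
    return active / total


def recurrent_readout(keys, values, query):
    """State-space view: fold each (key, value) pair into a fixed-size state."""
    d_k, d_v = len(query), len(values[0])
    state = [[0.0] * d_k for _ in range(d_v)]
    for k, v in zip(keys, values):
        for i in range(d_v):
            for j in range(d_k):
                state[i][j] += v[i] * k[j]  # outer-product update
    return [sum(state[i][j] * query[j] for j in range(d_k)) for i in range(d_v)]


def lookup_readout(keys, values, query):
    """Lookup view: score the query against every past key (no softmax)."""
    out = [0.0] * len(values[0])
    for k, v in zip(keys, values):
        score = sum(kj * qj for kj, qj in zip(k, query))
        for i in range(len(out)):
            out[i] += score * v[i]
    return out


# Hypothetical pre-activations; ReLU leaves only a few units active.
raw_keys = [[0.5, -1.0, -2.0, 0.0], [-0.3, 2.0, -1.0, -4.0]]
keys = [relu(k) for k in raw_keys]
values = [[1.0, 0.0], [0.0, 1.0]]
query = [1.0, 1.0, 0.0, 0.0]

print(active_fraction(keys))
print(recurrent_readout(keys, values, query))
print(lookup_readout(keys, values, query))
```

Both readouts agree on this toy input, but the recurrent form never revisits past tokens: it only touches a fixed-size state, and with sparse keys most of the update terms are zero.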

Positive Spaces and Interpretability

🎨 BDH operates in sparse positive spaces, contrasting with the dense vector spaces of transformers, which can be more akin to mixing colors than linear combinations.
🧩 This positivity and sparsity facilitate interpretability, potentially allowing for "grandmother neurons" or "grandmother synapses" that are directly responsible for specific concepts.
📊 More important concepts are represented more compactly, with a tendency towards monosemanticity (single concept per unit), making the model's internal workings easier to understand.

Multilingualism and Future Potential

🌍 BDH's architecture allows for easier concatenation of models, enabling the creation of multilingual models by combining models trained on different languages.
🚀 The primary focus for BDH is on reasoning models that can handle complex, contextualized inputs and long-term learning, moving beyond current LLM limitations.
🛠️ Pathway is releasing a simplified version of the BDH architecture publicly, encouraging experimentation and further development in areas like enterprise data processing and AI coding assistants.


What’s Discussed

Dragon Hatchling, BDH Architecture, Transformer Models, Artificial Neurons, Hebbian Learning, Neuroscience, Machine Learning, Attention Mechanism, State Space Models, Sparse Activation, Positive Spaces, Interpretability, Multilingual Models, Reasoning Models, Large Language Models
