Adrian Kosowski on BDH: A Transformer-Replacing Architecture Inspired by the Brain
Super Data Science: ML & AI Podcast with Jon Krohn · October 7, 2025 · 1h 12min · 264,319 views
Introducing Baby Dragon Hatchling (BDH)
- Adrian Kosowski introduces BDH, a new architecture from Pathway that blends biological neuroscience with machine learning and is pitched as a potential "transformer killer."
- BDH is inspired by the brain's structure and function, and aims to overcome limitations of current transformer models, particularly in reasoning and generalization.
- The name "Baby Dragon Hatchling" (BDH) signifies a new, hatched stage of Pathway's "Baby Dragon" family of models, representing a significant conceptual advance.
Hebbian Learning and Neural Inspiration
- Hebbian learning is the fundamental principle that neurons that fire together strengthen their connections, and it influences BDH's design.
- The brain operates on several timescales, from rapid synaptic changes to long-term learning, a complexity BDH aims to model more effectively than transformers do.
- Historically, recurrent neural networks (RNNs) attempted to bridge neuroscience and ML; transformers, while powerful, are harder to reconcile with biological processes.
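The "fire together, wire together" principle can be illustrated with the textbook Hebbian rule, in which a connection weight grows in proportion to the product of pre- and post-synaptic activity. This is a minimal sketch of that classical rule only, not BDH's actual update; the function name `hebbian_update` and the learning rate `lr` are assumptions made for illustration.

```python
def hebbian_update(weights, pre, post, lr=0.1):
    """Return new weights with w[i][j] increased by lr * post[i] * pre[j].

    weights: list of rows, one row per post-synaptic neuron.
    pre, post: activity levels of the pre- and post-synaptic neurons.
    (Hypothetical helper for illustration; not taken from BDH.)
    """
    return [
        [w_ij + lr * post_i * pre_j for w_ij, pre_j in zip(row, pre)]
        for row, post_i in zip(weights, post)
    ]


# Two pre-synaptic and two post-synaptic neurons, initially unconnected.
weights = [[0.0, 0.0], [0.0, 0.0]]
pre = [1.0, 0.0]   # only pre-synaptic neuron 0 fires
post = [1.0, 0.0]  # only post-synaptic neuron 0 fires

for _ in range(3):
    weights = hebbian_update(weights, pre, post)

# Only the connection between the two co-active neurons has grown.
print(weights)
```

Only the weight linking the two neurons that were active together changes; every other connection stays at zero.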
Attention: Biological vs. Transformer
- In biological systems, attention operates at both the system-wide level (conscious focus) and the micro level (neuronal connections), with neurons prioritizing their neighbors.
- Transformer attention, by contrast, is often viewed as a context lookup and search mechanism, optimized for GPUs rather than for biological plausibility.
- BDH seeks to reconcile these views by implementing attention in a way that is more closely aligned with local neuronal communication and state-space modeling.
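The "context lookup" view of transformer attention can be made concrete with standard scaled dot-product attention: a query is compared against every stored key, and the output is a weighted average of the corresponding values. The sketch below uses plain Python lists and the hypothetical helper names `softmax` and `attend`; it shows the generic transformer mechanism, not how BDH implements attention.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Returns the attended output and the attention weights over the context.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return output, weights


# A tiny context of three (key, value) pairs; the query resembles key 0 most.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0], [20.0], [30.0]]
query = [2.0, 0.0]

output, weights = attend(query, keys, values)
print(weights, output)
```

The query searches the whole context at once, which is what makes this formulation a global lookup rather than a local, neighbor-to-neighbor interaction.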
BDH: State Space, Sparsity, and Efficiency
- BDH is a state space model, which allows attention to be viewed locally rather than only as a lookup over history, bridging RNN and transformer concepts.
- The architecture enables sparse activation, where only a small percentage of artificial neurons are active at any time, mirroring the brain's energy efficiency.
- This sparse activation leads to significant computational and energy efficiency, outperforming dense-activation models like transformers, even at comparable scales.
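One well-known way to see attention as a state update rather than a lookup over history is linear (softmax-free) attention, where summing over all past key/value pairs gives the same result as maintaining a running state matrix. The sketch below demonstrates that general equivalence under the assumption of unnormalized linear attention; the names `lookup_form` and `recurrent_form` are hypothetical, and nothing here is claimed to be BDH's formulation.

```python
def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def lookup_form(query, keys, values):
    """Attention as a lookup: revisit every past (key, value) pair."""
    dim = len(values[0])
    return [
        sum(dot(query, k) * v[d] for k, v in zip(keys, values))
        for d in range(dim)
    ]


def recurrent_form(query, keys, values):
    """Attention as a state: fold each pair into a fixed-size matrix."""
    dim_v, dim_k = len(values[0]), len(keys[0])
    state = [[0.0] * dim_k for _ in range(dim_v)]
    for k, v in zip(keys, values):
        # Outer-product update: state[i][j] += v[i] * k[j].
        for i in range(dim_v):
            for j in range(dim_k):
                state[i][j] += v[i] * k[j]
    # Reading out only needs the state, not the history.
    return [dot(row, query) for row in state]


keys = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
query = [1.0, 1.0]

out_lookup = lookup_form(query, keys, values)
out_state = recurrent_form(query, keys, values)
print(out_lookup, out_state)
```

In the recurrent form the memory cost is fixed by the state size instead of growing with context length, and the outer-product update has the same shape as a Hebbian weight change.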
Positive Spaces and Interpretability
- BDH operates in sparse positive spaces, contrasting with the dense vector spaces of transformers, which can be more akin to mixing colors than linear combinations.
- This positivity and sparsity facilitate interpretability, potentially allowing for "grandmother neurons" or "grandmother synapses" that are directly responsible for specific concepts.
- More important concepts are represented more compactly, with a tendency toward monosemanticity (a single concept per unit), making the model's internal workings easier to understand.
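A sparse positive representation can be sketched with a thresholded ReLU: every activation is non-negative, and most units are exactly zero for a given input. The threshold value, the input numbers, and the helper names `sparse_positive` and `active_fraction` below are illustrative assumptions rather than details from the episode.

```python
def sparse_positive(pre_activations, threshold=1.0):
    """Thresholded ReLU: outputs are never negative, and small inputs become 0."""
    return [max(0.0, x - threshold) for x in pre_activations]


def active_fraction(activations):
    """Share of units with a strictly positive activation."""
    return sum(1 for a in activations if a > 0) / len(activations)


# Hypothetical pre-activations for eight artificial neurons.
pre = [0.2, 1.5, -0.7, 0.9, 3.0, 0.1, -2.0, 0.4]
acts = sparse_positive(pre)

print(acts)
print(active_fraction(acts))
```

Because each unit can only add to the representation and few units are active at once, it becomes easier to ask which individual units respond to a given concept.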
Multilingualism and Future Potential
- BDH's architecture allows for easier concatenation of models, enabling multilingual models to be built by combining models trained on different languages.
- The primary focus for BDH is on reasoning models that can handle complex, contextualized inputs and long-term learning, moving beyond the limitations of current LLMs.
- Pathway is publicly releasing a simplified version of the BDH architecture, encouraging experimentation and further development in areas like enterprise data processing and AI coding assistants.
Topics discussed: Dragon Hatchling, BDH Architecture, Transformer Models, Artificial Neurons, Hebbian Learning, Neuroscience, Machine Learning, Attention Mechanism, State Space Models, Sparse Activation, Positive Spaces, Interpretability, Multilingual Models, Reasoning Models, Large Language Models