Sparse Activation: The Future of AI Inspired by the Human Brain

Super Data Science: ML & AI Podcast with Jon Krohn · October 11, 2025 · 11 min · 254 views

The Problem with Dense Activation in Transformers

💡 Current transformer architectures, like GPT-2, are densely activated, meaning all neurons or modules are engaged for every input.
⚡ Dense activation is computationally expensive and energy-intensive, which makes it unsustainable at the scale of the human brain, a system with trillions of connections.
🧠 The human brain, in contrast, utilizes a sparsely activated system, where only a small fraction of neurons fire at any given time, as evidenced by fMRI and EEG studies.

Introducing Sparse Activation with BDH (Baby Dragon Hatchling)

🚀 BDH, or Baby Dragon Hatchling, demonstrates the power of sparse positive activations, with approximately 95% of its artificial neurons remaining silent at any moment.
🎯 This approach allows a billion-parameter model to rival the performance of densely activated models like GPT-2 on core cognitive tasks related to language and translation.
💰 Efficiency is the core principle behind BDH: doing more with fewer computational and energy resources.
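
To make "sparse positive activations" concrete, here is a minimal sketch in plain Python. It is not BDH's actual mechanism, which the summary does not describe; it assumes a simple stand-in (ReLU followed by keeping the top 5% of values) purely to show what a layer looks like when about 95% of its neurons are silent. The function names and the 5% setting are illustrative assumptions.

```python
import random


def sparse_positive(pre_activations, active_fraction=0.05):
    """Illustrative stand-in: ReLU, then keep only the k largest values.

    Every output is >= 0 (positive) and at most k entries are non-zero (sparse).
    """
    n = len(pre_activations)
    k = max(1, int(n * active_fraction))
    rectified = [max(0.0, x) for x in pre_activations]
    # Indices of the k largest rectified values; everything else is silenced.
    top = set(sorted(range(n), key=lambda i: rectified[i], reverse=True)[:k])
    return [rectified[i] if i in top else 0.0 for i in range(n)]


def active_count(activations):
    """Number of neurons that are firing (strictly positive)."""
    return sum(1 for a in activations if a > 0.0)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
pre = [rng.gauss(0.0, 1.0) for _ in range(1000)]

relu_only = [max(0.0, x) for x in pre]  # ReLU alone leaves many neurons active
sparse = sparse_positive(pre)           # at most 50 of 1000 neurons active

print("active after ReLU only:", active_count(relu_only))
print("active after sparsifying:", active_count(sparse))
```

With 1000 neurons and a 5% budget, no more than 50 can fire at once, so at least 95% stay silent, which matches the proportion quoted above.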

Technical Differences: Dense vs. Sparse Activation

🧩 Transformers operate in a world of dense activations, processing information through every neuron and connection for each input.
🧩 BDH, in the sparse world, activates only a small percentage of neurons, mimicking biological brain function.
⚙️ While transformers scale by adjusting parameters like attention heads and layers, a key constraint is the fixed vector space dimensionality of attention heads (around 1000 dimensions), potentially limiting nuanced reasoning.
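
The cost difference between the two regimes can be sketched with a toy matrix-vector product. The example below is an assumption-laden illustration, not a description of how transformers or BDH are implemented: it simply counts multiply-adds when silent (zero) inputs are skipped. The layer sizes and the `matvec_count` helper are hypothetical.

```python
def matvec_count(weights, x):
    """Compute y = W x while skipping zero inputs.

    Returns (y, ops), where ops is the number of multiply-adds performed.
    """
    rows = len(weights)
    y = [0.0] * rows
    ops = 0
    for j, xj in enumerate(x):
        if xj == 0.0:
            continue  # a silent neuron contributes nothing downstream
        for i in range(rows):
            y[i] += weights[i][j] * xj
            ops += 1
    return y, ops


n_in, n_out = 20, 4
weights = [[1.0] * n_in for _ in range(n_out)]

x_dense = [1.0] * n_in   # every input neuron is active
x_sparse = [0.0] * n_in  # only 1 of 20 (5%) is active
x_sparse[3] = 2.0

y_dense, ops_dense = matvec_count(weights, x_dense)
y_sparse, ops_sparse = matvec_count(weights, x_sparse)

print(ops_dense, ops_sparse)  # 80 multiply-adds versus 4
```

In this toy setting the work scales with the number of active inputs rather than the total number of neurons. Real hardware only realizes such savings when the sparsity can be exploited efficiently, so the operation count is an upper bound on the benefit, not a measured speedup.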

Sparse Activation and Concept Composition

💬 In sparse positive spaces, concepts are composed more like a bag of concepts or a tag cloud, rather than through vector combinations.
🧩 This differs from vector spaces where concepts can be added, subtracted, or negated, allowing for opposites and negative vectors.
⚠️ Sparse activation does not rely on concepts of
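
The contrast between vector arithmetic and a "bag of concepts" can be illustrated with a toy example. This is a sketch under loud assumptions: the concept names, the two-dimensional vectors, and the use of Python sets to stand in for groups of active neurons are all invented for illustration and do not come from the episode.

```python
# Dense vector space: concepts are signed vectors that can be added,
# subtracted, or negated, so a concept can have an exact opposite.
def add(u, v):
    return [a + b for a, b in zip(u, v)]


def negate(u):
    return [-a for a in u]


hot_vec = [1.0, 0.0]            # hypothetical embedding for "hot"
cold_vec = negate(hot_vec)      # its opposite is the negated vector
cancelled = add(hot_vec, cold_vec)  # adding opposites cancels to zero


# Sparse positive space: a concept is the set of neurons that are active.
# Composition behaves like merging tags; nothing can cancel, because there
# are no negative activations to subtract with.
def compose(*concepts):
    merged = set()
    for concept in concepts:
        merged |= concept
    return merged


hot_tags = {"temperature", "high"}      # hypothetical active-neuron tags
drink_tags = {"liquid", "consumable"}
hot_drink = compose(hot_tags, drink_tags)

print(cancelled)
print(sorted(hot_drink))
```

In the set-based version, combining two concepts only ever adds tags, which is the tag-cloud behavior described above; the signed-vector version is the one that supports opposites and cancellation.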


What’s Discussed

Sparse Activation, Dense Activation, Transformer Architectures, Artificial Neurons, Human Brain, BDH (Baby Dragon Hatchling), Computational Efficiency, Energy Efficiency, fMRI, EEG, Language Models, Vector Spaces, Concept Composition
