State-Space Models (SSMs) and Mamba Explained: Efficient AI Architectures

Super Data Science: ML & AI Podcast with Jon Krohn · November 16, 2025 · 6 min · 150 views

Understanding State-Space Models (SSMs)

💡 State-space models are a family of models with origins dating back to the 1960s, used in fields like spaceflight control, population studies, and economic modeling.
🧠 The core concept involves mapping an input signal to a hidden state space using one set of equations, and then translating that state into an observable output with a second set of equations.
🚀 Over the last decade, researchers have brought SSMs into deep learning so that models can learn tasks directly from data without extensive classical feature engineering.
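
The two sets of equations described above can be illustrated with the discrete linear case: a state update x_k = A x_(k-1) + B u_k, followed by a readout y_k = C x_k. The following is a minimal sketch in plain Python, assuming a linear time-invariant system with a scalar input. The matrices `A`, `B`, `C` and the function names are illustrative assumptions, not values from the episode, and this is not Mamba itself.

```python
def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def run_ssm(A, B, C, inputs):
    """Run a discrete linear state-space model over a scalar input sequence.

    State update:  x_k = A x_{k-1} + B u_k   (input -> hidden state)
    Readout:       y_k = C x_k               (hidden state -> output)
    """
    state = [0.0] * len(A)  # hidden state starts at zero
    outputs = []
    for u in inputs:
        ax = matvec(A, state)
        state = [ax_i + b_i * u for ax_i, b_i in zip(ax, B)]
        outputs.append(sum(c_i * x_i for c_i, x_i in zip(C, state)))
    return outputs


# Hypothetical 2-dimensional state: two leaky accumulators with different decay.
A = [[0.5, 0.0], [0.0, 0.9]]
B = [1.0, 1.0]
C = [1.0, 1.0]

# Feed a single impulse and watch the output decay as the state forgets it.
ys = run_ssm(A, B, C, [1.0, 0.0, 0.0])
```

Note that the hidden state has a fixed size no matter how long the input sequence is, so the work per step stays constant as the sequence grows.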

Evolution to Mamba and Hybrid Architectures

🐍 Researchers at Carnegie Mellon and Princeton have significantly advanced SSMs, leading to developments like Mamba and structured SSMs (S4, S6).
🎯 Mamba specifically optimizes SSMs for the compute profiles required for language modeling tasks, incorporating mathematical insights into matrix properties.
🧩 IBM's Granite 4.0 release features a hybrid architecture that combines Mamba layers (SSMs) with traditional transformer layers in a 9:1 ratio.
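
To make the 9:1 ratio concrete, here is a small sketch that lays out a layer stack with one attention layer after every nine Mamba layers. The source gives only the ratio, so the interleaved placement, the total of 40 layers, and the name `hybrid_layer_pattern` are all assumptions for illustration.

```python
def hybrid_layer_pattern(num_layers, mamba_per_attention=9):
    """Return a list of layer kinds, placing one attention layer after
    every `mamba_per_attention` Mamba layers (placement is an assumption)."""
    block = mamba_per_attention + 1
    return [
        "attention" if (i + 1) % block == 0 else "mamba"
        for i in range(num_layers)
    ]


# Hypothetical 40-layer stack: 36 Mamba layers and 4 attention layers.
pattern = hybrid_layer_pattern(40)
mamba_count = pattern.count("mamba")
attention_count = pattern.count("attention")
```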

Performance and Efficiency Gains

📈 SSMs, particularly in the Granite 4.0 models, offer linear context scaling and do not require positional embeddings, enabling processing of large contexts (validated up to 128k, theoretically beyond).
📊 A key measurement: eight concurrent sessions at 128k context on a 3 billion parameter model use approximately 15GB of memory, compared to 80GB for a pure transformer architecture, a significant reduction in memory use.
💡 This efficiency allows for greater use of context in constrained devices, edge computing, RAG workflows, and multi-turn conversations.
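
The memory figures above can be put in perspective with some quick arithmetic. In a transformer, the key-value cache grows linearly with context length and with the number of attention layers, so replacing most attention layers with fixed-state Mamba layers shrinks it. The sketch below assumes hypothetical model dimensions (8 KV heads, head size 128, 16-bit values, 40 layers); none of these come from the episode, and the estimate ignores model weights and the Mamba state, so it will not reproduce the reported 15GB and 80GB totals exactly.

```python
def kv_cache_bytes(num_attention_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_value=2):
    """Approximate KV-cache size for one sequence: a key and a value vector
    per attention layer, per KV head, per token."""
    return (2 * num_attention_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_value)


def memory_reduction(baseline_gb, hybrid_gb):
    """Fraction of memory saved relative to the baseline."""
    return 1.0 - hybrid_gb / baseline_gb


# Hypothetical dimensions, chosen only for illustration.
context = 128_000
full_attention = kv_cache_bytes(40, 8, 128, context)  # all 40 layers use attention
hybrid = kv_cache_bytes(4, 8, 128, context)           # 4 of 40 layers use attention

# Figures reported in the summary: about 15GB versus 80GB for eight sessions.
reported_saving = memory_reduction(80.0, 15.0)  # 0.8125, i.e. about 81% less
```

Under these assumptions the KV cache shrinks tenfold, while the reported end-to-end saving is about 81%, which is plausible once the memory that does not shrink (such as the weights) is included.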

Benchmarking and Capabilities

✅ Granite models, including the 32 billion parameter versions, perform exceptionally well on benchmarks such as IFEval, which measures instruction following, and on structured output tasks.
🏆 They also rank highly on the Berkeley Function Calling Leaderboard (BFCLv3), placing in the top five among frontier and larger models, a strong result relative to their size.
