State-Space Models (SSMs) and Mamba Explained: Efficient AI Architectures
Super Data Science: ML & AI Podcast with Jon Krohn, November 16, 2025 (6 min)
Understanding State-Space Models (SSMs)
- State-space models are a family of models whose origins date back to the 1960s; they have been used in fields such as spaceflight control, population studies, and economic modeling.
- The core idea is to map an input signal into a hidden state using one set of equations (the state equation), and then translate that state into an observable output using a second set of equations (the output equation).
- Over the last decade, researchers have brought SSMs into deep learning, where the models learn from data rather than relying on extensive classical feature engineering.
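The two-equation structure described above can be written as a discrete linear recurrence. The sketch below is a minimal NumPy illustration with arbitrary, hypothetical matrices `A`, `B`, `C`, `D` and a hypothetical helper `ssm_scan`; it shows the classical state/output form only, not Mamba's learned parameterization.

```python
import numpy as np

def ssm_scan(A, B, C, D, inputs):
    """Run a discrete linear state-space model over a sequence.

    State equation:  h_t = A @ h_{t-1} + B @ x_t   (input -> hidden state)
    Output equation: y_t = C @ h_t + D @ x_t       (hidden state -> observable output)
    """
    state = np.zeros(A.shape[0])          # hidden state starts at zero
    outputs = []
    for x in inputs:                      # one step per input element
        state = A @ state + B @ x         # update the hidden state
        outputs.append(C @ state + D @ x) # read out the observable output
    return np.array(outputs)

# Hypothetical 2-D hidden state with scalar input and output.
A = np.array([[0.9, 0.0], [0.1, 0.8]])
B = np.array([[1.0], [0.0]])
C = np.array([[0.0, 1.0]])
D = np.array([[0.0]])

xs = np.array([[1.0], [0.0], [0.0]])  # an impulse followed by zeros
ys = ssm_scan(A, B, C, D, xs)
print(ys.ravel())  # approximately [0.0, 0.1, 0.17]
```

Note that the hidden state keeps a fixed size however long the input sequence grows, which is the property behind the memory efficiency covered in the performance section.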
Evolution to Mamba and Hybrid Architectures
- Researchers at Carnegie Mellon and Princeton have significantly advanced SSMs, leading to structured SSMs such as S4 and S6 (the selective SSM at the core of Mamba).
- Mamba specifically adapts SSMs to the compute profile of language modeling, drawing on mathematical insights into the properties of the underlying matrices.
- IBM's Granite 4.0 release uses a hybrid architecture that combines Mamba (SSM) layers with conventional transformer attention layers in a 9:1 ratio.
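To make the 9:1 ratio concrete, the sketch below builds a layer-type list for a hybrid stack. The helper name `hybrid_layer_schedule`, the 40-layer depth, and the choice to place one attention layer after every nine Mamba layers are all assumptions for illustration; the actual layer count and interleaving pattern in Granite 4.0 may differ.

```python
def hybrid_layer_schedule(num_layers: int, mamba_per_attention: int = 9) -> list[str]:
    """Return a list of layer types with one attention layer after every
    `mamba_per_attention` Mamba layers (9:1 by default).

    Hypothetical placement rule, not Granite 4.0's published layout.
    """
    period = mamba_per_attention + 1
    return [
        "attention" if (i + 1) % period == 0 else "mamba"
        for i in range(num_layers)
    ]

schedule = hybrid_layer_schedule(40)
print(schedule.count("mamba"), schedule.count("attention"))  # 36 4
```

Only the attention layers in such a stack keep a per-token cache that grows with context; the Mamba layers carry a fixed-size state.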
Performance and Efficiency Gains
- SSM layers, as used in the Granite 4.0 models, scale linearly with context length and do not require positional embeddings, which makes very long contexts practical (validated up to 128k tokens, and theoretically beyond).
- In one measurement, running eight sessions at 128k context on a 3-billion-parameter model used about 15 GB of memory, compared with about 80 GB for a pure transformer architecture, roughly a fivefold reduction in memory.
- This efficiency makes more context usable on resource-constrained and edge devices, as well as in RAG workflows and multi-turn conversations.
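The memory gap can be reasoned about with a back-of-envelope KV-cache estimate. The sketch below assumes a hypothetical 32-layer model with 8 key/value heads of dimension 64 and 16-bit cache entries, plus a hypothetical helper `kv_cache_gib`; these are illustrative values, not Granite 4.0's configuration. The estimate ignores model weights, activations, and the Mamba layers' fixed-size state, so it is not meant to reproduce the reported 15 GB and 80 GB figures, only to show why caching keys and values in fewer layers shrinks memory.

```python
def kv_cache_gib(num_attention_layers: int, num_kv_heads: int, head_dim: int,
                 seq_len: int, num_sessions: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache size in GiB.

    The leading factor 2 counts one key and one value tensor per attention
    layer; bytes_per_elem=2 assumes 16-bit (fp16/bf16) cache entries.
    """
    total_bytes = (2 * num_attention_layers * num_kv_heads * head_dim
                   * seq_len * num_sessions * bytes_per_elem)
    return total_bytes / 2**30

# Hypothetical model shape (NOT the real Granite 4.0 configuration).
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 64
CONTEXT, SESSIONS = 128 * 1024, 8

# Pure transformer: every layer keeps a KV cache that grows with context.
full = kv_cache_gib(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT, SESSIONS)
# 9:1 hybrid: only 3 of the 32 layers are attention; Mamba layers keep a
# fixed-size state instead (left out here because it does not grow with context).
hybrid = kv_cache_gib(3, KV_HEADS, HEAD_DIM, CONTEXT, SESSIONS)

# Arithmetic on the figures reported in the episode: 15 GB vs 80 GB.
reported_reduction = 1 - 15 / 80
print(f"full={full:.1f} GiB, hybrid={hybrid:.1f} GiB, "
      f"reported reduction={reported_reduction:.0%}")
# full=64.0 GiB, hybrid=6.0 GiB, reported reduction=81%
```

Because the cache term is proportional to `seq_len`, doubling the context doubles the attention-layer cache, while the SSM state stays constant.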
Benchmarking and Capabilities
- Granite models, including the 32-billion-parameter variant, score strongly on benchmarks such as IFEval (instruction following) and on structured-output tasks.
- They also rank highly on the Berkeley Function Calling Leaderboard (BFCL v3), placing in the top five among frontier and larger models, which is strong performance relative to their size.
What's Discussed
State-Space Models, SSMs, Mamba, Hybrid Architectures, Transformers, Language Modeling, Deep Learning, AI Efficiency, Context Window, RAG Workflows, Instruction Following, Function Calling, Dell AI, Granite Models