Introducing EmbeddingGemma: On-Device Text Embeddings for Generative AI

Google for Developers · September 5, 2025 · 4 min · 116,739 views

EmbeddingGemma: State-of-the-Art On-Device Embeddings

💡 EmbeddingGemma is a new 300 million parameter text embedding model designed for mobile-first AI and generative AI experiences directly on user hardware.
🧠 Embeddings are numerical representations of text, transforming data into vectors that generative models can use for downstream tasks.
⚡ The model is small, fast, and efficient, capable of running with as little as 300 megabytes of RAM due to quantization-aware training, while preserving state-of-the-art quality.
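To make the idea of "text as vectors" concrete, here is a minimal sketch of how two embeddings are usually compared, using cosine similarity. The 4-dimensional vectors and the `toy_embeddings` name are hypothetical, hand-written values for illustration only; they are not output from EmbeddingGemma, which would emit far longer vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical hand-written "embeddings" (assumption: not real model output).
toy_embeddings = {
    "fix a broken chair": [0.9, 0.1, 0.0, 0.2],
    "repair wooden furniture": [0.8, 0.2, 0.1, 0.3],
    "book a flight to Paris": [0.0, 0.9, 0.8, 0.1],
}

query = toy_embeddings["fix a broken chair"]
# Score every text against the query; related texts should score higher.
scores = {text: cosine_similarity(query, vec) for text, vec in toy_embeddings.items()}
```

In this toy data the two furniture sentences point in nearly the same direction, so they score much higher against each other than against the travel sentence. That property is what semantic search and clustering rely on.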

Key Features and Capabilities

🎯 EmbeddingGemma produces 768-dimensional embeddings by default, and Matryoshka Representation Learning (MRL) lets developers shrink them to smaller sizes, down to 128 dimensions.
🚀 It is based on the same technology as Gemini embedding models, offering high-quality semantic search, fast information retrieval, and customized classification/clustering.
🏆 The model achieves the best score on the Massive Text Embedding Benchmark (MTEB) among models under 500 million parameters and is trained on 100+ languages.
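With MRL-trained models, a smaller embedding is typically obtained by keeping the leading values of the full vector and rescaling the result to unit length. The sketch below shows that step under this assumption; `truncate_embedding` is a hypothetical helper, and the 768 values are arbitrary stand-ins rather than real model output.

```python
import math

def truncate_embedding(vec, dims):
    """Keep the first `dims` values and rescale them to unit length."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale
    return [x / norm for x in head]

# Stand-in for a 768-dimensional model output (assumption: arbitrary values).
full = [math.sin(i + 1) for i in range(768)]
small = truncate_embedding(full, 128)
```

The shorter vector takes one sixth of the storage and makes similarity search cheaper, at some cost in quality. How much quality is lost at each size is something to measure on your own data.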

On-Device Performance and Privacy

🔒 Engineered for on-device performance, EmbeddingGemma ensures efficient computations and minimal memory footprint, even on resource-constrained hardware.
🛡️ It facilitates on-device embedding of local documents, ensuring sensitive user data never leaves the device.
🌐 Offline functionality means search and retrieval features work regardless of internet connectivity.

Building Generative AI Experiences

🧩 Together with generative models like Gemma 3n, EmbeddingGemma enables powerful mobile-first generative AI experiences and efficient Retrieval-Augmented Generation (RAG) pipelines.
💬 Applications can draw on context from the user's own data to give more personalized responses, for example recognizing from that context that the user needs a carpenter.
🌐 An example demo shows a user querying previously opened articles or web pages using a browser extension that utilizes EmbeddingGemma for on-device embedding and retrieval.
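The retrieval half of a RAG pipeline like the one described above can be sketched in a few lines: embed the saved documents once, embed the query, rank by similarity, and place the best matches in the prompt. Everything here is illustrative. `toy_embed` is a hypothetical word-count stand-in for a real embedding model, the documents are made up, and the final prompt would be handed to a generative model, a step this sketch does not perform.

```python
import math
import re

documents = [
    "Article about repairing a damaged hardwood floor at home.",
    "Recipe for sourdough bread with a long fermentation.",
    "Guide to choosing hiking boots for wet weather.",
]

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

# Vocabulary built from the local documents only.
VOCAB = sorted({word for doc in documents for word in tokenize(doc)})
POSITION = {word: i for i, word in enumerate(VOCAB)}

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: normalized word counts."""
    vec = [0.0] * len(VOCAB)
    for word in tokenize(text):
        if word in POSITION:  # words outside the vocabulary are ignored
            vec[POSITION[word]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def retrieve(query, index, k=1):
    """Return the k documents whose vectors score highest against the query."""
    q = toy_embed(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), doc) for doc, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]

def build_prompt(query, passages):
    """Combine retrieved passages and the question into one prompt string."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Use the context to answer.\nContext:\n{context}\nQuestion: {query}"

# Index once, then answer queries against the stored vectors.
index = [(doc, toy_embed(doc)) for doc in documents]
question = "How do I repair my hardwood floor?"
top = retrieve(question, index, k=1)
prompt = build_prompt(question, top)
```

Because the index and the query embedding are both computed locally, nothing in this loop needs a network connection. A real embedding model would also match "repair" with "repairing", which this word-count stand-in cannot do.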

Customization and Accessibility

🛠️ EmbeddingGemma is designed for customization, allowing users to fine-tune it for their specific domain or language.
🤝 It works across popular platforms like Hugging Face and Kaggle, with example notebooks available in the Gemma cookbook.
✨ This next generation of on-device embedding models is open for everyone, offering a small, fast, and efficient solution for developers.

Topics

EmbeddingGemma, Text Embeddings, Generative AI, On-Device AI, Mobile AI, Quantization, Matryoshka Representation Learning (MRL), Retrieval Augmented Generation (RAG), Semantic Search, Information Retrieval, Custom Classification, Offline AI, Google AI, Gemma
