OpenAI Codex-Spark: 15x Faster AI Code Generation with Cerebras WSE-3 Chip

[HPP] Simon Willison · February 16, 2026 · 7 min

Introducing Codex-Spark

🚀 OpenAI has released GPT-5.3 Codex-Spark, a coding model built to generate code much faster than earlier Codex versions.
⚡ The new model produces over 1,000 tokens per second, about 15 times faster than the standard GPT-5.3 Codex.
💡 The primary goal of Spark is to enable developers to maintain a flow state by providing near-instant code output.
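
To make the throughput figures above concrete, here is a rough back-of-the-envelope sketch in Python. The 1,000 tokens per second and 15x figures come from the summary. The baseline rate is only what those two numbers imply, not a published figure, and the response sizes are hypothetical. The model ignores first-token latency and network time.

```python
# Figures quoted in the summary above.
SPARK_TOKENS_PER_SEC = 1_000  # "over 1,000 tokens per second"
SPEEDUP = 15                  # "15 times faster"

# Implied by the two figures above; an assumption, not a published number.
baseline_tokens_per_sec = SPARK_TOKENS_PER_SEC / SPEEDUP


def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to stream num_tokens at a steady rate (ignores first-token latency)."""
    return num_tokens / tokens_per_sec


if __name__ == "__main__":
    # Hypothetical response sizes, chosen only for illustration.
    for num_tokens in (200, 1_000, 5_000):
        fast = generation_seconds(num_tokens, SPARK_TOKENS_PER_SEC)
        slow = generation_seconds(num_tokens, baseline_tokens_per_sec)
        print(f"{num_tokens:>5} tokens: Spark {fast:.2f} s, implied baseline {slow:.2f} s")
```

At these rates a short edit arrives in well under a second, while the implied baseline takes several seconds, which is the gap the "flow state" argument rests on.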

Cerebras Hardware Advantage

🧠 Codex-Spark runs on Cerebras Wafer Scale Engine 3 (WSE-3) chips, marking OpenAI's first model served on non-NVIDIA hardware.
🔬 The WSE-3 chip is massive: at 46,225 mm², it has roughly 57 times the area of NVIDIA's H100 GPU.
💾 It features 4 trillion transistors, 900,000 AI-optimized cores, and 44 GB of on-chip memory, providing 21 petabytes per second of memory bandwidth.
🎯 This architecture keeps data in on-chip memory rather than fetching it from off-chip memory, which cuts data-movement latency and enables very high token generation speeds.
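
The chip figures above can be sanity-checked with a little arithmetic. The sketch below uses the WSE-3 numbers from the summary. The H100 die area of 814 mm² is not in the summary and is supplied here as an assumption, and GB and PB are treated as decimal units.

```python
import math

# WSE-3 figures quoted in the summary above.
WSE3_AREA_MM2 = 46_225
WSE3_ONCHIP_BYTES = 44e9      # 44 GB, assuming decimal gigabytes
WSE3_BANDWIDTH_BPS = 21e15    # 21 PB/s, assuming decimal petabytes

# Assumption: H100 die area, not stated in the summary.
H100_AREA_MM2 = 814

# How many H100-sized dies fit in the WSE-3 area.
area_ratio = WSE3_AREA_MM2 / H100_AREA_MM2

# Side length if the die were a perfect square.
side_mm = math.sqrt(WSE3_AREA_MM2)

# Time to read all on-chip memory once at the quoted bandwidth.
sweep_seconds = WSE3_ONCHIP_BYTES / WSE3_BANDWIDTH_BPS

if __name__ == "__main__":
    print(f"Area ratio vs H100: {area_ratio:.1f}x")
    print(f"Equivalent square side: {side_mm:.0f} mm")
    print(f"One pass over on-chip memory: {sweep_seconds * 1e6:.2f} microseconds")
```

Under these assumptions the area ratio comes out just under 57, consistent with the figure quoted above, and a full pass over the 44 GB of on-chip memory takes on the order of two microseconds.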

Software Optimizations

🌐 OpenAI also re-engineered the communication protocol, switching from HTTP requests to persistent WebSocket connections.
✅ This change alone reduced network overhead by 80%, per-token overhead by 30%, and first-token latency by 50%.
📈 These software improvements, combined with the specialized hardware, contribute to the immediate code output experience.
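
Why a persistent connection helps can be shown with a toy cost model. This is a minimal sketch, not OpenAI's protocol: the `Transport` class, the function name, and all millisecond values are hypothetical. It also assumes the worst case for HTTP, where every request opens a new connection (in practice, HTTP keep-alive can reuse connections too).

```python
from dataclasses import dataclass


@dataclass
class Transport:
    """Hypothetical per-connection and per-message costs, in milliseconds."""
    setup_ms: float        # opening a connection (e.g. TCP and TLS handshakes)
    per_message_ms: float  # framing and header overhead for each message


def total_overhead_ms(transport: Transport, messages: int, persistent: bool) -> float:
    """Overhead for sending `messages` messages.

    A persistent connection pays the setup cost once; a connection-per-request
    style pays it for every message.
    """
    setups = 1 if persistent else messages
    return setups * transport.setup_ms + messages * transport.per_message_ms


if __name__ == "__main__":
    # Illustrative numbers only; not measurements.
    transport = Transport(setup_ms=100.0, per_message_ms=5.0)
    per_request = total_overhead_ms(transport, messages=10, persistent=False)
    persistent = total_overhead_ms(transport, messages=10, persistent=True)
    print(f"New connection per request: {per_request:.0f} ms")
    print(f"Single persistent connection: {persistent:.0f} ms")
```

The more round trips an editing session makes, the more the one-time setup cost is amortized, which is the kind of saving the percentages above describe.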

Performance & Trade-offs

⚖️ Codex-Spark is a smaller model than the full GPT-5.3 Codex, trading some reasoning power for speed.
🛠️ It excels in tasks like prototyping, defining interfaces, and quick edits, where speed is paramount for maintaining developer flow.
⚠️ However, it has limitations, including a 128,000-token context window and text-only operation at launch.
🔒 Spark is not rated as high capability for cybersecurity or biology tasks, so the full Codex model is recommended for security-sensitive code.
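
The trade-offs above suggest a simple routing rule between the two models. The sketch below is one hypothetical way to express it. The model labels, the `pick_model` helper, the reserved output budget, and the four-characters-per-token estimate are all assumptions for illustration. Only the 128,000-token limit, the text-only restriction, and the security guidance come from the summary.

```python
import math

SPARK_CONTEXT_TOKENS = 128_000  # context window quoted in the summary


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; a real tokenizer would give different counts."""
    return math.ceil(len(text) / chars_per_token)


def pick_model(
    prompt: str,
    *,
    has_images: bool = False,
    security_sensitive: bool = False,
    reserved_output_tokens: int = 4_000,  # hypothetical budget for the reply
) -> str:
    """Return a hypothetical model label based on the limitations listed above."""
    # Spark is text-only at launch and not recommended for security-sensitive code.
    if has_images or security_sensitive:
        return "full-codex"
    # Fall back to the full model when the prompt may not fit Spark's window.
    if estimate_tokens(prompt) + reserved_output_tokens > SPARK_CONTEXT_TOKENS:
        return "full-codex"
    return "codex-spark"


if __name__ == "__main__":
    print(pick_model("Rename this function and update its callers."))
    print(pick_model("Audit this auth middleware.", security_sensitive=True))
```

The summary does not state the full model's context window, so this sketch only decides when Spark is unsuitable. It does not check that the full model can handle the request.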

Future Implications

💰 OpenAI's $10 billion deal with Cerebras indicates a significant investment in specialized AI hardware.
🚀 This move suggests a shift in the AI hardware landscape, where specialized chips can outperform general-purpose GPUs for latency-sensitive workflows.
🌍 The WebSocket improvements are expected to become the default for all OpenAI models, promising faster responses across their entire ecosystem.

What’s Discussed

OpenAI, Codex-Spark, GPT-5.3 Codex, Cerebras WSE-3, AI Code Generation, Hardware Acceleration, NVIDIA, Wafer Scale Engine, Tokens per second, WebSocket, AI Inference, Context Window, Cybersecurity, Software Engineering, Flow State
