OpenAI Codex-Spark: 15x Faster AI Code Generation with Cerebras WSE-3 Chip
[HPP] Simon Willison · February 16, 2026 · 7 min
Introducing Codex-Spark
- OpenAI has released GPT-5.3 Codex-Spark, an AI model that generates code significantly faster than previous versions.
- The new model achieves over 1,000 tokens per second, making it 15 times faster than the regular GPT-5.3 Codex.
- The primary goal of Spark is to let developers stay in a flow state by providing near-instant code output.
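To make the speed claim concrete, here is a minimal sketch that turns the two quoted figures into wall-clock generation times. The 600-token reply size is a hypothetical example, the baseline rate is only implied by the 15x figure (it is not stated directly), and first-token latency is ignored.

```python
# Rough generation-time comparison using only the figures quoted above.
SPARK_TOKENS_PER_SEC = 1_000   # "over 1,000 tokens per second" (treated as a lower bound)
SPEEDUP = 15                   # Spark vs. the regular GPT-5.3 Codex
# Implied baseline rate, roughly 67 tokens per second (derived, not quoted).
BASELINE_TOKENS_PER_SEC = SPARK_TOKENS_PER_SEC / SPEEDUP


def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to stream num_tokens at a steady rate, ignoring first-token latency."""
    return num_tokens / tokens_per_sec


if __name__ == "__main__":
    reply_tokens = 600  # hypothetical size of a medium code edit
    spark = generation_seconds(reply_tokens, SPARK_TOKENS_PER_SEC)
    baseline = generation_seconds(reply_tokens, BASELINE_TOKENS_PER_SEC)
    print(f"Spark:    {spark:.1f} s")
    print(f"Baseline: {baseline:.1f} s")
```

Under these assumptions the same edit takes well under a second on Spark and several seconds on the baseline, which is the gap the flow-state argument rests on.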
Cerebras Hardware Advantage
- Codex-Spark runs on Cerebras Wafer Scale Engine 3 (WSE-3) chips, making it OpenAI's first model served on non-NVIDIA hardware.
- The WSE-3 is massive: at 46,225 square mm, it has about 57 times the area of NVIDIA's H100 GPU.
- It packs 4 trillion transistors, 900,000 AI-optimized cores, and 44 GB of on-chip memory with 21 petabytes per second of memory bandwidth.
- This architecture keeps data on the chip, avoiding the delay of moving it to and from off-chip memory and enabling very high token generation speeds.
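The hardware figures above can be sanity-checked with a few lines of arithmetic. This is a back-of-the-envelope sketch: the H100 die area of 814 square mm is an assumption brought in from outside the text, and the helper names are illustrative only.

```python
# Back-of-the-envelope checks on the WSE-3 figures quoted above.
WSE3_AREA_MM2 = 46_225                        # from the text
H100_AREA_MM2 = 814                           # assumption: H100 die area, not stated in the text
WSE3_CORES = 900_000                          # from the text
WSE3_SRAM_BYTES = 44 * 10**9                  # 44 GB on-chip memory (decimal GB)
WSE3_BANDWIDTH_BYTES_PER_SEC = 21 * 10**15    # 21 PB/s memory bandwidth


def area_ratio() -> float:
    """How many H100-sized dies fit in the WSE-3's area."""
    return WSE3_AREA_MM2 / H100_AREA_MM2


def sram_per_core_kb() -> float:
    """Average on-chip memory per core, in kilobytes."""
    return WSE3_SRAM_BYTES / WSE3_CORES / 1_000


def full_sweep_microseconds() -> float:
    """Time to read all on-chip memory once at the quoted bandwidth."""
    return WSE3_SRAM_BYTES / WSE3_BANDWIDTH_BYTES_PER_SEC * 1e6


if __name__ == "__main__":
    print(f"Area ratio vs. H100: {area_ratio():.1f}x")
    print(f"On-chip memory per core: {sram_per_core_kb():.1f} KB")
    print(f"One full pass over on-chip memory: {full_sweep_microseconds():.2f} us")
```

The area ratio comes out just under 57, consistent with the quoted figure, and a full pass over the on-chip memory takes on the order of microseconds at the stated bandwidth.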
Software Optimizations
- OpenAI also re-engineered the communication protocol, switching from individual HTTP requests to persistent WebSocket connections.
- This change alone reduced network overhead by 80%, per-token overhead by 30%, and first-token latency by 50%.
- These software improvements, combined with the specialized hardware, produce the near-immediate code output experience.
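The effect of the three reductions can be illustrated with a simple overhead model. This is a sketch under stated assumptions: the baseline millisecond values are hypothetical placeholders, and the additive formula in `session_overhead_ms` is a simplification, not a description of OpenAI's transport stack. Only the three percentages come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Overheads:
    network_ms: float      # network overhead per client/server round trip
    per_token_ms: float    # transport overhead added to each streamed token
    first_token_ms: float  # latency before the first token arrives


# Fractional reductions quoted in the text for the move to WebSockets.
REDUCTIONS = {"network_ms": 0.80, "per_token_ms": 0.30, "first_token_ms": 0.50}


def apply_reductions(base: Overheads) -> Overheads:
    """Scale each overhead component by its quoted reduction."""
    return Overheads(
        **{name: getattr(base, name) * (1 - cut) for name, cut in REDUCTIONS.items()}
    )


def session_overhead_ms(o: Overheads, round_trips: int, tokens: int) -> float:
    """Simplified additive model of non-compute overhead for one session."""
    return round_trips * o.network_ms + o.first_token_ms + tokens * o.per_token_ms


if __name__ == "__main__":
    # Hypothetical baseline values, chosen only for illustration.
    http = Overheads(network_ms=50.0, per_token_ms=1.0, first_token_ms=400.0)
    ws = apply_reductions(http)
    for label, o in (("HTTP", http), ("WebSocket", ws)):
        total = session_overhead_ms(o, round_trips=1, tokens=500)
        print(f"{label:10s} overhead: {total:.0f} ms")
```

Because the per-token term scales with output length, its 30% reduction dominates for long replies, while the first-token and network reductions matter most for short, frequent edits.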
Performance & Trade-offs
- Codex-Spark is a smaller model than the full GPT-5.3 Codex, trading some reasoning power for speed.
- It excels at tasks like prototyping, defining interfaces, and quick edits, where speed is paramount for maintaining developer flow.
- It has limitations, however, including a 128,000-token context window and text-only input at launch.
- Spark is not rated high capability for cybersecurity or biology tasks, so the full Codex model is recommended for security-sensitive code.
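The trade-offs above suggest a simple routing rule for choosing between the two models. The sketch below is one possible encoding of that rule: the `Task` fields, the `pick_model` helper, and the model labels are hypothetical and are not real API model identifiers. It also assumes the full model handles the cases Spark cannot, which the text implies but does not state.

```python
from dataclasses import dataclass

SPARK_CONTEXT_LIMIT = 128_000  # tokens, from the text

# Hypothetical labels for illustration only, not real API model identifiers.
SPARK = "codex-spark"
FULL = "full-codex"


@dataclass
class Task:
    prompt_tokens: int
    expected_output_tokens: int = 0
    has_images: bool = False          # Spark is text-only at launch
    security_sensitive: bool = False  # the text recommends the full model here


def pick_model(task: Task) -> str:
    """Route to Spark only when none of its stated limitations apply."""
    if task.security_sensitive:
        return FULL
    if task.has_images:
        return FULL
    if task.prompt_tokens + task.expected_output_tokens > SPARK_CONTEXT_LIMIT:
        return FULL
    return SPARK


if __name__ == "__main__":
    quick_edit = Task(prompt_tokens=4_000, expected_output_tokens=500)
    auth_review = Task(prompt_tokens=4_000, security_sensitive=True)
    print(pick_model(quick_edit))
    print(pick_model(auth_review))
```

Checking the security flag first keeps the most consequential rule from being masked by the cheaper size and modality checks.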
Future Implications
- OpenAI's $10 billion deal with Cerebras indicates a significant investment in specialized AI hardware.
- This move suggests a shift in the AI hardware landscape, where specialized chips can outperform general-purpose GPUs for latency-sensitive workflows.
- The WebSocket improvements are expected to become the default for all OpenAI models, promising faster responses across their entire ecosystem.
Topics
OpenAI, Codex-Spark, GPT-5.3 Codex, Cerebras WSE-3, AI Code Generation, Hardware Acceleration, NVIDIA, Wafer Scale Engine, Tokens per second, WebSocket, AI Inference, Context Window, Cybersecurity, Software Engineering, Flow State