Cerebras closed its first trading day at $280 a share, giving the company a $60 billion market cap and marking the largest AI-hardware IPO of the cycle so far. The debut follows a previously withdrawn S-1, the 750 MW OpenAI partnership announced earlier this year, and the roughly $10–$20 billion equity-and-supply deal Reuters confirmed in April. AINews frames the listing as a leading indicator of what it calls the "inference inflection," the multi-quarter shift in spending from training clusters toward dedicated inference systems. It also notes that the IPO arrives just six months after NVIDIA's $20 billion execuhire of Groq, which had already brought the same architectural conversation into the mainstream.
Around the financial event, AINews lays out the wafer-scale case Cerebras has been making for half a decade, now backed by revenue and a public balance sheet. Instead of stitching together thousands of small dies over PCIe or NVLink, the CS-3 system uses a single 46,225 mm² wafer carrying 900,000 cores and 44 GB of on-wafer SRAM with roughly 21 PB/s of memory bandwidth. For inference workloads where weight reuse and KV-cache traffic dominate, this moves onto a single piece of silicon what would otherwise be tens or hundreds of GPU-to-GPU hops per token. Cerebras's published numbers on Llama 3.1 405B and Qwen3-235B claim 5–10× higher output tokens per second than the best GPU stacks at comparable cost. That is the operational metric that serving providers like Hugging Face's hosted endpoints, OpenAI's lower-tier API offerings, and frontier customers building real-time agentic loops actually pay for.
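One rough way to see why on-wafer bandwidth matters for decoding is a memory-bandwidth ceiling: if generating each token requires reading every weight once, single-stream speed can be no higher than bandwidth divided by weight bytes. The sketch below is a back-of-envelope illustration under that assumption, not Cerebras's methodology. The 21 PB/s figure comes from the text; the 8B-parameter, 16-bit model and the ~3.35 TB/s H100 SXM HBM3 comparison figure are assumptions chosen for illustration, and the function name is hypothetical.

```python
"""Back-of-envelope ceiling on single-stream decode speed when memory-bandwidth bound.

Assumption: each generated token reads all model weights exactly once, so
tokens/s <= bandwidth / weight_bytes. Compute limits, KV-cache reads,
interconnect, and batching are all ignored.
"""


def decode_ceiling_tokens_per_s(bandwidth_bytes_per_s: float,
                                num_params: float,
                                bytes_per_param: float) -> float:
    """Hypothetical helper: upper bound on tokens/s for one decode stream."""
    weight_bytes = num_params * bytes_per_param
    return bandwidth_bytes_per_s / weight_bytes


# From the text: roughly 21 PB/s of on-wafer SRAM bandwidth.
WAFER_SRAM_BW = 21e15
# Assumption: ~3.35 TB/s HBM3 bandwidth of a single H100 SXM, for comparison.
GPU_HBM_BW = 3.35e12

# Hypothetical model: 8B parameters stored in 16-bit weights (16 GB, fits in 44 GB SRAM).
PARAMS = 8e9
BYTES_PER_PARAM = 2

if __name__ == "__main__":
    wafer = decode_ceiling_tokens_per_s(WAFER_SRAM_BW, PARAMS, BYTES_PER_PARAM)
    gpu = decode_ceiling_tokens_per_s(GPU_HBM_BW, PARAMS, BYTES_PER_PARAM)
    print(f"wafer ceiling: {wafer:,.0f} tok/s")
    print(f"GPU ceiling:   {gpu:,.0f} tok/s")
    print(f"ceiling ratio: {wafer / gpu:,.0f}x")
```

The ceiling ratio comes out in the thousands, far above the 5–10× Cerebras reports. That gap is the point of the caveat: once compute, wafer-to-wafer traffic for models larger than 44 GB, and batching enter the picture, the bound shows the direction of the advantage rather than its size.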
The market reaction lands inside a broader rotation. NVIDIA, AMD, Groq-via-NVIDIA, the SambaNova–QuantumScape-style mergers earlier this year, and now Cerebras together describe an inference market that institutional investors have decided is structurally large enough to support multiple architectures rather than collapse into a CUDA monopoly. The "Decade of Cerebras" chart making the rounds, from Amir Efrati, traces the company from its 2015 founding through its first wafer in 2019, then the long stretch when its order book was effectively limited to national labs and Mubadala-funded supercomputer builds, and finally the 2024–2025 inflection when OpenAI, Meta, and a half-dozen sovereign-AI customers began locking in multi-gigawatt commitments.
What to watch from here: whether the IPO proceeds get steered toward a North American fab build to reduce TSMC concentration risk; how much of the OpenAI volume Cerebras can fulfill alongside the GB300 and MI355X commitments those same customers signed in late 2025; and whether the tokens-per-dollar curve the wafer architecture promises shows up in publicly verifiable Artificial Analysis (AA) price and speed numbers once the post-IPO disclosure cadence kicks in. On AA's leaderboard, DeepSeek V4 Pro sits at 33 output tokens per second versus 260 for gpt-oss-120B on Cerebras-class hardware. Because the two are different models, that roughly 8× gap is not a like-for-like hardware comparison, but it is the kind of difference that determines whether the inference inflection becomes a category fact rather than one quarter's story.
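To make the 33 versus 260 tokens-per-second gap concrete for the real-time agentic loops mentioned earlier, the sketch below converts output speed into wall-clock generation time. It assumes constant output speed and ignores time-to-first-token, prefill, and queueing; the 10-step, 500-tokens-per-step agent workload and the helper name are hypothetical, chosen only for illustration.

```python
"""Wall-clock generation time at the two Artificial Analysis output speeds quoted above.

Assumption: constant tokens/s with no time-to-first-token, prefill, or queueing,
so these are lower bounds on real end-to-end latency.
"""


def generation_seconds(output_tokens: int, tokens_per_s: float) -> float:
    """Hypothetical helper: seconds spent emitting output_tokens at a fixed speed."""
    return output_tokens / tokens_per_s


# Output speeds quoted in the text (different models, so not like-for-like hardware).
SPEEDS = {
    "DeepSeek V4 Pro": 33.0,
    "gpt-oss-120B (Cerebras-class)": 260.0,
}

# Hypothetical agent workload: 10 sequential steps, 500 output tokens per step.
STEPS = 10
TOKENS_PER_STEP = 500

if __name__ == "__main__":
    for name, tps in SPEEDS.items():
        # Steps run one after another, so per-step times add up.
        total = STEPS * generation_seconds(TOKENS_PER_STEP, tps)
        print(f"{name}: {total:.1f} s for {STEPS} x {TOKENS_PER_STEP} tokens")
    ratio = SPEEDS["gpt-oss-120B (Cerebras-class)"] / SPEEDS["DeepSeek V4 Pro"]
    print(f"speed ratio: {ratio:.1f}x")
```

Because agent steps are sequential, per-step generation time compounds: under these assumptions the same loop takes minutes at one speed and well under half a minute at the other, which is why output speed rather than peak throughput is the number such customers watch.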