NVIDIA has released Nemotron 3 Ultra, a sparse mixture-of-experts reasoning model with roughly 550 billion total parameters, of which about 55 billion are active per token, distributed under the permissive NVIDIA Open Model License. Independent evaluation from Artificial Analysis places it at 47.7 on version 4.0 of their Intelligence Index, the strongest score to date for any US-built open-weights model. It still trails the leading Chinese open releases such as DeepSeek V4 Pro at 51.5 and GLM-5.1 at 51.4, and sits well behind frontier proprietary systems like Claude Opus 4.8 at 61.4 and GPT-5.5 at 60.2. The headline pitch is not raw intelligence but the combination of openness, speed, and price: the model runs at about 140 output tokens per second and costs roughly $0.50 per million tokens blended, making it one of the cheapest and fastest entries near its capability tier.
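To put the figures above side by side, here is a minimal Python sketch that computes the share of parameters active per token and each model's Intelligence Index gap to Nemotron 3 Ultra. The numbers are the approximate ones quoted in the text; the constant and function names are illustrative and do not come from NVIDIA or Artificial Analysis.

```python
# Figures as quoted in the text; all names here are illustrative.
TOTAL_PARAMS_B = 550    # approximate total parameters, in billions
ACTIVE_PARAMS_B = 55    # approximate parameters active per token, in billions

# Artificial Analysis Intelligence Index v4.0 scores cited above.
SCORES = {
    "Nemotron 3 Ultra": 47.7,
    "DeepSeek V4 Pro": 51.5,
    "GLM-5.1": 51.4,
    "Claude Opus 4.8": 61.4,
    "GPT-5.5": 60.2,
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters used for each token."""
    return active_b / total_b


def gaps_to(reference: str, scores: dict[str, float]) -> dict[str, float]:
    """Score difference (other minus reference), rounded to one decimal."""
    ref = scores[reference]
    return {
        name: round(score - ref, 1)
        for name, score in scores.items()
        if name != reference
    }


if __name__ == "__main__":
    share = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
    print(f"Active per token: {share:.0%} of parameters")
    for name, gap in gaps_to("Nemotron 3 Ultra", SCORES).items():
        print(f"{name}: +{gap} index points")
```

On these figures, about 10 percent of the parameters are active per token, the Chinese open releases lead by under 4 index points, and the proprietary frontier leads by roughly 12 to 14.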
The component benchmarks paint a model tuned for agentic and instruction-following work rather than broad factual knowledge. It scores 87 percent on GPQA Diamond, 81 percent on the IFBench instruction-following evaluation (second only to MiniMax-M3 across the entire field), 83 percent on the tau-squared Bench Telecom tool-use evaluation, and 67 percent on long-context reasoning. Its weaknesses are equally clear. It lands at an Omniscience knowledge index of -1, which reflects low factual accuracy paired with a respectable 71 percent non-hallucination rate. It also manages just 3 percent on the CritPt physics-reasoning benchmark and 40 percent on SciCode. In other words, this is a model built to follow instructions, call tools, and run agent loops cheaply, not to win on factual recall or scientific depth.
The SGLang and Miles teams reinforced the agentic framing the same day, announcing day-zero serving and reinforcement-learning support for Nemotron 3 Ultra. They explicitly position it for long-running autonomous agents that plan, use tools, and operate over persistent workflows rather than single prompt-and-response turns. That matters because a fast, cheap, open model with strong tool-use numbers is exactly the substrate teams want for multi-step agent deployments, where token volume and latency dominate cost. The release lands at a competitive moment for open weights: the strongest open models have been predominantly Chinese, and NVIDIA is making a deliberate bid for American open-model leadership while also, not incidentally, showcasing a workload that sells its own hardware. The open question is twofold: whether the instruction-following and tool-use gains hold up outside curated benchmark harnesses, and whether the thin knowledge and physics scores limit the model in research and analysis settings where depth matters.
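To make the point about token volume and latency concrete, here is a rough back-of-the-envelope sketch in Python. It uses the approximate throughput (140 output tokens per second) and blended price ($0.50 per million tokens) quoted above. The agent workload itself (40 steps, 5,000 input tokens and 700 output tokens per step) is purely hypothetical. Applying the blended price to all tokens is also a simplification, since a blended figure assumes a particular mix of input and output tokens.

```python
# Approximate figures quoted in the text.
OUTPUT_TOKENS_PER_SECOND = 140.0
BLENDED_USD_PER_MILLION = 0.50


def run_cost_usd(total_tokens: int,
                 usd_per_million: float = BLENDED_USD_PER_MILLION) -> float:
    """Estimated cost when the blended price is applied to every token."""
    return total_tokens / 1_000_000 * usd_per_million


def generation_seconds(output_tokens: int,
                       tokens_per_second: float = OUTPUT_TOKENS_PER_SECOND) -> float:
    """Time spent generating output, ignoring prompt processing and tool calls."""
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    # Hypothetical agent run; these workload numbers are assumptions.
    steps = 40
    input_tokens_per_step = 5_000
    output_tokens_per_step = 700

    total_output = steps * output_tokens_per_step
    total_tokens = steps * (input_tokens_per_step + output_tokens_per_step)

    print(f"Total tokens: {total_tokens:,}")
    print(f"Estimated cost: ${run_cost_usd(total_tokens):.3f}")
    print(f"Generation time: {generation_seconds(total_output):.0f} s")
```

Under these assumptions the run consumes 228,000 tokens, costs about $0.11, and spends about 200 seconds generating output. Real deployments would add prompt-processing time, tool latency, and provider-specific input and output pricing.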
- Artificial Analysis frames it as the leading US open-weights model on intelligence, but emphasizes speed and price over raw capability, noting it still trails DeepSeek V4 Pro and GLM-5.1.
- LMSYS/SGLang frame the release around long-running autonomous agents, stressing day-zero inference plus reinforcement-learning training support rather than the model's benchmark standing.