Nathan Lambert's twenty-first Open Artifacts roundup compresses one of the densest months of open-weights releases on record into a single read. Google shipped Gemma 4 in 1B / 4B / 12B / 27B sizes plus a 35B-A3B mixture-of-experts variant; DeepSeek released V4 in Flash and Pro tiers with both reasoning and non-reasoning modes (the previously dominant V3.2 has been retired from the API); Moonshot's Kimi K2.6 reasserts itself near the top of open Intelligence Index leaderboards; Xiaomi's MiMo 2.5 Pro lands its first credible benchmark showing; Zhipu's GLM-5.1 makes a quiet jump on agentic coding; Alibaba's Qwen3.6 Max Preview now sits in the same tier as the proprietary frontier; and IBM, Mistral, and BigCode shipped smaller updates that Lambert treats as housekeeping.

The post's longest section covers the new Center for AI Standards and Innovation (CAISI) evaluation of DeepSeek V4 Pro. CAISI used nine benchmarks across cyber, biosecurity, chemistry, agentic coding, and reasoning, calibrated via Item Response Theory to produce one Elo rating per model. Their headline claim: the aggregate capability gap between the strongest US closed-frontier models (GPT-5.5, Claude Opus 4.7, Gemini 3.1 Pro) and the strongest publicly released PRC models (DeepSeek V4 Pro, Kimi K2.6, MiMo 2.5 Pro) is now wider than it was at the V3 evaluation a year ago, even though each individual PRC release has closed the gap on some axes.

Lambert reads this as evidence that the closed labs are pulling away on agentic and long-horizon tasks specifically, while open weights remain competitive on contained reasoning benchmarks. He also pushes back, mildly, on CAISI's framing: the Elo aggregation hides that in several domains (long-context reasoning, coding, multilingual) the PRC frontier is within noise of the US frontier, and the gap is concentrated in the small set of evaluations CAISI weights most heavily, several of which are CAISI's own AA-Omniscience-style knowledge probes.
He also notes that the Item Response Theory aggregation is the right call methodologically but visually compresses real differences.

The piece closes with Lambert's own ranking: he places DeepSeek V4 Pro above Kimi K2.6 above MiMo 2.5 above GLM-5.1 above Gemma 4 27B for general use, with Kimi remaining his pick for code and DeepSeek for long-context. The throughline of the post is that the open-weight ecosystem is now releasing at proprietary-lab cadence (five frontier-tier open models in one month), but that CAISI's data, however one squints at it, shows the US closed-source labs widening their lead on the evaluations that map most directly to real economic deployment.
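
As a rough illustration of the kind of aggregation described above (benchmark results calibrated with Item Response Theory and reported as one Elo-style rating per model), here is a minimal Python sketch. It assumes a one-parameter Rasch model with known item difficulties, which the post does not confirm is CAISI's actual method; the function names, the 1500 anchor, and the toy data are all hypothetical. The only fixed relationship it relies on is the standard Elo logistic curve, under which one logit of ability corresponds to 400 / ln(10), roughly 173.7 rating points.

```python
import math

# Standard Elo uses a base-10 logistic with a 400-point scale, so one
# logit (natural-log odds unit) equals 400 / ln(10) rating points.
ELO_PER_LOGIT = 400 / math.log(10)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_ability(responses, difficulties, iters=100):
    """Maximum-likelihood ability (theta) under a Rasch model.

    responses: 1 if the model solved the item, 0 otherwise (hypothetical data).
    difficulties: assumed-known item difficulties, in logits.
    Requires a mix of 0s and 1s; otherwise the estimate is unbounded.
    """
    theta = 0.0
    for _ in range(iters):
        probs = [sigmoid(theta - b) for b in difficulties]
        grad = sum(r - p for r, p in zip(responses, probs))  # d logL / d theta
        hess = sum(p * (1.0 - p) for p in probs)             # -d2 logL / d theta2
        step = max(-1.0, min(1.0, grad / hess))              # clipped Newton step
        theta += step
        if abs(step) < 1e-10:
            break
    return theta


def to_elo(theta, anchor=1500.0):
    """Map a logit-scale ability onto an Elo-style scale (anchor is arbitrary)."""
    return anchor + ELO_PER_LOGIT * theta


if __name__ == "__main__":
    difficulties = [-1.0, 0.0, 1.0, 2.0]  # toy items, easiest to hardest
    model_a = [1, 1, 1, 0]                # hypothetical outcomes
    model_b = [1, 1, 0, 0]
    for name, resp in (("model_a", model_a), ("model_b", model_b)):
        print(name, round(to_elo(fit_ability(resp, difficulties)), 1))
```

Under this model only the number of items solved at the given difficulties matters, so two systems with different per-domain profiles can land on the same rating. That is one way a single aggregate number can hide the domain-level parity Lambert points to.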