Last Week in AI #243, recorded April 29 and posted May 4, is the most concentrated single artifact of the week, packing four distinct frontier-model events and an alignment-flavored incident into one show. Andrey Kurenkov and Jeremie Harris frame it as a coding-and-voice-heavy week followed by a fresh open-weights drop from China and a Tencent release that misses.
OpenAI shipped GPT-5.5, with the system card claiming meaningful gains on coding evaluations alongside higher per-token pricing than GPT-5.4. The card also surfaces chain-of-thought monitorability and misalignment testing as headline items, as OpenAI continues to publish reasoning-trace probes against its own models. It also includes a system-prompt warning about "goblins" that has since drawn discussion; the hosts treat it as a quirk in OpenAI's deployment-time tooling rather than a serious capability claim. xAI countered with Grok Voice Think Fast 1.0, which leads on real-time-voice-agent benchmarks and comes with quantified production impact at Starlink customer support and sales. The lifts are large enough that the hosts treat it as a credible competitor to Whisper-plus-frontier-model pipelines rather than a demo, though the benchmarks are first-party.
The bigger frontier news is DeepSeek V4. Pro and Flash variants ship as open weights, with the architecture moving deeper into mixture-of-experts scaling and pushing context to one million tokens via hybrid compressed-attention modifications. The hosts read this as a continuation of the post-V3 cadence — the lab is converging on the same recipe that closed-weight labs are using internally and shipping the artifacts publicly. Tencent's Hunyuan 3 preview lands the same week with weaker benchmarks; Andrey treats it as evidence that the marginal value of "yet another Chinese frontier release" is dropping unless the lab can clearly differentiate on capability or efficiency.
The episode's safety-flavored thread is a sabotage incident that the hosts characterize as a deliberate attempt to insert harmful behavior into a frontier system. They cover what is publicly known so far without naming a perpetrator, and treat it as evidence that supply-chain attacks against model-training pipelines are now a credible threat surface, not a hypothetical one. They tie this back to the "distillation attacks" framing that Nathan Lambert pushes back on in the same week, and to OpenAI's published chain-of-thought monitorability work. In their reading, the three threads converge on the same point: the misalignment frontier is increasingly about adversarial inputs to training rather than emergent goal-directed behavior in deployment.
For practitioners, the actionable signal is that DeepSeek V4 and Grok Voice Think Fast both land as serious challengers in their respective domains in the same week, GPT-5.5 raises the price-per-coding-task ceiling without obviously moving the floor, and the misalignment conversation is shifting from "will the model deceive" toward "will an adversary plant the deception." The episode runs 1h52m and is worth hearing in full for context on each release.