
Wolf Digest — Thursday, June 18, 2026

Coverage window: 2026-06-17 03:40 ET to 2026-06-18 03:03 ET
#1 · AI for Science
OpenAI and Molecule.one run a near-autonomous AI chemist that improves a stubborn medicinal-chemistry reaction
7.8 · 1 src
#2 · Government & Defense
Air Force awards first operational Collaborative Combat Aircraft production contracts to General Atomics and Anduril
7.6 · 3 srcs
#3 · Robotic Autonomy
AI2 releases MolmoMotion, a language-guided 3D motion-forecasting model, with a 1.16-million-video trajectory dataset
7.5 · 2 srcs
#1
AI for Science 2026-06-17 OpenAI Research 7.8 8.4/8.1/6.9

OpenAI, working with Molecule.one, connected GPT-5.4 to “Maria,” an agentic chemistry system wired into a high-throughput automated laboratory, and gave it an open-ended goal: improve a useful but stubborn reaction class. The model generated and ranked thousands of research proposals; human chemists selected four to test. The standout, internally labeled OAI-M1-03, targeted Chan–Lam coupling — a carbon–nitrogen bond-forming reaction — for the historically low-yielding case of coupling primary sulfonamides with boronic acids. GPT-5.4 independently identified primary sulfonamides as the high-value substrate class and proposed that mild oxidants, in particular TEMPO, could raise yields.

Across two experimental cycles the Maria lab ran 10,080 reactions, more than a chemist running three a day would complete in a decade. Under the optimized conditions, measured yields improved for 88 percent of the boronic acids and 83 percent of the sulfonamides tested; mean yield rose from 16.6 percent to 25.2 percent, and the share of reactions clearing 30 percent yield went from 15.6 to 37.5 percent. A useful follow-up: the system found TEMPO could be swapped for a much cheaper analog, 4-hydroxy-TEMPO, with little performance loss. Crucially, the result survived the jump out of microliter screening — human chemists reproduced representative reactions at bench scale and saw higher yields for 11 of 14 substrate pairs, most more than doubling. Four external chemists reviewed the preprint and judged the finding novel.
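
The headline numbers above can be put on a common footing with a few lines of arithmetic. The sketch below uses only figures quoted in this item, plus one labeled assumption: the source does not say how it counts a chemist's decade, so `WORKING_DAYS_PER_YEAR = 250` is a hypothetical value chosen for illustration.

```python
# Figures quoted in the digest item.
reactions_run = 10_080
mean_yield_before, mean_yield_after = 16.6, 25.2          # percent
share_over_30_before, share_over_30_after = 15.6, 37.5    # percent of reactions

# Assumption (not from the source): about 250 working days per year.
WORKING_DAYS_PER_YEAR = 250
manual_decade = 3 * WORKING_DAYS_PER_YEAR * 10            # three reactions a day for ten years

abs_gain_pp = mean_yield_after - mean_yield_before        # gain in percentage points
rel_gain = abs_gain_pp / mean_yield_before                # gain relative to the baseline
share_ratio = share_over_30_after / share_over_30_before  # growth in reactions clearing 30% yield

print(f"manual reactions in a decade: {manual_decade}")
print(f"lab vs. manual decade: {reactions_run / manual_decade:.2f}x")
print(f"mean yield gain: {abs_gain_pp:.1f} points ({rel_gain:.0%} relative)")
print(f"share above 30% yield grew {share_ratio:.1f}x")
```

Under that assumption a chemist would complete 7,500 reactions in ten years, so the lab's 10,080 is comfortably more. At three reactions every calendar day the total would be 10,950, so the comparison appears to depend on counting working days. The mean-yield gain is 8.6 percentage points, a little over 50 percent in relative terms.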

OpenAI is careful to call this near-autonomous, not autonomous: humans wrote the steering and grading prompts, chose which proposals entered the lab, corrected experimental plans (the largest correction was avoiding DMSO as a solvent), handled consumables, and ran the bench validation. The whole effort took three months, from the first prompt on March 4 to sharing results on June 4. The work was scoped under OpenAI’s Preparedness Framework to a legitimate medicinal-chemistry problem, involved no toxins or weapons design, and used a model already evaluated with the UK AI Security Institute. The significance is the loop, not just the molecule — a frontier model proposed a surprising, specific, falsifiable hypothesis, designed and interpreted experiments, and arrived at a result human chemists could reproduce. The caveats are real and stated: a single reaction class, specialized infrastructure, and no proof yet that the method generalizes to other couplings or substrates.

AI for science · drug discovery · autonomous lab · GPT-5.4
#2
Government & Defense 2026-06-17 DefenseScoop, Defense One, Shield AI 7.6 7.9/7.8/7.1

The U.S. Air Force awarded General Atomics and Anduril production contracts for Increment 1 of its Collaborative Combat Aircraft program, the uncrewed “drone wingmen” designed to fly alongside crewed fighters. The two firms each received engineering-and-manufacturing-development and production contracts for the CCA airframes, beating at least four other competitors, and the awards arrived four months ahead of schedule. They cover the first three production lots and put the service on a path to field at least 150 systems by the end of the decade, according to Col. Timothy Helfrich, the program acquisition executive for fighters and advanced aircraft.

Separately, the Air Force selected Anduril, Shield AI, and RTX subsidiary Collins Aerospace to advance in the program’s mission-autonomy software competition. Over the next year the service will evaluate the three vendors’ autonomy stacks and pick a primary provider in 2027. Helfrich described CCA as “the next evolution of air power” and the service’s first operational instance of human-machine teaming in aviation at this scale. The aircraft are meant to fly with the future F-47 and current fifth-generation platforms, extending their reach and sensing, and the modular drones are expected to be retrofittable as missions evolve. The program is positioned as a linchpin of the service’s Indo-Pacific modernization.

Two structural choices make the award notable beyond the dollar figures. First, splitting the airframe work between an established defense prime in General Atomics and a software-first newcomer in Anduril signals that the service is hedging across two very different industrial models. Second, the autonomy-software competition is being run separately from the airframe, so the company that builds the body may not be the one that supplies the brain. Shield AI, which makes the Hivemind autonomy stack, separately announced an Air Force production contract for CCA mission autonomy. The structure makes the autonomy layer a contestable, swappable component rather than something locked to a single airframe vendor, which matters for how quickly the capability can iterate once the aircraft are flying.

How it was discussed
  • DefenseScoop stresses the award came four months early and funds at least 150 systems by decade's end.
  • Defense One frames the contracts as the first operational 'drone wingmen' to fly beside crewed fighters.
  • Shield AI's own release highlights its separate production contract for the CCA mission-autonomy (Hivemind) software.
CCA · drone wingmen · human-machine teaming · autonomy
#3
Robotic Autonomy 2026-06-17 Allen Institute for AI (AI2), Hugging Face Blog 7.5 7.8/7.4/7.3

The Allen Institute for AI released MolmoMotion, a model for language-guided 3D motion forecasting, alongside what it says is the largest dataset of its kind. The framing is a distinction between perceiving motion and predicting it: modern models track how points and objects move through a video with high confidence, but that is retrospective — it explains motion that already happened. Many of the systems we want to build need to look forward instead. A robot reaching for a cup has to anticipate how the cup will move before it touches it; a video generator has to know what motion comes next to produce physically plausible frames.

MolmoMotion takes a single RGB frame, a set of 3D query points marked on an object, and a written instruction describing the intended action — for example, “move and rotate the wooden bowl with fruit on the table” — and predicts where those points will travel over the next few seconds in 3D space. The institute reports substantially stronger performance than existing forecasting methods, and positions the predicted trajectories as a drop-in signal for downstream uses: robotics planning and trajectory-conditioned video generation. Alongside the model, the team published MolmoMotion-1M, a collection of 3D point trajectories paired with action descriptions drawn from 1.16 million videos, plus PointMotion evaluation resources.

What makes the release interesting is that it treats motion forecasting as a language-conditioned 3D prediction problem rather than a purely visual one. The action description is what disambiguates which of many possible futures the model should predict, and that is exactly what lets the same scene feed both a manipulation planner and a controllable video generator. The dataset scale is the other lever: trajectory-level supervision at the million-video range is precisely what forward-prediction models have lacked. It fits a broader push toward models that don’t just describe the physical world but anticipate it — the same capability that robot policies and physically grounded video generators both depend on.

How it was discussed
  • AI2's tech report frames the contribution as forecasting, not perception — predicting future 3D point trajectories from a single frame plus an action description.
  • The Hugging Face Blog mirror emphasizes the open MolmoMotion-1M dataset (1.16M videos) as the enabling release.
3D motion forecasting · VLA · world models · dataset
#4
Generative Media 2026-06-18 Latent Space (swyx & Alessio), Hacker News, Twitter/X 7.5 7.1/6.9/8.5

Midjourney — the self-funded image-generation lab — used its launch event to unveil something far afield from generative media: a full-body ultrasonic CT scanner that founder David Holz called “the first new whole-body medical imaging modality in 50 years.” The device images the body with ultrasound rather than X-ray radiation or MRI magnets. The reported engineering is ambitious: roughly 358,000 ultrasonic elements across 40 ring-arranged systems in a 70-centimeter ring, the subject immersed in water because sound travels through water far better than air, data captured at around 17 gigabytes per second, reconstruction on 21 servers at a claimed two petaflops, and resolution of internal tissue down to about half a millimeter.

The current build is a Gen 1 prototype; about a dozen people have been scanned, each scan taking roughly 20 minutes, bottlenecked by bandwidth and reconstruction rather than physics. Holz stressed that the shown images do not yet use AI — but ultrasonic CT reconstruction is an inverse problem where learned denoising, super-resolution, and interpretation are the obvious next layers, and he framed the scanner as infrastructure to give AI fast, rich, cheap data about the physical body. The commercial wrapper is unusual: a 25,000-square-foot “Midjourney Spa” near Union Square in San Francisco, with saunas and cold plunges alongside nine or ten scanners, targeted for late 2027 and funded from Midjourney’s image-generation revenue with no investors. The long-term pitch is a fleet of 50,000 scanners enabling up to a billion scans a month and frequent, longitudinal body tracking.
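
A back-of-envelope check shows how far the long-term pitch sits from the Gen 1 prototype. The sketch uses the figures quoted above; the 30-day month, round-the-clock operation, and continuous capture for the full scan are simplifying assumptions, not claims from the launch event.

```python
# Figures quoted in the digest item.
SCANNERS = 50_000
SCANS_PER_MONTH = 1_000_000_000
CURRENT_SCAN_MINUTES = 20
CAPTURE_GB_PER_S = 17

# Assumption: a 30-day month with every scanner running 24 hours a day.
MINUTES_PER_MONTH = 30 * 24 * 60

scans_per_scanner = SCANS_PER_MONTH / SCANNERS            # scans each scanner must do per month
minutes_per_scan = MINUTES_PER_MONTH / scans_per_scanner  # time budget per scan, zero idle time
speedup_needed = CURRENT_SCAN_MINUTES / minutes_per_scan  # versus today's 20-minute scan

# Assumption: data is captured at the quoted rate for the entire scan (an upper bound).
upper_bound_tb_per_scan = CAPTURE_GB_PER_S * CURRENT_SCAN_MINUTES * 60 / 1000

print(f"scans per scanner per month: {scans_per_scanner:,.0f}")
print(f"minutes available per scan: {minutes_per_scan:.2f}")
print(f"speedup needed over Gen 1: {speedup_needed:.1f}x")
print(f"raw data per scan, upper bound: {upper_bound_tb_per_scan:.1f} TB")
```

Under those assumptions each scanner would have to finish a scan about every 2.2 minutes, roughly nine times faster than the current 20-minute scan, before allowing for any idle time between subjects.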

The skepticism is equally clear and worth stating plainly. No sensitivity or specificity numbers, no disease-detection benchmarks, and no peer-reviewed validation were presented; the water-immersion form factor is a serious ergonomic constraint; current resolution is coarser than CT or MRI; and the path from “body composition” — the easier initial regulatory wedge Midjourney is discussing with the FDA — to clinical diagnosis is long and unproven. Frequent full-body scanning also raises the familiar overdiagnosis problem of surfacing many ambiguous findings. The engineering demo is real; the clinical, regulatory, and economic case is not yet.

How it was discussed
  • Latent Space's on-site writeup catalogs the hardware specs and Holz's 'day one of MRI' framing for the modality.
  • iScienceLuvr summarized the tradeoff: radiation-free, magnet-free, fast, low-cost, but water-immersion and coarser than CT/MRI.
  • Hacker News and imaging researchers flag the hard inverse-reconstruction problem and the lack of any clinical-validation numbers.
medical imaging · ultrasound CT · Midjourney · hardware
#5
Industry 2026-06-17 Twitter/X, Latent Space (swyx & Alessio) 7.3 6.6/7.4/7.9

Noam Shazeer announced he is joining OpenAI, leaving Google after what he called a difficult decision. Commentators framed it as one of the year's most consequential AI talent moves: Shazeer co-authored the original Transformer, T5, and Switch Transformer papers and pioneered sparse mixture-of-experts systems. Sam Altman said Shazeer was among the people he had most wanted to work with since OpenAI's founding. The move landed amid broader chatter about shifting competitive position, including a widely shared claim that OpenAI had overtaken Anthropic on valuation.

How it was discussed
  • Observers called it the most important AI talent move of the year given Shazeer's Transformer/MoE lineage.
  • Replies read it as much about disappointment at Google as pull from OpenAI.
talent · OpenAI · Google · mixture-of-experts
#6
Generative Media 2026-06-17 TechCrunch — AI 7.0 6.7/6.8/7.5

Odyssey, a world-model startup founded by self-driving veterans Oliver Cameron (ex-Voyage, Cruise) and Jeff Hawke (ex-Wayve), raised a $310 million Series B at a $1.45 billion valuation led by Natural Capital, with Amazon, AMD Ventures, and GV participating. Odyssey builds world models that gather physical-world data — it has people walk around with backpack-mounted cameras, Google Street View style — and simulate it with accurate physics, and is best known for generating rich, interactive video from text prompts. As part of the round, AWS becomes Odyssey's preferred cloud and the startup will optimize its models for AWS Trainium chips, a competitor to Nvidia's accelerators.

world models · funding · interactive video · Trainium
#7
AI Coding 2026-06-17 arXiv, Hugging Face Daily Papers 6.9 7.1/6.8/6.8

LoopCoder-v2 is a family of 7B parallel-loop Transformer (PLT) coders trained from scratch on 18T tokens, studying loop-count selection under a gain–cost view: extra loops refine representations but cross-loop position offsets introduce a positional mismatch. The two-loop variant broadly beats the non-looped baseline across code generation, code reasoning, and agentic software engineering, improving SWE-bench Verified from 43.0 to 64.4 and Multi-SWE from 14.0 to 31.0. PLT uses cross-loop position offsets and shared-KV gated sliding-window attention so loop count becomes a practical test-time-compute knob.

looped transformers · test-time compute · SWE-bench
#8
Multimodal 2026-06-17 arXiv, Hugging Face Daily Papers 6.6 6.7/6.5/6.6

OmniAgent reframes long-video understanding as active perception rather than the uniform “watch-it-all” paradigm, whose cost grows with video duration. It is presented as the first native omni-modal agent that decides what to attend to instead of relying on global pre-scanning, keeping context cost from scaling with video length and improving accuracy on long-video benchmarks.

video understanding · active perception · omni-modal
#9
Government & Defense 2026-06-17 DefenseScoop 6.5 6.2/6.7/6.6

The Marine Corps activated Marine Unmanned Maintenance Squadron 14 (MUMS-14) at MCAS Cherry Point, its first organic unit dedicated to sustaining large drones, specifically the MQ-9A Reaper fleet. The roughly 300-person squadron — mostly UAV technicians, mechanics, and ground-control-station maintainers — replaces the contracted logistics support that forward-deployed Reaper detachments had relied on. The activation lands alongside a bipartisan right-to-repair provision, championed by Senators Warren and Sheehy, included in the Senate Armed Services Committee's 2027 NDAA markup that would let troops fix their own equipment without contractor lock-in.

MQ-9A Reaper · right to repair · Marine Corps · sustainment
#10
Reinforcement Learning 2026-06-17 arXiv, Hugging Face Daily Papers 6.5 6.6/6.5/6.4

RLVR methods like GRPO commonly suffer policy-entropy collapse during training. A first-order gradient analysis of token-level entropy dynamics identifies a token-level credit-assignment mismatch; STARE reweights advantages at the token level using surprisal to stabilize entropy, preserving exploration without the usual collapse during reasoning-model post-training.

RLVR · GRPO · entropy collapse
#11
Post-Training 2026-06-17 arXiv, Hugging Face Daily Papers 6.5 6.5/6.6/6.4

Post-training of reasoning models leans on distillation from chain-of-thought annotations that are costly and often noisy, and on RLVR with sparse signals. This work conditions self-distillation on rubrics to provide denser, more reliable supervision than imperfect rationales, aiming to combine distillation's stability with reward-aligned correctness.

self-distillation · rubrics · post-training
#12
AI Coding 2026-06-17 Twitter/X, GitHub Blog — AI & ML 6.4 6.3/6.3/6.5

GitHub Copilot's Auto mode now routes requests through a custom model that chooses among available LLMs based on reasoning depth, code complexity, debugging difficulty, and tool-orchestration needs, per a GitHub blog post and an accompanying research paper. The pitch is getting more out of each token by matching task to model rather than defaulting to one, part of a broader move toward learned model-routing inside coding agents.

Copilot · model routing · coding agents
#13
Reinforcement Learning 2026-06-17 arXiv, Hugging Face Daily Papers 6.4 6.5/6.4/6.3

Preference-based RL learns reward models from pairwise behavior comparisons but typically uses passive data collection with poor early sample efficiency. UBP2 is a model-based approach that actively directs exploration by jointly reasoning over uncertainty in the reward model and the dynamics, improving sample efficiency on control and robotics tasks.

preference-based RL · exploration · robotics
#14
Robotic Autonomy 2026-06-17 arXiv, Hugging Face Daily Papers 6.4 6.5/6.3/6.4

OneCanvas gives VLMs spatial reasoning without model-specific geometry encoders or large training budgets. It unprojects each patch to 3D world coordinates using depth and camera pose, then aggregates all views onto a single equirectangular panoramic canvas, a lightweight representation for 3D scene understanding usable in embodied settings.
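
The geometry described above is standard and can be sketched without any model-specific machinery. The following is a minimal pure-Python illustration, not the OneCanvas implementation: it assumes a pinhole camera with intrinsics `fx, fy, cx, cy`, a camera-to-world pose `(R, t)`, and an axis convention (+z forward, +y down) chosen here for illustration. All function names are hypothetical.

```python
import math

def unproject(u, v, depth, fx, fy, cx, cy):
    """Pixel (u, v) with metric depth -> 3D point in the camera frame (pinhole model)."""
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)

def camera_to_world(p_cam, R, t):
    """Apply a camera-to-world pose: p_world = R @ p_cam + t (R is a 3x3 nested list)."""
    return tuple(sum(R[i][j] * p_cam[j] for j in range(3)) + t[i] for i in range(3))

def to_equirect(p_world, origin, width, height):
    """Map a world point to fractional (col, row) on a panorama centered at origin."""
    x, y, z = (p_world[i] - origin[i] for i in range(3))
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("point coincides with the canvas origin")
    lon = math.atan2(x, z)                       # longitude in [-pi, pi], 0 along +z
    lat = math.asin(max(-1.0, min(1.0, y / r)))  # latitude in [-pi/2, pi/2]
    col = (lon / (2.0 * math.pi) + 0.5) * width
    row = (lat / math.pi + 0.5) * height
    return col, row

def aggregate(views, origin, width, height):
    """Place every patch from every view onto one shared canvas, keyed by cell."""
    canvas = {}
    for view in views:
        for u, v, depth, feature in view["patches"]:
            p_cam = unproject(u, v, depth, *view["intrinsics"])
            p_world = camera_to_world(p_cam, view["R"], view["t"])
            col, row = to_equirect(p_world, origin, width, height)
            cell = (int(col) % width, min(int(row), height - 1))
            canvas.setdefault(cell, []).append(feature)
    return canvas

# Usage: one view, one patch at the image center, 2 m straight ahead.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
view = {
    "intrinsics": (500.0, 500.0, 320.0, 240.0),  # fx, fy, cx, cy (made-up values)
    "R": IDENTITY,
    "t": (0.0, 0.0, 0.0),
    "patches": [(320.0, 240.0, 2.0, "patch-a")],
}
canvas = aggregate([view], origin=(0.0, 0.0, 0.0), width=1024, height=512)
print(canvas)
```

Because every view lands in the same world-anchored panorama, patches from different cameras that see the same surface fall into nearby cells, which is the property a shared canvas is meant to provide.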

3D scene understanding · VLM · panoramic
#15
Evaluations & Benchmarks 2026-06-17 arXiv, Hugging Face Daily Papers 6.4 6.4/6.5/6.3

TherapeuticsBench Preclinical Pharmacology (TxBench-PP) is a verifiable benchmark for small-molecule preclinical pharmacology, the first slice of a broader TherapeuticsBench effort. It targets realistic drug-program decisions to test whether AI agents can be trusted to compress interpretation-and-decision loops in drug discovery, with grounded, checkable evaluation.

drug discovery · benchmark · agents
#16
Frontier LLMs 2026-06-17 arXiv, Hugging Face Daily Papers 6.4 6.5/6.4/6.3

Uniform diffusion language models (UDLMs) let any token update at any step, in principle enabling more flexible generation than masked diffusion or autoregressive decoding, but none had been pretrained from scratch at both large parameter and token scale. Sumi is an open UDLM trained at scale, providing a comparison point for diffusion-based LM generation against established autoregressive and masked-diffusion models.

diffusion LM · open model · generation
#17
Evaluations & Benchmarks 2026-06-17 arXiv, Hugging Face Daily Papers 6.3 6.3/6.4/6.2

Deploying multimodal foundation models as closed-loop policies increasingly requires conditioning actions on observations that are no longer visible. RNG-Bench (Reconstructive Non-Markov Games) isolates a base model's ability to maintain and act on hidden state, separating it from full-state exposure and post-episode recall that existing benchmarks conflate.

non-Markov · multimodal · benchmark
#18
Reinforcement Learning 2026-06-17 arXiv, Hugging Face Daily Papers 6.3 6.4/6.3/6.2

Standard RLVR samples responses independently and optimizes from final answers, causing redundant exploration of similar intermediate reasoning and wasting compute under sparse rewards. GraphPO structures responses as a graph to share intermediate reasoning and densify credit assignment, improving the efficiency of policy optimization for large reasoning models.

RLVR · policy optimization · reasoning
#19
Reinforcement Learning 2026-06-17 arXiv, Hugging Face Daily Papers 6.3 6.3/6.4/6.2

Long-context reasoning is essential for agentic LLMs that operate over lengthy trajectories. Arguing that prior RL work overfocuses on reward engineering while diverse training data stays scarce, this paper revisits long-context RL from a data-centric angle, proposing a recipe for assembling training data that improves long-horizon reasoning.

long context · RL · data-centric
#20
Multimodal 2026-06-17 arXiv, Hugging Face Daily Papers 6.3 6.3/6.3/6.2

On-policy self-distillation trains a model on its own rollouts with a frozen copy giving dense token-level targets, but extending it to multimodal models opens a shortcut: the privileged target can guide tokens from the text reference rather than the image. ViGOS forces visual grounding so the dense targets depend on the image, decoupling perception from text-driven reasoning shortcuts.

multimodal · self-distillation · visual grounding
#21
AI for Science 2026-06-17 arXiv, Hugging Face Daily Papers 6.3 6.4/6.4/6.1

LLMs reason well over symbols but are blind to quantum representations such as unitary matrices. This work maps unitary operators into an LLM's latent space, enabling unified modeling over quantum and linguistic inputs and a step toward letting language models reason about quantum operators directly.

quantum · LLM · representation
#22
Robotic Autonomy 2026-06-17 arXiv, Hugging Face Daily Papers 6.3 6.3/6.4/6.2

Embodied VLA models are made by fine-tuning pretrained VLMs on robotics data, but it is unclear how much commonsense and factual knowledge survives. Failures on knowledge-sensitive tasks conflate missing knowledge with poor low-level control. Act2Answer is a lightweight protocol adapting VLM knowledge benchmarks to VLAs, isolating retained knowledge from control generalization.

VLA · world knowledge · robotics
#23
Interpretability 2026-06-17 arXiv, Hugging Face Daily Papers 6.3 6.4/6.4/6.1

A study of sparse-autoencoder feature interventions finds they are unreliable: after a feature is suppressed, downstream computation partially recovers the suppressed information, undermining causal claims about SAE features as clean control knobs. The result is a caution for interpretability work that treats SAE interventions as faithful, surgical edits to model behavior.

SAE · interpretability · interventions
#24
Generative Media 2026-06-17 Twitter/X, fal 6.2 6.0/6.0/6.6

Inference platform Fal rolled out Kling 3.0 Turbo and related Omni upgrades: faster generation, lower cost, better lip-sync, more stable motion, stronger prompt and reference consistency, clips up to 15 seconds, and full 4K generation with Omni, plus improved storyboard and multishot workflows. The update is an incremental but practical push on the cost-quality-speed frontier for text- and image-to-video generation.

Kling · text-to-video · 4K
#25
Audio & Speech 2026-06-17 arXiv, Hugging Face Daily Papers 6.2 6.2/6.2/6.2

Fixed spike encoders force downstream spiking neural networks to compensate for non-adaptive inputs, a bottleneck for neuromorphic speech. This work presents a learnable residual speech-to-spike encoder trained end-to-end with a recurrent SNN backbone, adapting the encoding to the acoustic signal and improving event-driven speech processing.

spiking neural networks · speech · neuromorphic
#26
Reinforcement Learning 2026-06-17 arXiv, Hugging Face Daily Papers 6.2 6.2/6.3/6.1

Simulating human users could advance agent-assistant training, personalization evaluation, and social-science research. Rather than matching a single ground-truth response by log-probability or similarity, Turing-RL uses a Turing-test-based reward so the simulator is trained to be indistinguishable from real users, capturing the diversity of plausible human behavior.

user simulation · RL · evaluation
#27
Generative Media 2026-06-17 arXiv, Hugging Face Daily Papers 6.2 6.2/6.2/6.1

Score- and flow-matching generators often lean on preference-based RL both to align with subjective preferences and, oddly, to recover realism and coherent object structure that matching training should already learn. Arguing this reflects a structural mismatch, the paper corrects flow matching with discriminators to recover those properties directly from data rather than via preference optimization.

flow matching · discriminators · generative
#28
Recurrent & Linear Attention 2026-06-17 arXiv, Hugging Face Daily Papers 6.1 6.1/6.2/6.0

Hybrid architectures interleave softmax attention with cheaper linear or recurrent mixers to cut quadratic cost. This work re-examines what the efficient-attention component actually contributes in such hybrids, clarifying where linear-attention layers help and where full attention remains necessary for quality at long context.

hybrid architecture · linear attention · efficiency
#29
Efficiency 2026-06-17 arXiv, Hugging Face Daily Papers 6.1 6.1/6.0/6.1

Industrial 10B-level foundation models set the bar for image inpainting but are too costly to deploy. Moebius is a 0.2B lightweight inpainting framework that targets 10B-level performance by addressing the representation bottleneck that extreme structural compression usually triggers, offering a deployable specialist alternative to giant generalists.

inpainting · efficient model · distillation
#30
Infrastructure 2026-06-17 Cohere Blog 6.0 6.1/6.0/5.9

Cohere published an engineering deep-dive on fair scheduling for multi-tenant LLM inference, aimed at preventing a single heavy tenant from monopolizing shared GPU compute and starving others — the classic “noisy neighbour” problem. The post describes mechanisms to give each tenant its fair share of serving capacity under contention, a practical concern as inference platforms pack many workloads onto shared accelerators.
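
The post's specific mechanisms are not detailed in this summary, so the sketch below swaps in a generic least-served-first policy in the spirit of weighted fair queuing, purely to illustrate the noisy-neighbour problem. The class, method names, and token-cost accounting are hypothetical and are not Cohere's design.

```python
from collections import deque

class FairScheduler:
    """Dispatch the backlogged tenant that has received the least weighted service."""

    def __init__(self, weights):
        self.weights = dict(weights)                  # tenant -> share weight
        self.served = {t: 0.0 for t in self.weights}  # weighted tokens served so far
        self.queues = {t: deque() for t in self.weights}

    def submit(self, tenant, request_id, cost_tokens):
        if not self.queues[tenant]:
            # A tenant returning from idle is lifted to the current minimum among
            # backlogged tenants so it cannot bank unlimited credit while idle.
            active = [self.served[t] for t, q in self.queues.items() if q]
            if active:
                self.served[tenant] = max(self.served[tenant], min(active))
        self.queues[tenant].append((request_id, cost_tokens))

    def next_request(self):
        backlogged = [t for t, q in self.queues.items() if q]
        if not backlogged:
            return None
        # Ties are broken by tenant name so the order is deterministic.
        tenant = min(backlogged, key=lambda t: (self.served[t], t))
        request_id, cost = self.queues[tenant].popleft()
        self.served[tenant] += cost / self.weights[tenant]
        return tenant, request_id

# Usage: a heavy tenant floods the queue before a light tenant arrives.
sched = FairScheduler({"heavy": 1.0, "light": 1.0})
for i in range(6):
    sched.submit("heavy", f"h{i}", cost_tokens=100)
for i in range(2):
    sched.submit("light", f"l{i}", cost_tokens=100)

order = []
while (item := sched.next_request()) is not None:
    order.append(item[0])
print(order)
```

With first-come-first-served dispatch the light tenant would wait behind all six heavy requests. Here its two requests are interleaved with the heavy tenant's from the start, which is the behavior a fair-share policy is after.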

inference · multi-tenant serving · fairness
#31
Infrastructure 2026-06-17 LMSYS Blog (Chatbot Arena) 6.0 6.2/6.1/5.7

The SGLang-JAX team added efficient serving of inclusionAI's Ling-2.6-1T (a trillion-parameter MoE) on TPU v7x. Profiling pinned the mixture-of-experts path as the main bottleneck; the team hides MoE data movement behind compute using a single Pallas kernel, a concrete TPU-serving optimization for very large sparse models.

SGLang · TPU · MoE · Pallas
#32
Agents & Tool Use 2026-06-17 arXiv, Hugging Face Daily Papers 6.0 6.0/6.1/5.9

Most agent memory benchmarks assume a single user, leaving shared assistants for hospitals, workplaces, and households understudied. In those settings many principals write to a common memory pool and query it under different roles and scopes, so memory quality requires governance, not just recall. GateMem benchmarks multi-principal shared-memory agents on access control and role-scoped retrieval.
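
To make the governance point concrete, here is a toy sketch of role-scoped retrieval over a shared memory pool. It is not the GateMem benchmark or its API; the `SharedMemory` class, the role names, and the keyword match standing in for retrieval are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MemoryEntry:
    author: str
    text: str
    allowed_roles: frozenset  # roles permitted to read this entry

class SharedMemory:
    """A common pool that many principals write to and query under different roles."""

    def __init__(self):
        self._entries = []

    def write(self, author, text, allowed_roles):
        self._entries.append(MemoryEntry(author, text, frozenset(allowed_roles)))

    def query(self, role, keyword):
        # Access control is applied before relevance, so an entry the role
        # may not read is never returned, however well it matches.
        needle = keyword.lower()
        return [
            e.text
            for e in self._entries
            if role in e.allowed_roles and needle in e.text.lower()
        ]

# Usage: a hospital-style pool with clinical and billing notes.
memory = SharedMemory()
memory.write("nurse_1", "Patient A allergy: penicillin", {"nurse", "doctor"})
memory.write("admin_1", "Patient A billing overdue", {"billing"})

doctor_view = memory.query("doctor", "patient a")
billing_view = memory.query("billing", "patient a")
visitor_view = memory.query("visitor", "patient a")
print(doctor_view, billing_view, visitor_view)
```

The same query string returns different results per role, which is why recall alone cannot score a shared-memory agent: an answer can be relevant and still be a governance failure.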

agent memory · governance · benchmark
#33
Industry 2026-06-17 Anthropic News 5.9 5.5/6.1/6.1

Anthropic announced a Seoul office and a set of partnerships across the Korean AI ecosystem, extending its Asia-Pacific enterprise and public-sector footprint. The move continues Anthropic's international expansion and regional go-to-market buildout following recent partner-network and enterprise-integration announcements.

Anthropic · Korea · expansion
#34
Audio & Speech 2026-06-17 LMSYS Blog (Chatbot Arena) 5.9 5.8/5.9/6.0

LMSYS, with MOSI and the OpenMOSS team, announced end-to-end serving for MOSS-TTS-Local-Transformer-v1.5 on SGLang-Omni. The open text-to-speech model produces native-streaming 48 kHz audio, targeting low-latency conversational voice agents on an open serving stack.

TTS · streaming · 48kHz · open model
Items: 34
Multi-source: 27
Long-form (≥7.5): 4
Sources OK / attempted: 113 / 119
Top category: Reinforcement Learning (5 items)