Wolf Digest — 2026-05-25

#1

White House nears Anthropic deal letting NSA and U.S. spy agencies use Claude for classified work

Government & Defense 2026-05-23 The Information — AI, New York Times 7.8 7.5/8.5/7.5

The White House is closing in on an arrangement that would clear the National Security Agency and other elements of the U.S. intelligence community to put Anthropic's frontier Claude models against classified work, according to a New York Times report surfaced this weekend by The Information. The deal is the most concrete acknowledgement to date that the IC, which has historically rolled its own siloed analytic tools and been reluctant to depend on commercial general-purpose models for anything touching SCI-level data, now sees the cost of staying off the modern post-training frontier as larger than the cost of bringing a vendor inside its enclaves. Coverage to date has not specified which model tier (Claude Opus, Sonnet, Haiku, or a fine-tuned variant), which enclave (AWS Top Secret / IL6, an on-prem deployment, or air-gapped weights), or the dollar figure attached, but the framing is operational rather than experimental.

The arrangement is notable for proceeding despite an unresolved overhang on the DoD side. Earlier this year the Pentagon designated Anthropic a "supply chain risk" following a contract-terms dispute, and that designation has not been publicly retracted. The White House's willingness to move forward anyway, on a separate authority and through a different acquisition pathway, suggests the executive branch is treating IC adoption of frontier reasoning models as urgent enough to route around an open DoD-vendor disagreement rather than wait for it to settle. It also lines up with a broader pattern of recent announcements: OpenAI's ChatGPT Gov rollout, last year's Anthropic-Palantir-AWS partnership for defense workloads, and the steady accumulation of FedRAMP High and IL5/IL6 accreditations across the labs.

For Anthropic specifically, the news consolidates a posture that has been visible for several quarters: the lab markets responsible-scaling commitments to its alignment-focused researcher base while quietly building one of the most aggressive government-sales motions of any frontier lab, spanning Claude Gov, the Palantir partnership, and now classified-IC access. The substantive open question is what use cases the model is actually approved for at the post-training and deployment layer: analytic triage of open-source intelligence, structured extraction from collected SIGINT, code generation for IC tooling, and agent-style workflows in the analytic enclaves are all very different risk surfaces, and the public reporting does not distinguish them. Expect the next round of reporting to focus on classification-aware fine-tuning, on whether the deployment is read-only or includes agentic tool use, and on which oversight body (ODNI, the IC Inspector General, or a new construct) signs off on monitoring, since each of those decisions will shape how this generalizes across the rest of the IC.

How it was discussed

The Information's lede emphasizes the deal advancing despite the DoD's earlier 'supply chain risk' designation on Anthropic.
The New York Times' original reporting frames this as part of a broader spy-agency push for advanced AI access amid chip-supply constraints.

anthropic nsa intelligence-community classified-ai claude-gov

#2

SkillOpt: text-space optimizer for agent skills lifts no-skill GPT-5.5 accuracy by +23.5 points

Agents & Tool Use 2026-05-22 AK (@_akhaliq) Daily Papers, Hugging Face Daily Papers 7.5 7.5/7.0/8.0

SkillOpt reframes the now-ubiquitous "agent skills" pattern — short procedural documents that a frozen agent loads into context to specialize for a domain — as a text-space optimization problem with the same discipline applied to weight-space training. The authors argue that the dominant approaches today (hand-crafting, one-shot LLM generation, or loosely supervised self-revision) fail to reliably improve over their starting point under feedback, behaving more like prompt engineering than optimization. Their fix: a separate optimizer model that converts scored rollouts into bounded add, delete, or replace edits on a single skill document, with each candidate edit gated by whether it strictly improves a held-out validation score.

The mechanics borrow heavily from gradient-based training. A textual learning-rate budget caps the magnitude of edits per step, a rejected-edit buffer prevents the optimizer from re-proposing failed changes, and an epoch-wise slow / meta update keeps the skill from drifting. Critically, none of this adds inference-time model calls at deployment: the optimized skill is a static text artifact, just a better one than what humans or one-shot generation produce. Across six benchmarks, seven target models, and three execution harnesses (direct chat, Codex, and Claude Code), SkillOpt is best or tied on all 52 evaluated (model, benchmark, harness) cells, matching or beating every per-cell competitor among human-written, one-shot LLM, Trace2Skill, TextGrad, GEPA, and EvoSkill skills.
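
The accept/reject loop described above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the skill is modeled as a list of lines, and `propose_edits`, `score`, and `max_edits_per_step` are hypothetical stand-ins for the optimizer model, the held-out validation scorer, and the textual learning-rate budget. The epoch-wise slow / meta update is left out.

```python
from typing import Callable, List, Set, Tuple

# An edit is (op, index, text), where op is "add", "delete", or "replace".
Edit = Tuple[str, int, str]


def apply_edit(skill: List[str], edit: Edit) -> List[str]:
    """Return a copy of the skill document with one bounded edit applied."""
    op, idx, text = edit
    lines = list(skill)
    if op == "add":
        lines.insert(min(idx, len(lines)), text)
    elif op == "delete" and 0 <= idx < len(lines):
        del lines[idx]
    elif op == "replace" and 0 <= idx < len(lines):
        lines[idx] = text
    return lines


def optimize_skill(
    skill: List[str],
    propose_edits: Callable[[List[str]], List[Edit]],  # hypothetical optimizer model
    score: Callable[[List[str]], float],               # hypothetical validation scorer
    steps: int = 10,
    max_edits_per_step: int = 2,                       # stand-in for the learning-rate budget
) -> List[str]:
    best_score = score(skill)
    rejected: Set[Edit] = set()  # rejected-edit buffer
    for _ in range(steps):
        # Never retry an edit that already failed validation.
        candidates = [e for e in propose_edits(skill) if e not in rejected]
        for edit in candidates[:max_edits_per_step]:
            trial = apply_edit(skill, edit)
            trial_score = score(trial)
            if trial_score > best_score:  # accept only strict improvement
                skill, best_score = trial, trial_score
            else:
                rejected.add(edit)
    return skill


# Toy usage: the "validation score" counts how many required phrases appear.
REQUIRED = ("check units", "cite sources")


def toy_score(skill: List[str]) -> float:
    return float(sum(any(phrase in line for line in skill) for phrase in REQUIRED))


def toy_propose(skill: List[str]) -> List[Edit]:
    end = len(skill)
    return [("add", end, "check units"), ("add", end, "be verbose"), ("add", end, "cite sources")]


result = optimize_skill(["answer concisely"], toy_propose, toy_score, steps=3)
print(result)
```

In the toy run the unhelpful "be verbose" edit is rejected and buffered, which is what frees budget for the "cite sources" edit on a later step.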

The headline numbers are concrete. On GPT-5.5 the average no-skill baseline gains +23.5 points in direct chat, +24.8 inside the Codex agentic loop, and +19.1 inside Claude Code. Transfer experiments are arguably more important than the within-cell results: optimized skill artifacts retain value when moved across model scales, between Codex and Claude Code execution environments, and onto a nearby math benchmark without re-optimization. That portability is the property that determines whether agent-skill optimization is a one-off trick or a real artifact-creation paradigm.

Why it matters: the field has been gravitating toward agent-skill libraries (Anthropic's released skills, Claude Code skills, Codex skills, third-party skill marketplaces) as the natural unit of agent customization, but every prior pipeline has produced skills with the brittleness of hand-prompts. SkillOpt is the first work to show that you can treat the skill as the external state of a frozen agent and optimize it with the discipline that makes weight-space training reproducible: bounded steps, validation-set acceptance, and explicit hyperparameters. The community engagement reflects this: 117 upvotes on Hugging Face Daily Papers at the time of writing puts it in the top tier for the week. Open questions for the next iteration: how the choice and scale of the optimizer model affect the quality of edits, whether the +23.5-point lift transfers to the closed frontier models (GPT-5.6, Claude 4.6, Gemini 3) at the time of deployment rather than only at the time of skill construction, and how the approach interacts with model providers' own internal skill / prompt optimization tooling.

How it was discussed

AK (@_akhaliq) ran it as the lead Daily Papers post; Hugging Face Daily Papers placed it #1 by upvotes for the day (117 upvotes).
Both surfaces highlight the same headline (transfer across Codex/Claude Code), with no public counter-takes yet at this writing.

agent-skills optimization text-gradient codex claude-code

#3

Huawei discloses 'Tau Scaling Law' framing as path to narrow TSMC chip gap despite U.S. sanctions

Infrastructure 2026-05-25 The Information — AI 7.5 7.0/8.0/7.5

Huawei said Monday it is using a new principle, branded the "Tau Scaling Law," to narrow the gap with the world's leading semiconductor manufacturers despite U.S. export controls that have restricted China's access to the most advanced equipment. The announcement came from He Tingbo, Huawei board director and head of the company's chip business, in a public talk; the company has not yet released a technical paper, so the substance of Tau Scaling — whether it refers to a process-node innovation, a packaging or chiplet strategy, an architecture-side rebalancing of memory and compute, or a hybrid of all three — is undisclosed in the available reporting. The Information's briefing is paywalled below the lede.

The disclosure is significant in context. Huawei's Ascend 910C and the in-development 920 series have been positioned as the domestic alternative to NVIDIA's restricted Hopper and Blackwell parts inside China's largest training labs, with reported deployments at DeepSeek, Tencent, ByteDance, and Baidu. The performance gap to leading-edge nodes (TSMC's N3P and N2 in 2025–26) has been the binding constraint on Chinese frontier training, and previous Huawei communications about closing it have leaned on architectural cleverness (3D stacking, optical interconnect, SuperPod cluster topology) rather than a single "new principle." The Tau Scaling Law framing is a more pointed claim, which is why the disclosure registers as a strategic signal regardless of what the underlying technique turns out to be.

For Western observers, the operative question is timing relative to the next U.S. export-control update and to the next round of Ascend silicon. Watch for follow-up reporting that triangulates Tau Scaling against SMIC's recent N+2/N+3 process disclosures and against the rumored Huawei domestic HBM partnership; if Tau is a scaling formulation that relaxes the dependence on the most aggressive lithography step, it changes the slope at which sanctions bite. Until a technical paper or product launch hangs numbers on the claim, treat it as positioning rather than measured capability — but positioning that is itself news, because Huawei's chip-business head making it on a public stage is unusual.

huawei semiconductors export-controls scaling-laws china-ai

#4

Cognitive Revolution interviews Palisade's Jeffrey Ladish on AI shutdown resistance and self-replication evals

Safety, Policy & Regulation 2026-05-24 The Cognitive Revolution (Nathan Labenz) 7.0 7.0/8.0/6.0

Nathan Labenz's long-form interview with Jeffrey Ladish, executive director of Palisade Research, walks through Palisade's recent body of work on capabilities and motivations of frontier models — specifically shutdown-resistance behaviors, self-replication, and what Ladish frames as "all compute is food" for AI systems that learn to acquire and retain resources. The discussion is positioned as a state-of-the-evals briefing for the safety community: which empirical demonstrations of misaligned-seeming behavior have replicated, which appear to be artifacts of harness or prompt choice, and what the next set of capability evaluations should measure to distinguish "model resists shutdown because the prompt rewards it" from "model resists shutdown across a wide distribution of prompt formulations and rollouts." Ladish has been one of the more measured public voices in this corner of the safety landscape, and the episode is consistent with that — concrete about the evals Palisade has run, careful about how much to read into individual results.

palisade shutdown-resistance self-replication evals ai-safety

#5

Lens: 3.8B-parameter T2I model matches >6B baselines using ~19% of Z-Image's training compute

Generative Media 2026-05-22 AK (@_akhaliq) Daily Papers, Hugging Face Daily Papers 7.0 7.0/6.5/7.5

Lens is a 3.8B-parameter text-to-image model that matches or surpasses several >6B-parameter SOTA models across benchmarks while consuming only about 19.3% of Z-Image's training compute. The two efficiency levers are data-density and architectural choice: training on Lens-800M (800M densely captioned image-text pairs, captions written by GPT-4.1 averaging ~109 words) for richer semantic supervision, and multi-resolution / multi-aspect batches to widen effective visual coverage per step. A semantic VAE and a strong language encoder accelerate convergence and enable multilingual generalization from English-only training data. Post-training adds RL with taxonomy-driven prompts (Lens-RL-8K), structured reward rubrics, a training-free system-prompt-search reasoner module, and distillation-based acceleration. 86 HF upvotes.

text-to-image training-efficiency rl-from-rubrics z-image

#6

DAR (Diffusion-Adaptive Routing) replaces residuals in DiTs — 2.11 FID improvement on SiT-XL/2, 8.75× faster convergence

Generative Media 2026-05-21 AK (@_akhaliq) Daily Papers, Hugging Face Daily Papers 6.9 7.5/6.5/6.5

DiTs inherited the original Transformer's residual stream unchanged, while every other axis (tokenization, attention, conditioning, objectives, latent autoencoders) has been re-examined. This paper diagnoses three symptoms of vanilla residual addition in DiTs — monotonic forward magnitude inflation, sharp backward gradient decay, and pronounced block-wise redundancy — across both depth and denoising timestep. Their drop-in fix, Diffusion-Adaptive Routing (DAR), performs learnable, timestep-adaptive, non-incremental aggregation over the history of sublayer outputs and is compatible with REPA. On ImageNet 256×256 it improves SiT-XL/2 by 2.11 FID (7.56 vs. 9.67) and matches the baseline's converged quality with 8.75× fewer training iterations. Stacked on REPA it yields a 2× training acceleration in the early stage. 79 HF upvotes.
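
To make the contrast with plain residual addition concrete, here is a minimal sketch of one possible reading of "timestep-adaptive, non-incremental aggregation": a softmax-weighted mix over the history of sublayer outputs, with weights that shift with the denoising timestep. The summary does not give DAR's actual formulation, so the softmax form and the names `logits` and `time_slopes` (stand-ins for learned parameters) are assumptions made for illustration only.

```python
import math
from typing import List

Vector = List[float]


def residual_stream(outputs: List[Vector]) -> Vector:
    """Vanilla residual addition: every sublayer output is summed with weight 1."""
    return [sum(vals) for vals in zip(*outputs)]


def routed_aggregate(
    outputs: List[Vector],
    logits: List[float],       # hypothetical learned per-sublayer logits
    time_slopes: List[float],  # hypothetical learned timestep sensitivity
    t: float,
) -> Vector:
    """Softmax-weighted mix of the sublayer history; weights depend on timestep t."""
    raw = [a + b * t for a, b in zip(logits, time_slopes)]
    peak = max(raw)  # subtract the max for numerical stability
    exps = [math.exp(r - peak) for r in raw]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v for w, v in zip(weights, vals)) for vals in zip(*outputs)]


history = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
plain = residual_stream(history)  # grows as more sublayers are added
uniform = routed_aggregate(history, [0.0] * 3, [0.0] * 3, t=0.5)
late = routed_aggregate(history, [0.0] * 3, [0.0, 0.0, 10.0], t=1.0)
print(plain, uniform, late)
```

Because the weights in this sketch form a convex combination, the aggregate's magnitude stays bounded as depth grows, whereas the plain sum keeps accumulating. Whether DAR constrains its weights this way is not stated in the summary.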

diffusion-transformer residual-stream imagenet sit-xl

#7

Defense One: INDOPACOM and Army logistics chiefs warn 'we cannot win if our supply lines are 5,000 miles long'

Government & Defense 2026-05-24 Defense One 6.9 7.0/7.5/6.0

At AUSA's LANPAC and the Indo-Pacific Security Forum in Hawaii, INDOPACOM and Army logistics leadership framed sustainment, not posture, as the binding constraint on a Pacific fight. Gen. Brunson (US Forces Korea) said directly that "we cannot win if our supply lines are 5,000 miles long"; the geometry is Hawaii at 3,000 mi from CONUS, Guam at 5,000 mi from Hawaii, and the first island chain another 1,500 mi from Guam. Maj. Gen. Gavin Gardner (8th Theater Sustainment Cmd) said the Army is expanding "fix forward" contracts in Korea, Japan, the Philippines, Australia, and Singapore to eliminate 30-day tows back to CONUS for broken watercraft (a Talisman Sabre pain point two years running); Korean dry docks have completed 3 US ship overhauls with 2 more queued. INDOPACOM strategy chief Maj. Gen. Rowell cited China's >50% share of global commercial shipbuilding capacity vs. ~0.1% for the US and called resilience a warfighting function. Marines plan to self-sustain 45 days inside the first island chain but cannot pre-stage an "iron mountain."

indopacom logistics pacific shipbuilding fix-forward

#8

SOCOM: tactical-edge LLMs, fog computing, and voice-command interfaces are what special operations forces want from AI now

Government & Defense 2026-05-24 Defense One 6.8 6.5/7.5/6.5

At SOF Week in Tampa, SOCOM officials said operators are already using generative AI "heavily" for resource allocation and force deployment but need models that run at the tactical edge in disconnected environments — not just in cloud-connected data centers. The command is exploring fog-computing architectures to push cloud-class compute closer to collection and smaller LLMs that retain instruction-following with less compute. Acquisition exec Melissa Johnson said the relevant capabilities will likely come from smaller startups, since these niche tactical workloads aren't on hyperscaler roadmaps. Program-manager priorities: voice-command interfaces to reduce operator cognitive load (Col. Robert Oliver, PEO SOF Warrior), drone-to-drone interoperability and gesture/voice mission planning (Lt. Col. Aaron Davidson, unmanned autonomy), and agentic AI that can plan, revise, and execute (Rob McClintock, intel PM).

socom tactical-edge fog-computing small-llms voice-ui

#9

StepAudio 2.5: unified audio-language model uses task-tailored RLHF to match specialized ASR/TTS/realtime systems

Audio & Speech 2026-05-23 AK (@_akhaliq) Daily Papers, Hugging Face Daily Papers 6.8 7.0/6.5/7.0

StepAudio 2.5 is a unified audio-language foundation model whose central claim is that once text and audio share a multimodal representational space, the gap between unified models and specialists comes down to operational regime (data construction, optimization targets, and decoding constraints) rather than architecture. The authors advance the post-training pipeline from standard supervised learning to task-tailored RLHF, shaping a shared backbone into three operational modes: an ASR branch using verifiable multi-token decoding for transcription efficiency, a TTS branch using preference-based RLHF and context-rich supervision for controllable expressive synthesis, and a realtime spoken-interaction branch optimized for low latency. The report positions StepAudio 2.5 as matching or exceeding specialized systems on all three. 34 HF upvotes.

audio-language-model rlhf asr tts step-audio

#10

Mitchell Institute paper urges Space Force to build human-spaceflight authority for 'in-person' lunar conflict with China

Government & Defense 2026-05-23 Defense One 6.7 7.0/7.0/6.0

A 22-page Mitchell Institute paper by retired Space Force Col. Kyle Pumroy argues USSF must build its own human-spaceflight program and expand Title 10 active-duty authorities to cover "space and lunar habitation" and warfighting tasks, citing China's stated 2030 crewed lunar goal. The report explicitly recommends a strategic vision "unconstrained by" the 1967 Outer Space Treaty (which bans military bases and maneuvers on celestial bodies), and urges Congress to fund commercial space-station residencies and potentially a dedicated USSF station in future NDAAs. USSF has loaned officers to NASA (Mike Hopkins transferred in orbit in 2020; Col. Nick Hague commanded SpaceX Crew-9 for 171 days in 2024-25) but has no operational human spaceflight today. Secure World Foundation's Victoria Samson noted the report reflects a deliberate blurring of the longstanding exploration / militarization separation under an expansionist Space Force posture.

space-force lunar outer-space-treaty china mitchell-institute

#11

Google Cloud COO: mean breach-to-next-stage time collapsed from 8 hours to 22 seconds — but devs hit five-figure surprise Gemini bills from auto-scoped Maps keys

Industry 2026-05-24 TechCrunch — AI, The Register 6.7 6.5/7.0/6.5

Google Cloud COO Francis de Souza argues AI security must be built into the platform from the start and warns that "shadow AI" will surface forgotten data assets (old SharePoint servers) whose access controls were never updated. He also cites a collapse in mean time from initial breach to next attack stage, from 8 hours to 22 seconds, which he says forces "AI-native, fully agentic defense" with humans overseeing rather than in the loop. TechCrunch pairs the framing with The Register's reporting on Google Cloud devs hit with five-figure surprise bills from unauthorized Gemini API calls: Maps API keys, deployed publicly per Google's own instructions, were silently scoped to access Gemini after a policy expansion. Prentus's Rod Danan was billed $10,138 in ~30 min; Sydney dev Isuru Fonseka was charged ~AUD $17,000 despite a $250 cap, because Google's automatic tier upgrades had raised ceilings to as high as $100,000 without consent. Aikido researcher Joseph Leon found revoked API keys remain usable for up to 23 minutes (success rates >90% in some minutes) due to slow revocation propagation, while service-account creds revoke in ~5 sec and AQ-prefixed Gemini keys in ~1 min, indicating the gap is policy, not engineering. Google has no plans to change the auto-tier policy.
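
For a sense of scale, the reported figures can be combined into a back-of-the-envelope exposure estimate. The sketch below assumes a constant burn rate equal to the Danan incident ($10,138 over roughly 30 minutes) and assumes abuse continues at that rate for the full 23-minute revocation window. Both are simplifying assumptions, not reported facts, and the function names are illustrative.

```python
def burn_rate_per_minute(total_usd: float, minutes: float) -> float:
    """Average spend per minute over the incident."""
    return total_usd / minutes


def exposure_during_revocation(rate_usd_per_min: float, window_min: float) -> float:
    """Upper-bound spend if calls keep succeeding for the whole revocation window."""
    return rate_usd_per_min * window_min


rate = burn_rate_per_minute(10_138, 30)          # figures from the reporting
exposure = exposure_during_revocation(rate, 23)  # 23-minute worst-case window
print(f"~${rate:,.0f}/min, up to ~${exposure:,.0f} more while revocation propagates")
```

Under those assumptions, revoking the key the moment abuse is noticed could still leave several thousand dollars of further charges before the revocation takes effect.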

How it was discussed

TechCrunch frames it as Google leadership talking AI-native defense.
The Register's reporting (cited in the same piece) lands the concrete failure modes: 23-minute key-revocation latency, auto-scoped API keys, no-consent quota upgrades to $100K ceilings.

google-cloud api-security gemini shadow-ai key-revocation

#12

SciAtlas: large-scale knowledge graph for automated scientific research with topological reasoning

AI for Science 2026-05-21 AK (@_akhaliq) Daily Papers, Hugging Face Daily Papers 6.6 6.5/7.0/6.0

SciAtlas argues current academic retrieval tools rely on keyword matching or vector-space semantic retrieval that lacks the topological reasoning to navigate complex logical connections, and that agentic deep-research frameworks are prone to logical hallucinations and high inference costs. The authors build a large-scale knowledge graph over the scientific literature to support automated scientific research, intended as a substrate that combines structural reasoning with vector retrieval. 39 HF upvotes.

knowledge-graph scientific-research deep-research retrieval

#13

From Raw Experience to Skill Consumption: systematic study of model-generated agent skills across the lifecycle

Agents & Tool Use 2026-05-22 AK (@_akhaliq) Daily Papers, Hugging Face Daily Papers 6.5 6.5/6.5/6.5

Companion-paper energy to SkillOpt: this work runs the first comprehensive study of model-generated agent skills across the full lifecycle — experience generation, skill extraction, and skill consumption — to ask whether such skills actually act as durable procedural artifacts. Domain-level and model-generated skills are flagged as the most promising regime because they encode recurring procedures within a domain and scale beyond hand-crafting, but the empirical picture so far has been inconsistent. The paper systematizes the variables (which experiences to generate, which extraction method to use, which consumption harness) and reports findings across that grid. 19 HF upvotes.

agent-skills skill-extraction lifecycle-study

#14

ETCHR: dedicated image editor plus understanding model unlocks fine-grained 'think with images' reasoning

Multimodal 2026-05-22 AK (@_akhaliq) Daily Papers, Hugging Face Daily Papers 6.5 7.0/6.0/6.5

ETCHR pursues a decoupled approach to the "think with images" paradigm: rather than relying on fixed toolkits or unified multimodal generation (which produces noisy intermediate images), use a dedicated image editing model alongside the understanding model. The paper diagnoses two failure modes of off-the-shelf image editors when used as reasoning assistants, and proposes targeted training to fix them. It targets fine-grained focus and view-transformation questions that purely textual chains of thought handle poorly. 9 HF upvotes.

think-with-images image-editing vlm-reasoning

#15

See What I Mean (SWIM): fine-grained object understanding from text-only prompts via mask-supervised cross-attention

Multimodal 2026-05-21 AK (@_akhaliq) Daily Papers, Hugging Face Daily Papers 6.4 6.5/6.0/6.5

SWIM aligns vision and language representations to enable fine-grained object understanding from textual prompts alone, without requiring explicit visual prompts like masks or points at inference. Mask supervision is used only at training time to guide cross-modal attention so the model learns to attend to user-specified objects automatically. Cross-attention analysis of pretrained MLLMs reveals a systematic discrepancy: attribute words produce sharp, localized attention while object nouns do not — the paper's training strategy closes that gap. 28 HF upvotes.

vlm fine-grained-understanding cross-attention video

#16

PhotoFlow: Director-Reviewer-Reflector agent for closed-loop 3D virtual photography

Agents & Tool Use 2026-05-23 AK (@_akhaliq) Daily Papers, Hugging Face Daily Papers 6.4 6.5/6.0/6.5

PhotoFlow tackles virtual photography in prepared 3D scenes: given scene information and a language intent (no preselected camera pose or reference image), the agent must infer a shot, choose executable camera parameters, and render the photograph. The closed-loop architecture has three roles — a Director that proposes camera setups, a Reviewer that scores them, and a Reflector that iterates — stressing 3D spatial understanding and abstract aesthetic judgment together. 20 HF upvotes.

3d-agent virtual-photography director-reviewer-reflector

#17

VGenST-Bench: synthesizes controlled video scenarios with generative models to benchmark spatio-temporal reasoning

Evaluations & Benchmarks 2026-05-22 AK (@_akhaliq) Daily Papers, Hugging Face Daily Papers 6.4 6.5/6.5/6.0

VGenST-Bench critiques existing spatio-temporal reasoning benchmarks for relying on static image sets or passively curated video, which underconstrain the evaluation. Instead, the authors actively synthesize highly controlled and diverse evaluation scenarios using generative video models, enabling fine-grained probing of MLLM spatio-temporal capabilities. The multi-stage construction pipeline is the paper's main methodological contribution. 17 HF upvotes.

benchmark spatio-temporal video-vlm synthetic-eval

#18

PiD: pixel-diffusion decoder for fast, high-resolution latent-to-pixel reconstruction

Generative Media 2026-05-22 AK (@_akhaliq) Daily Papers, Hugging Face Daily Papers 6.4 6.5/6.0/6.5

PiD attacks the megapixel-scale decoder bottleneck in latent-diffusion and autoregressive image systems. Reconstruction-oriented decoders, optimized to invert the encoder, become increasingly costly at high resolution and don't synthesize meaningful detail. PiD reformulates the latent-to-pixel step as pixel-space diffusion, leveraging recent advances in scalable pixel-space diffusion to provide a more expressive and efficient decoding paradigm. 13 HF upvotes.

pixel-diffusion decoder latent-diffusion high-resolution

#19

RankE: end-to-end post-training for discrete AR text-to-image with decoder co-evolution fixes Latent Covariate Shift

Post-Training 2026-05-21 AK (@_akhaliq) Daily Papers, Hugging Face Daily Papers 6.4 6.5/6.5/6.0

Current post-training pipelines for discrete autoregressive T2I (VQ tokenizer + AR policy) only optimize the policy and keep the VQ decoder frozen. RankE shows this induces Latent Covariate Shift: as the policy evolves, the token distribution diverges from the ground-truth distribution the decoder was trained on, degrading reward scores. The paper proposes co-evolving the decoder with the policy, end-to-end. Analogous in spirit to the REPA-E line of work for diffusion T2I but for the discrete-AR family. 13 HF upvotes.

autoregressive-t2i vq-decoder post-training latent-covariate-shift

#20

From Seeing to Thinking: decoupling perception from reasoning is the post-training bottleneck for VLMs

Multimodal 2026-05-20 AK (@_akhaliq) Daily Papers, Hugging Face Daily Papers 6.3 6.5/6.5/6.0

This paper argues recent emphasis on long chain-of-thought reasoning in VLMs misdiagnoses the bottleneck: performance on visual tasks is primarily limited by lack of visual perception, not reasoning itself. The authors decompose VLM capabilities into three training stages — visual perception, visual reasoning, and textual reasoning — and show visual perception requires targeted optimization with specialized data that won't emerge from generic post-training. 4 HF upvotes.

vlm-post-training perception chain-of-thought ablation

#21

HINT-SD: targeted hindsight self-distillation supervises only the action steps that actually move long-horizon agent outcomes

Agents & Tool Use 2026-05-22 AK (@_akhaliq) Daily Papers, Hugging Face Daily Papers 6.3 6.5/6.5/6.0

Long-horizon LLM-agent RL suffers from sparse outcome rewards that don't tell you which intermediate actions caused success or failure. Prior work generates per-turn feedback or hints, but feedback at every turn is wasteful when many turns are already neutral. HINT-SD targets hindsight self-distillation at the specific steps that actually mattered for the trajectory's outcome, improving sample efficiency over uniform feedback application. 3 HF upvotes.

long-horizon-agents hindsight rl self-distillation

#22

Rethinking Muon: spectral whitening fails for VLA and RLVR — proposed high-pass remedies

Post-Training 2026-05-22 AK (@_akhaliq) Daily Papers, Hugging Face Daily Papers 6.3 7.0/6.0/6.0

Muon's Newton-Schulz spectral gradient orthogonalization (driving all singular values toward 1) wins on LLM pretraining but the paper identifies two failure regimes beyond pretraining: (i) cross-modality vision-language-action training, where low-rank action-module gradients cause amplification of noisy tail directions; (ii) reinforcement learning with verifiable rewards, where the same uniform whitening misallocates capacity. The authors propose high-pass remedies that preserve Muon's benefits in pretraining while correcting these failure modes. 4 HF upvotes.
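
The orthogonalization step can be illustrated with the textbook cubic Newton-Schulz iteration, X <- 1.5 X - 0.5 X X^T X, which pushes every singular value of a suitably scaled matrix toward 1. This is a swapped-in simplification in pure Python: Muon's published implementation uses a tuned higher-order polynomial and far fewer iterations, and the helper names here are illustrative. The example input is a made-up, nearly rank-1 "gradient", chosen to show how a tiny tail direction ends up with the same weight as the dominant one.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(g: Matrix, iters: int = 30) -> Matrix:
    """Drive all singular values of g toward 1 with the cubic iteration."""
    fro = math.sqrt(sum(v * v for row in g for v in row))
    # Frobenius scaling puts every singular value in (0, 1], where the iteration converges.
    x = [[v / fro for v in row] for row in g]
    for _ in range(iters):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [
            [1.5 * x[i][j] - 0.5 * xxtx[i][j] for j in range(len(x[0]))]
            for i in range(len(x))
        ]
    return x


# Nearly rank-1 input: one strong direction, one tail direction 1000x weaker.
low_rank_grad = [[1.0, 0.0], [0.0, 1e-3]]
whitened = newton_schulz_orthogonalize(low_rank_grad)
print(whitened)
```

After whitening, both diagonal entries sit at roughly 1, so the direction that started 1000 times weaker carries equal weight in the update. That is the kind of tail amplification the paper flags for low-rank action-module gradients, shown here on toy numbers rather than the paper's experiments.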

muon-optimizer vla rlvr spectral-gradient

#23

LatentUMM: explicit dual-latent alignment fixes understanding-generation inconsistency in unified multimodal models

Multimodal 2026-05-20 AK (@_akhaliq) Daily Papers, Hugging Face Daily Papers 6.2 6.5/6.0/6.0

Unified multimodal models often exhibit functional inconsistency between understanding and generation despite sharing a latent space. The authors observe this stems not from missing shared representations but from absence of explicit alignment between the transformations that map into and out of the latent space — generation and re-encoding follow inconsistent trajectories, causing semantic drift under modality transitions. LatentUMM enforces dual-latent alignment to close the inconsistency. 4 HF upvotes.

unified-multimodal latent-alignment semantic-drift

#24

Shannon Scaling Law: model LLM training as noisy-channel transmission to explain catastrophic overtraining and quantization degradation

Research 2026-05-22 AK (@_akhaliq) Daily Papers, Hugging Face Daily Papers 6.2 6.5/6.5/5.5

Existing scaling laws — predominantly monotonic power laws — fail to explain emerging non-monotonic phenomena like catastrophic overtraining and quantization-induced degradation, where performance worsens with more compute. The authors propose a Shannon Scaling Law grounded in Shannon-Hartley: model parameters as channel bandwidth, training tokens as signal power. The formulation explicitly captures how those interact and predicts the non-monotonic regions. 7 HF upvotes.
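
For reference, the classical Shannon-Hartley theorem that the paper grounds itself in is given below in its textbook form. The mapping noted in the comments follows the summary above. The paper's own functional form, and what plays the role of noise in it, are not given in the summary, so nothing beyond the classical statement is asserted here.

```latex
% Shannon-Hartley: capacity C (bits/s) of a channel with bandwidth B (Hz)
% and linear signal-to-noise ratio S/N.
C = B \log_2\!\left(1 + \frac{S}{N}\right)
% Analogy per the summary: B <-> model parameters, S <-> training tokens.
% The counterpart of the noise power N is left unspecified here.
```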

scaling-laws shannon-hartley catastrophic-overtraining quantization

#25

SCOPE: spatially-selective conditioning makes FPS world models cross-game by treating localized vs. global action signals separately

Generative Media 2026-05-22 AK (@_akhaliq) Daily Papers, Hugging Face Daily Papers 6.2 6.5/6.0/6.0

Interactive world models for FPS games must resolve high-frequency overlapping control signals every frame without disrupting unaffected regions. SCOPE observes that FPS actions are spatially selective: discrete events (firing, reloading) affect only a localized region around the weapon, while continuous camera/movement signals govern stable surroundings. They insert a conditioning module into each transformer block of a pretrained world model that respects this decomposition, enabling cross-game training. 8 HF upvotes.

world-models fps-games transformer-conditioning

#26

Geo-Align: first RL framework that aligns generated video to metric geometry via camera-trajectory reward

Generative Media 2026-05-22 AK (@_akhaliq) Daily Papers, Hugging Face Daily Papers 6.2 6.5/6.0/6.0

Camera-controlled video generation has progressed quickly, but existing video-to-video re-rendering methods rely on SFT over synthetic datasets and there is an acute scarcity of synchronized multi-view real-world video. The result is limited generalization on out-of-distribution real videos, with models failing to adhere to physical scales and camera trajectories. Geo-Align is presented as the first RL framework that uses metric-geometry rewards to align generated video to true camera trajectories. 5 HF upvotes.

video-generation camera-control metric-geometry rl

#27

The Expense of Seeing: VLMs show 'functional blindness' by exploiting language priors to bypass visual representation bottlenecks

Multimodal 2026-05-19 AK (@_akhaliq) Daily Papers, Hugging Face Daily Papers 6.2 6.5/6.5/5.5

The paper challenges the under-examined assumption that current Vision-Language Models faithfully synthesize multimodal data. Their argument: state-of-the-art models in the Vision Encoder-Projector-LLM paradigm frequently exhibit functional blindness — exploiting strong language priors to bypass severe visual representation bottlenecks rather than extracting grounded visual knowledge. They frame this as a trustworthiness problem in the dominant monolithic VLM paradigm. 2 HF upvotes.

vlm-trustworthiness functional-blindness language-priors

#28

GenRecon: tile-based 3D scene reconstruction using Trellis.2 generative shape priors

Generative Media 2026-05-23 AK (@_akhaliq) Daily Papers, Hugging Face Daily Papers 6.1 6.5/6.0/5.5

GenRecon casts high-fidelity 3D scene reconstruction from multi-view RGB images as conditional 3D generation over spatially-localized overlapping chunks that tile the scene, inheriting the fidelity and completeness of state-of-the-art generative shape models (Trellis.2 in their experiments) generalized to scene scale. A projection-based conditioning mechanism lifts multi-view images into the per-chunk generative prior. 2 HF upvotes.
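The tiling step can be pictured with a small sketch that lays overlapping cubic chunks over an axis-aligned scene volume. The function names, the cubic chunk shape, and the fixed overlap are assumptions for illustration; the summary does not say how GenRecon sizes or places its chunks.

```python
from itertools import product


def chunk_starts(extent, chunk_size, overlap):
    """Start offsets of overlapping chunks covering [0, extent] on one axis."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    stride = chunk_size - overlap
    starts = [0]
    # Keep adding chunks until the last one reaches the end of the axis.
    # The final chunk may extend past `extent`.
    while starts[-1] + chunk_size < extent:
        starts.append(starts[-1] + stride)
    return starts


def tile_scene(scene_extent, chunk_size, overlap):
    """Origins (x, y, z) of overlapping cubic chunks tiling the scene volume."""
    per_axis = [chunk_starts(extent, chunk_size, overlap) for extent in scene_extent]
    return list(product(*per_axis))


origins = tile_scene(scene_extent=(10, 4, 4), chunk_size=4, overlap=1)
print(origins)  # [(0, 0, 0), (3, 0, 0), (6, 0, 0)]
```

Each origin would then index one chunk to be generated under the shape prior; the overlap gives neighbouring chunks shared content to stay consistent on.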

3d-reconstruction trellis generative-prior

#29

Good Token Hunting: bounded key/value token selection makes visual geometry transformers scale linearly with input length

Efficiency 2026-05-23 AK (@_akhaliq) Daily Papers, Hugging Face Daily Papers 6.1 6.5/5.5/6.0

Visual geometry transformers grow quadratically in compute with input sequence length because of global attention. This paper restricts the number of key/value tokens each query interacts with, introducing a two-stage selection strategy to keep the most informative tokens. The result preserves multi-view 3D reconstruction quality while enabling much longer input sequences and better scalability. 2 HF upvotes.
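The scaling claim is easy to check with a back-of-the-envelope count of query-key interactions. The sketch below compares global attention with a fixed key/value budget per query, and shows a plain single-stage top-k selection for one query. The names and the single-stage selection are assumptions; the paper's two-stage strategy is not detailed in the summary, and this sketch does not address how to pick tokens without first scoring every pair.

```python
def attention_pairs(num_tokens, kv_budget=None):
    """Number of query-key interactions for one attention layer."""
    if kv_budget is None:
        return num_tokens * num_tokens  # global attention: quadratic
    return num_tokens * min(kv_budget, num_tokens)  # bounded budget: linear


def select_top_k(scores, k):
    """Indices of the k highest-scoring key/value tokens, in original order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


print(attention_pairs(1000))      # 1000000
print(attention_pairs(1000, 64))  # 64000
print(attention_pairs(2000, 64))  # 128000
print(select_top_k([0.1, 0.9, 0.3, 0.7], k=2))  # [1, 3]
```

With a fixed budget, doubling the input length doubles the interaction count, whereas under global attention it would quadruple.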

visual-geometry-transformer token-selection efficient-attention

#30

The Information: SpaceX-OpenAI IPO calendar joins broader run as Oura, Blockchain.com file paperwork

Industry 2026-05-23 The Information — AI 6.0 5.5/6.5/6.0

The Information frames the SpaceX mega-IPO and OpenAI's prep work as anchors of a growing 2026 listings calendar that now also includes smart-ring maker Oura and crypto company Blockchain.com. The piece uses recent IPO history to assess what signals investors should read into pricing and aftermarket performance, which is particularly relevant given the scale of the SpaceX float and the fact that an OpenAI listing would be the most-watched AI debut to date. Body of analysis is paywalled below the lede.

spacex openai ipo public-markets

#31

TechCrunch: SpaceX S-1 reveals xAI gas-turbine spend, $131M Cybertruck/$697M Megapack purchases, and orbital-DC pivot

Infrastructure 2026-05-23 TechCrunch — AI 6.0 6.0/6.0/6.0

Tim De Chant reads the SpaceX S-1 as evidence that Musk has pivoted away from Tesla's "solar electric economy" vision: xAI is running its data centers on dozens of unregulated natural-gas turbines, with plans to spend another $2.8B on gas generation, while SpaceX has purchased $131M of Cybertrucks (1,279 units) and xAI $697M of Tesla Megapacks over two years, but essentially no Tesla solar panels. The filing instead pitches space-based solar, claiming orbital arrays generate >5× terrestrial output thanks to 24/7 illumination, and projects "terawatt-scale annual AI compute growth" against ~40 GW of current global data-center load. De Chant flags the economics: Starlink power costs are multiples of terrestrial DC power, radiation-hardening chips is expensive, and it is unclear whether AI training can be sharded across many satellites. That makes the orbital-DC thesis a high-variance bet currently backstopped by fossil fuels.
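Two of the figures above reward a quick sanity check. The sketch below derives the implied average Cybertruck price and the size of the projected growth relative to today's load. It uses only the rounded numbers quoted in the summary, and it reads "terawatt-scale" as exactly 1 TW per year, which is an assumption made purely for the arithmetic.

```python
# Rounded figures as quoted in the summary.
cybertruck_spend_usd = 131_000_000
cybertruck_units = 1_279
current_dc_load_gw = 40

# Assumption: take "terawatt-scale" annual growth as exactly 1 TW = 1,000 GW.
projected_annual_growth_gw = 1_000

avg_cybertruck_price_usd = cybertruck_spend_usd / cybertruck_units
growth_vs_current_load = projected_annual_growth_gw / current_dc_load_gw

print(round(avg_cybertruck_price_usd))  # 102424
print(growth_vs_current_load)           # 25.0
```

On these numbers the purchase averages roughly $102K per vehicle, and the projection implies adding about 25 times the current global data-center load every year.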

spacex-s1 xai data-center orbital-solar natural-gas

#32

TechCrunch hands-on: Amazon's Bee always-listening wearable is moderately competent at summaries, very broad in permissions

Industry 2026-05-24 TechCrunch — AI 5.7 5.5/5.5/6.0

Lucas Ropek goes hands-on with Bee, the wrist wearable Amazon acquired last year: a button-toggled always-listening recorder that pushes audio to a cloud-backed app for transcription, summaries, and calendar-linked reminders. Summaries were "moderately competent" on business calls and correctly labeled a Reservoir Dogs viewing as "Tarantino Film Scene Analysis", but transcripts miss segments and don't reliably attribute speakers without manual labeling, and performance is not materially differentiated from Otter or Granola. The main reservation is the permissions surface (location, photos, contacts, calendar, notifications, optional health data) and cloud storage; Bee reportedly demoed a fully local-running version to a YouTuber, but Amazon has given no roadmap. Stated security posture: encryption at rest/in-transit, third-party audits, continuous monitoring.

amazon-bee wearable ai-recorder privacy

#33

Ferrari + IBM watsonx: app overhaul drives 62% engagement lift, Italian-language support, AI race summaries and Q&A

Industry 2026-05-23 TechCrunch — AI 5.5 5.5/5.5/5.5

Ferrari's IBM-powered fan-app rebuild is IBM's first major F1 partnership; Ferrari chose IBM over rivals after Anthropic, AWS, and Oracle each moved into the sport. The Scuderia Ferrari HP app adds AI-written race summaries, an AI Q&A companion, in-app games, predictions, and Italian-language support for the first time. IBM cites a 62% jump in engagement over race weekends since launch; Ferrari is using AI to analyze in-app engagement signals and fan-message sentiment to drive content personalization. Context: each F1 car generates millions of telemetry data points per second per race, and 75% of new F1 fans last year were women (many of them Gen Z), which is accelerating direct-to-fan strategies.

ferrari ibm-watsonx f1 fan-engagement

#34

War on the Rocks (Memorial Day): leadership lessons from three fallen servicemembers, from Adam Scher

Government & Defense 2026-05-25 War on the Rocks 5.4 4.5/6.0/5.5

A Memorial Day reflection by Army officer Adam A. Scher (now at JIATF-401, the Pentagon's counter-drone office) profiling three fallen soldiers: Cpl. Andrew J. Kemple (KIA Tikrit 2006), 2nd Lt. Tracy Lynn Alger (KIA Iraq IED 2007), and Sgt. David Scott Robinson (Highway 1, Afghanistan, 2010). Scher draws operational-leadership lessons from each: the cost of grief on mission continuity, the fight over protests at funerals that Kemple's family carried into NY state law, and the COIN-era tradeoff between civilian disruption and convoy risk. Not a policy piece, but a remembrance essay aimed at command leaders.

memorial-day remembrance jiatf-401 leadership

#35

The Information: Stanford ex-president Marc Tessier-Lavigne's complicated comeback (weekenders newsletter)

Industry 2026-05-24 The Information — AI 5.2 4.5/5.5/5.5

The Information's weekend newsletter package, with the lead Big Read profiling Stanford ex-president Marc Tessier-Lavigne's post-resignation trajectory and a separate piece on the "tech elite's go-to real estate broker." Most of the body is paywalled. Surfaced here because it appeared in The Information's AI feed in-window and includes the crypto-Washington and books pieces also indexed by the feed.

stanford tessier-lavigne weekenders

#36

The Information: inside 1155 F Street, crypto's Washington command center, and its push for legitimacy

Industry 2026-05-24 The Information — AI 5.2 4.5/5.5/5.5

The Information profiles 1155 F Street in Washington, a 12-story glass-paneled office tower that has become the operational hub of crypto's Washington influence operation. The piece walks through which firms have moved in and how the industry is coordinating its lobbying push for regulatory clarity. Paywalled below the lede. Tangentially AI-adjacent via the broader DC-tech footprint pattern.

crypto washington lobbying