Wolf Digest — 2026-06-01

#1

NVIDIA Cosmos 3: open omni world-foundation model for physical AI, with native action generation

Robotic Autonomy 2026-06-01 NVIDIA AI BlogHugging Face Blog 8.0 8.2/8.1/7.7

NVIDIA used its GTC Taipei keynote at COMPUTEX to release Cosmos 3, which it bills as the first open omni-model for physical AI reasoning and action. The pitch is that a single model now spans the full loop a robot, autonomous vehicle, or smart-space system needs: it perceives a scene, reasons about what is happening and what caused it, predicts what is likely to happen next, and then emits the action data to do something about it. Cosmos 3 takes text, video, images, ambient sound, and action as input, and generates physically grounded video, dense captions, scenario variations, and, critically, numerical action data such as joint angles, gripper positions, and trajectory points.

Architecturally, Cosmos 3 is a mixture-of-transformers split into two cooperating blocks. A reasoning block first interprets the scene — identifying which objects are moving, where paths may intersect, and what future state is likely — and a generation block then conditions on that context to produce physically plausible outputs. Calling it an omnimodel with native action generation is the substantive claim: rather than bolting a separate policy head onto a video model, the same model that imagines the next frames also writes the motor commands, and developers fine-tune it for a specific embodiment, camera layout, workspace, or task.

Two model sizes ship as open weights. Cosmos 3 Nano is an eight-billion-parameter configuration — an eight-billion reasoner paired with an eight-billion generator — tuned for efficient inference on workstation-class hardware like the RTX PRO 6000, and published on Hugging Face as nvidia/Cosmos3-Nano. Cosmos 3 Super pairs a thirty-two-billion reasoner with a thirty-two-billion generator for large-scale synthetic-data generation and research, targeting Hopper and Blackwell GPUs, as nvidia/Cosmos3-Super. There is a diffusers integration through a Cosmos3OmniPipeline class, and everything ships under the Linux Foundation's OpenMDW 1.1 license, a single model-centric license covering weights, architecture, documentation, datasets, benchmarks, and code.

On benchmarks NVIDIA claims a sweep across the categories that matter for this model class. The Cosmos 3 Nano post-trained policy is said to lead RoboLab, which tests policies in simulation across language-guided tasks, and RoboArena, which compares policies on DROID robots in the real world. As a vision-language model it is reported as the top-ranked open model on VANTAGE-Bench for smart-infrastructure scene understanding and on the TAR traffic-anomaly-reasoning challenge, and as a world generator it is said to top Physics-IQ, R-Bench, and PAI-Bench, with variants ranking first on Artificial Analysis open-weights leaderboards. Early adopters cited include the NVIDIA GEAR team building video-action models, Agile Robots generating action-conditioned trajectory data for its Thor 3 and FR3 humanoids, and Linker Vision running scene reasoning across thousands of city camera feeds.

The caveat worth flagging is that essentially all of these numbers are first-party and announced alongside the product, on a mix of NVIDIA-associated and newer benchmarks; independent replication on the robotics evals in particular is what will tell us whether the native-action story holds up outside curated settings. But the direction is clear and consequential: a genuinely open, benchmark-leading world-foundation model that emits actions, distributed under a permissive single license, lowers the barrier to building robot and AV data pipelines considerably, and pulls the physical-AI stack further toward NVIDIA's ecosystem at exactly the moment that stack is consolidating.

How it was discussed

NVIDIA's blog frames the contribution as the reasoning-then-generation split letting systems 'think before they act' in the real world.
Hugging Face's launch post emphasizes the open weights, the Nano-versus-Super split, and the diffusers Cosmos3OmniPipeline for hands-on use.
Both note native action output (joint angles, gripper positions, trajectories) as the differentiator from prior video world models.

physical AI world model VLA open weights COMPUTEX

#2

Intel to ship 'Crescent' AI data-center chip this year, betting a cheaper processor cracks NVIDIA's lock

Infrastructure 2026-06-01 The Information — AI 7.0 7.2/7.2/6.6

Intel plans to ship a new data-center AI chip, code-named Crescent, by the end of 2026, per a Financial Times interview with Kevork Kechichian, who runs Intel's data-center group. The bet is that a cheaper, simpler processor — rather than a direct assault on NVIDIA's top-end training silicon — can give Intel a foothold in a market NVIDIA overwhelmingly dominates. It is the latest attempt to find traction after the Gaudi line failed to dent NVIDIA's share, and it leans on the idea that inference-heavy and cost-sensitive buyers want a viable second source. Execution and software maturity, not the silicon alone, will decide whether Crescent matters.

Intel chips inference NVIDIA

#3

NVIDIA unveils RTX Spark, a Windows PC class for on-device agents (1 petaflop, 128GB unified memory)

Infrastructure 2026-06-01 NVIDIA AI Blog 6.9 7.1/6.8/6.8

At GTC Taipei, NVIDIA introduced RTX Spark, a new class of Windows PCs purpose-built to run personal AI agents locally, alongside updates extending on-device agents across the RTX and DGX ecosystems. RTX Spark pairs roughly one petaflop of AI compute with 128GB of unified memory — enough, NVIDIA argues, to host on-device agents that interact with applications, generate content, and manage multi-step tasks without round-tripping to the cloud. The framing leans on the surge in open-source personal-agent projects such as OpenClaw and Hermes, which are seeing rapid GitHub adoption; the pitch is privacy and latency from running them on hardware sized for the workload rather than on a remote API.

RTX Spark local agents on-device COMPUTEX

#4

SoftBank to spend up to €75B on 5GW of French AI data centers, its largest European bet

Infrastructure 2026-05-30 The Information — AITechCrunch — AI 6.8 6.7/6.8/6.9

SoftBank committed to develop and operate up to five gigawatts of AI data-center capacity in France for as much as €75 billion (about $87 billion), its largest AI-infrastructure investment in Europe to date. The first phase builds sites in Dunkirk (Loon-Plage), Bosquel, and Bouchain to deliver 3.1 gigawatts to the Hauts-de-France region by 2031. SoftBank is both an investor in and customer of OpenAI, and French officials framed the deal as validation of Macron's push to make France a hub across the AI value chain. The announcement lands as US data-center projects face mounting local opposition over grid strain and utility-price spikes.

How it was discussed

The Information frames it as SoftBank's largest European AI-infra commitment and ties it to its OpenAI relationship.
TechCrunch contrasts the warm French reception with rising US backlash over data-center grid and utility-price impacts.

SoftBank data centers France 5GW

#5

US Commerce closes a workaround: Chinese firms need licenses for advanced AI chips even via overseas units

Government & Defense 2026-06-01 The Information — AI 6.7 6.2/7.6/6.3

The US Commerce Department issued guidance on Sunday clarifying that Chinese companies still need licenses to buy advanced US AI chips even when they route the purchase through an overseas subsidiary or offshoot. The move shuts a potential workaround in which a Chinese firm could source restricted accelerators through a foreign affiliate not itself named on the entity list. It is an incremental tightening rather than new statute, but it signals continued resolve to keep frontier compute out of reach despite the proliferation of corporate structures designed to test the boundaries of the controls.

export controls BIS China compute

#6

Taiwan's supply chain ramps NVIDIA Vera Rubin: 1M+ MGX rack components across 25 sites

Infrastructure 2026-06-01 NVIDIA AI Blog 6.5 6.6/6.7/6.2

NVIDIA detailed how Taiwan's 500-plus ecosystem partners are scaling production of its Vera Rubin platform, with more than one million MGX rack components assembled across 25 factory sites as Vera Rubin ramps into full production for agentic AI factories worldwide. The supply chain spans wafer and chip partners including TSMC, SPIL, Kinsus, KYEC, and UMTC, and systems integrators including Foxconn, Pegatron, Quanta Cloud Technology, Wistron, and Inventec. NVIDIA's secondary point is that these manufacturers are now applying accelerated computing, simulation, and physical AI to their own operations, positioning Taiwan as both builder and adopter of the AI-factory model.

Vera Rubin MGX TSMC supply chain

#7

SpaceX wins $4.16B Space Force contract for space-based moving-target tracking

Government & Defense 2026-05-30 The Information — AI 6.4 6.3/6.6/6.3

The US Space Force awarded SpaceX a $4.16 billion contract under the Space-Based Airborne Moving Target Indicator (SBAMTI) program, which aims to deploy space-based sensors that track and target airborne threats from orbit. Announced Friday, the deal underscores how central commercial space and defense-tech vendors have become to the Pentagon's sensing architecture, and how much capital is flowing into orbital ISR. The AI angle is downstream but real: moving-target indication at this scale is a data-fusion and on-orbit-inference problem as much as a launch-and-sensor one.

SpaceX Space Force ISR defense

#8

NVIDIA Factory Operations Blueprint (FOX): a reference design for an autonomous 'factory manager' agent

Agents & Tool Use 2026-06-01 NVIDIA AI Blog 6.4 6.6/6.4/6.2

NVIDIA announced the Factory Operations Blueprint (FOX) at GTC Taipei — a reference design for building a centralized 'factory manager' agent that continuously monitors and reasons across live machine signals, quality systems, work instructions, and alerts, then orchestrates a fleet of specialized agents and machines to resolve issues at scale. Built on NVIDIA NemoClaw, the AI-Q Blueprint, and open Nemotron models, FOX targets quality control, material transport, and worker safety as the first specialized agents under the manager. It is the agentic-manufacturing counterpart to the Cosmos and Vera Rubin announcements: the orchestration layer that sits above plant-floor automation.

FOX industrial agents Nemotron manufacturing

#9

COLLEAGUE.SKILL: distilling person-grounded expertise into reusable agent skills

Agents & Tool Use 2026-05-29 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 6.4 6.2/6.1/6.9

The most-upvoted paper on Hugging Face's daily list (41 upvotes) tackles building agents that carry bounded representations of a specific person's expertise, judgment, and interaction style. The authors argue that actionable role knowledge lives in heterogeneous traces rather than clean instructions, and propose an automated pipeline that distills those traces into structured, reusable skills an LLM agent can invoke. It is part of the broader move from prompt-engineered personas toward learned, auditable skill libraries for agents.

How it was discussed

arXiv abstract frames the gap as actionable expertise being embedded in messy traces, not written instructions.
Topped the HF Daily Papers upvote list, suggesting strong practitioner interest in agent skill distillation.

cs.AI cs.CL agents

#10

Representation Forcing: removing the frozen-VAE bottleneck in unified multimodal models

Multimodal 2026-05-29 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 6.4 6.6/6.4/6.2

Unified multimodal models still lean on a frozen, separately-pretrained VAE for image generation, which imposes a structural bottleneck; naively dropping it forces the model to learn both high-level structure and low-level pixel detail at once and opens a quality gap. Representation Forcing proposes a training scheme that closes that gap and lets a single model handle perception and generation without the external VAE crutch. With 32 upvotes it was among the weekend's most-noted papers on the unification-of-understanding-and-generation thread.

cs.CV multimodal diffusion

#11

MLST: Brad Carson on AI targeting and lethal autonomy — and an 80-minute pushback

Safety, Policy & Regulation 2026-05-31 Machine Learning Street TalkMachine Learning Street Talk (MLST) 6.3 6.0/6.6/6.3

Machine Learning Street Talk hosts Brad Carson — former US Army General Counsel, two-term congressman, and Acting Under Secretary of Defense for Personnel and Readiness, now head of the AI-policy advocacy group Americans for Responsible Innovation — for a debate on AI in targeting and lethal-autonomy decisions. Co-host Keith Duggar spends roughly eighty minutes pushing back on Carson's case, making this less an interview than an adversarial examination of how much trust to place in AI systems that decide who is a threat. It is a substantive entry in the military-AI governance conversation that the digest's defense coverage tracks.

How it was discussed

MLST's framing is adversarial: Duggar contests Carson's case for restraint over ~80 minutes rather than endorsing it.
Released as both YouTube video and podcast episode under near-identical titles ('Target' vs 'Threat').

AI policy lethal autonomy defense podcast

#12

GitHub Copilot's switch to token-based billing triggers developer backlash

AI Coding 2026-05-30 TechCrunch — AI 6.3 6.0/6.0/6.9

Microsoft is moving GitHub Copilot from flat, request-based subscriptions to usage-based, per-token billing effective June 1, and developers are revolting. On Reddit and X, users reported steep jumps — one said a roughly $29/month plan would balloon toward $750 and is cancelling; another posted a screenshot showing costs climbing from about $50 to roughly $3,000. Defenders argued only 'vibe-coders' spawning hundreds of sub-agents burn that many tokens and that Copilot stays cheap 'used as a tool,' while others noted the old flat model must have been deeply subsidized. The episode is a live test of whether agentic-coding economics survive contact with real metering.

GitHub Copilot pricing agentic coding Microsoft

#13

LongTraceRL: RLVR for long-context reasoning, trained on search-agent trajectories

Agents & Tool Use 2026-05-29 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 6.3 6.4/6.2/6.3

Long-context reasoning still trips up LLMs that must locate and integrate key facts buried in distracting content. LongTraceRL applies reinforcement learning with verifiable rewards over search-agent trajectories, attacking two weaknesses of prior RLVR work: distractors that are too easy to tell apart and sparse, outcome-only reward signals. By mining trajectories with rubric-style rewards it aims to teach models to track and combine evidence across long contexts. The paper drew 28 upvotes on the daily list.

cs.CL RL long-context agents

#14

SANA-Streaming: real-time streaming video-to-video editing with a hybrid diffusion transformer

Generative Media 2026-05-28 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 6.3 6.5/6.0/6.4

Real-time streaming video-to-video editing demands both temporal consistency and high throughput, a hard combination for diffusion models. SANA-Streaming is a system-algorithm co-designed framework for high-resolution streaming V2V, targeting interactive uses like live broadcast and gaming. It pairs a hybrid diffusion-transformer design with throughput-oriented systems work to keep latency low without sacrificing frame-to-frame coherence — an extension of the SANA efficient-diffusion line into the streaming regime.

cs.CV diffusion video real-time

#15

Microsoft Build to showcase in-house models as it weans off OpenAI

Industry 2026-05-31 The Information — AI 6.2 6.0/6.5/6.1

Two months after its 'conscious uncoupling' from OpenAI, Microsoft wants to prove it can thrive as an AI provider that does not depend on the ChatGPT maker's models. Its annual Build conference, opening Tuesday in San Francisco for roughly 2,500 app developers, is framed as a coming-out for the team building Microsoft's own frontier models as an OpenAI alternative. The subtext is strategic de-risking: after years of reselling OpenAI, Microsoft is signaling it can supply the model layer itself.

Microsoft Build OpenAI foundation models

#16

dMoE: learnable block experts for diffusion language models

Efficiency 2026-05-29 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 6.2 6.4/6.3/5.9

dMoE brings mixture-of-experts routing to diffusion LLMs (dLLMs) via learnable block experts, aiming to scale capacity without proportional compute in the non-autoregressive diffusion-decoding setting. It is a notable cross-pollination: MoE sparsity has driven autoregressive scaling, and adapting it to the block-parallel structure of diffusion language models is one of the more architecturally interesting weekend papers.

cs.LG MoE diffusion LLM efficiency

#17

Not All Disagreement Is Learnable: rethinking token selection in on-policy distillation

Post-Training 2026-05-26 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 6.1 6.2/6.4/5.7

On-policy distillation trains a student on its own rollouts under token-level teacher supervision, and recent selective methods prioritize high-entropy or high-disagreement tokens. This paper asks which teacher signals are actually learnable, using a fixed-context diagnostic to show that not all disagreement carries usable gradient — some of it is noise the student cannot absorb. The result refines the heuristics behind selective distillation and post-training data weighting.

cs.LG distillation post-training

#18

SwanVoice: expressive long-form zero-shot TTS for monologue and dialogue

Audio & Speech 2026-05-29 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 6.1 6.2/5.8/6.3

Zero-shot TTS is strong for single-speaker monologue but breaks down on expressive long-form multi-speaker dialogue, where the common stitch-per-turn workaround adds cost and fractures acoustic and conversational consistency. SwanVoice targets unified long-form synthesis across both monologue and dialogue, preserving affect and cross-turn coherence in one pass — relevant to anyone generating podcast-style or conversational audio at length.

cs.SD eess.AS TTS speech

#19

Function2Scene: generating 3D indoor layouts from how a space will be used

Generative Media 2026-05-29 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 6.0 6.0/5.7/6.3

Most text-to-3D indoor-scene methods generate from object-centric prompts — what furniture to place — rather than how the space will be used. Function2Scene generates layouts from functional specifications, judging a room by how well it supports its occupants' activities and physical needs. The reframing (function before objects) drew 23 upvotes and is relevant to design, simulation, and embodied-AI environment generation.

cs.CV 3D scene generation

#20

GGT-100K: generative ground truth for generalizable image restoration

Generative Media 2026-05-29 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 6.0 6.0/5.8/6.2

Real-world image restoration is bottlenecked by scarce high-quality paired data: synthetic sets mismodel real degradations, and real paired sets are expensive. GGT-100K builds a large generative-ground-truth dataset to improve generalization of restoration models to real-world conditions, tackling the data problem rather than the architecture.

cs.CV image restoration dataset

#21

Task-Focused Memorization: deciding what a multimodal agent should remember

Agents & Tool Use 2026-05-29 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 6.0 6.0/6.1/5.9

Effective long-term memory for multimodal agents is less about the storage module and more about deciding what to memorize. This paper argues the key challenge is selectivity — retaining task-relevant experience rather than everything — and proposes task-focused memorization for embodied and multimodal agents pursuing continual learning.

cs.AI agents memory multimodal

#22

NVIDIA expands its AI Cloud partner ecosystem for the global AI-factory buildout

Infrastructure 2026-06-01 NVIDIA AI Blog 6.0 6.0/6.1/5.9

NVIDIA detailed an expanding ecosystem of purpose-built 'AI Cloud' partners co-designed with its full-stack infrastructure to serve surging token demand from enterprises, startups, and nations seeking regional and sovereign capacity. The partners bundle NVIDIA compute, networking, and software for training, fine-tuning, inference, agentic and physical AI, with NVIDIA pitching lowest token cost as the selling point — a capacity-and-economics story underpinning the rest of the Computex announcements.

AI cloud infrastructure sovereign AI COMPUTEX

#23

Why 'forward-deployed engineers' are suddenly everywhere in AI

Industry 2026-05-31 The Information — AI 5.9 5.7/6.1/5.9

The Palantir-coined 'forward-deployed engineer' role — engineers who embed with customers to turn a capable model into deployed value — has spread across the industry as labs discover that frontier capability does not deploy itself. The Information notes Meta recently formed a new FDE-centric organization aimed at getting more advertisers to actually use its AI. The trend is a tell about where the bottleneck now sits: not raw model quality, but the last mile of integration and adoption.

talent deployment enterprise AI

#24

Streaming synchronized spatial-audio generation via autoregressive diffusion

Audio & Speech 2026-05-29 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 5.9 6.0/5.6/6.1

Spatial-audio synthesis faces a quality-versus-latency tradeoff and struggles to extract precise spatial cues from multimodal inputs. This work proposes a streaming, synchronized approach built on autoregressive diffusion to generate immersive spatial audio in real time, aimed at AR/VR and immersive-media pipelines where audio must track visual scene geometry as it unfolds.

cs.SD spatial audio diffusion

#25

Can LLMs run an end-to-end data-engineering pipeline for their own specialization?

Agents & Tool Use 2026-05-28 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 5.9 5.9/5.9/5.9

Adapting LLMs to specialized domains usually needs high-quality domain data, and existing curation pipelines lean on human-designed workflows. This paper probes whether an LLM can autonomously execute the full data-engineering loop — sourcing, filtering, and shaping training data — to specialize itself, testing how far agentic self-improvement can go without humans in the curation path.

cs.CL agents data curation

#26

LongDS-Bench: agents still fail at long-horizon data analysis

Evaluations & Benchmarks 2026-05-28 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 5.9 5.8/6.0/5.9

LongDS-Bench evaluates agents on long-horizon data-analysis tasks and documents where they break down — multi-step workflows that require sustaining state, revisiting intermediate results, and recovering from errors over long trajectories. It joins the growing shelf of benchmarks showing that headline agent demos do not yet translate into reliable long-horizon execution.

cs.AI benchmark agents data analysis

#27

Recovering policy-induced errors: benchmarking and trajectory synthesis for robot recovery

Robotic Autonomy 2026-05-28 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 5.9 6.0/5.8/5.9

Robot policies make mistakes; this work benchmarks how well they recover and synthesizes recovery trajectories to train that skill explicitly. Rather than only optimizing success on clean rollouts, it treats error recovery as a first-class capability — important for moving manipulation policies from demos toward the robustness real deployments require.

cs.RO manipulation robot learning

#28

A benchmark for long-form speech generation across diverse scenarios

Evaluations & Benchmarks 2026-05-27 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 5.9 5.7/5.9/6.1

Speech generation is high-fidelity but under-evaluated for long-context conditions. This paper introduces a comprehensive benchmark for long-form speech across diverse domains, filling a gap where existing tests are confined to narrow scenarios — useful for grading the long-form TTS systems (like SwanVoice) now appearing.

cs.SD speech benchmark TTS

#29

Meta reportedly building an AI pendant, expanding its wearables bet

Industry 2026-05-30 TechCrunch — AI 5.8 5.9/5.5/6.0

Per a memo viewed by The Information, Meta is developing an AI-powered pendant it aims to begin testing within a year, building on Limitless — the conversation-recording AI-device startup Meta acquired in late 2025. The memo reportedly adds plans to expand the AI-glasses lineup and launch a 'Wearables for Work' business subscription. The push aims to reverse Reality Labs, which lost $4 billion in Q1 2026, though TechCrunch notes prior always-listening AI wearables like Humane's AI Pin flopped on privacy and usefulness concerns.

Meta wearables hardware Reality Labs

#30

The Information: defense tech becomes an investment pillar as national-security AI scales

Government & Defense 2026-05-30 The Information — AI 5.8 5.7/6.0/5.7

The Information's weekend feature reports from a JPMorgan-hosted national-security confab on how defense tech has become a core investment thesis for the banking and venture worlds, with the headline question of whether OpenAI's enterprise push can catch Anthropic in selling to government and defense buyers. It is a useful market-structure read on the capital and competitive dynamics behind the SpaceX, Anduril, and Pentagon-AI headlines the digest tracks separately.

defense tech venture national security

#31

OpenAI's revenue chief retools a clunky enterprise sales motion

Industry 2026-05-30 The Information — AI 5.8 5.6/5.9/5.9

Since former Slack CEO Denise Dresser joined OpenAI as chief revenue officer in December, she has worked to clean up an enterprise go-to-market that was sometimes clunky — for instance an earlier Databricks reseller arrangement that left customers confused about who was selling what. The piece is a window into OpenAI's institutional build-out as it tries to convert model leadership into durable enterprise revenue against an aggressive Anthropic.

OpenAI enterprise sales go-to-market

#32

From prompt injection to persistent control: defending agentic harnesses

Safety, Policy & Regulation 2026-05-29 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 5.8 5.8/6.0/5.6

This paper studies how prompt injection can escalate into persistent control of an agent's harness — not just a one-shot hijack but durable compromise of the loop — and proposes defenses. As agents gain tools, memory, and autonomy, the attack surface shifts from the model to the orchestration layer, and this is a timely contribution to agentic security.

cs.CR agents prompt injection security

#33

VLM3: vision-language models as native 3D learners

Multimodal 2026-05-28 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 5.7 5.9/5.6/5.6

VLM3 argues vision-language models can learn 3D structure natively rather than via bolt-on geometry modules, pushing VLMs toward spatial understanding directly from their existing multimodal training rather than specialized 3D heads.

cs.CV 3D VLM

#34

Hands-on with Google's always-on Gemini Spark assistant: useful, but why a separate product?

Agents & Tool Use 2026-05-30 TechCrunch — AI 5.7 5.8/5.4/5.9

Gemini Spark, unveiled at Google I/O, is a 24/7 agentic assistant running on cloud VMs (Sundar Pichai: 'yes, you can close your laptop') that integrates with Gmail, Calendar, Docs, Sheets, and Slides. TechCrunch's hands-on found it genuinely useful — surfacing stackable retail coupons and producing a solid day-trip packing list — but hampered by gaps like an inability to write to Google Keep and searches that omitted costs and dates. The reviewer's verdict: capable, but hard to justify as a standalone brand versus a Gemini feature.

Google Gemini agents assistant

#35

Anthropic trims its list of platforms barred from trading its shares

Industry 2026-05-31 The Information — AI 5.7 5.5/5.9/5.7

Anthropic revised website guidance on unauthorized secondary-market trading of its shares, cutting the list of explicitly barred platforms to four — half the prior number — after the earlier guidance caused confusion. A minor but telling data point on the frenzied secondary market for stakes in frontier-lab equity following Anthropic's $65B Series H.

Anthropic secondary market equity

#36

TechCrunch Equity debates whether tech CEOs are 'uniquely prone to AI psychosis'

Safety, Policy & Regulation 2026-05-31 TechCrunch — AI 5.6 5.3/5.8/5.7

A TechCrunch Equity episode unpacks Box CEO Aaron Levie's claim that tech CEOs are 'uniquely prone to AI psychosis' because they sit far from the last-mile work needed to extract AI value, insisting leaders must actually use the tools. The hosts tie it to a broader anti-AI mood — students booing AI mentions, layoff anxiety, and a reported 30% bump in DuckDuckGo installs after Google pushed more AI into search — and ask whether the backlash is itself a startup opportunity. A culture-and-adoption read rather than a technical one.

AI discourse adoption podcast

#37

Hide-and-Seek in trajectories: discovering failure signals for VLA runtime monitoring

Robotic Autonomy 2026-05-29 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 5.6 5.7/5.6/5.5

This work mines robot trajectories for early failure signals so a vision-language-action policy can be monitored at runtime and flagged before a task goes irrecoverably wrong — a safety-and-reliability layer for deployed VLA systems.

cs.RO VLA runtime monitoring

#38

OpenSkillEval: automatically auditing the open skill ecosystem for LLM agents

Evaluations & Benchmarks 2026-05-28 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 5.6 5.5/5.7/5.6

As shareable agent 'skills' proliferate, OpenSkillEval proposes an automated audit of the open skill ecosystem — checking skills for correctness, safety, and quality — addressing the supply-chain risk of agents importing third-party capabilities.

cs.AI agents skills evaluation

#39

SCOPE: self-play via co-evolving policies for open-ended tasks

Reinforcement Learning 2026-05-29 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 5.6 5.8/5.6/5.4

SCOPE applies self-play with co-evolving policies to open-ended task generation, letting agents bootstrap an expanding curriculum against each other rather than a fixed task set — part of the open-ended-learning and automatic-curriculum line.

cs.LG RL self-play

#40

Linear-scaling video VLMs for long-video understanding

Multimodal 2026-05-29 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 5.6 5.7/5.5/5.6

This paper targets the quadratic cost that makes long-video understanding expensive for VLMs, proposing a linear-scaling design so models can ingest much longer videos without blowing up compute — an efficiency contribution to the long-context-video problem.

cs.CV video efficiency VLM

#41

The flip side of RLHF: on-policy feedback for reward-model self-improvement

Post-Training 2026-05-29 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 5.6 5.7/5.8/5.3

Most RLHF work improves the policy from the reward model; this paper flips the direction, using on-policy feedback to self-supervise and improve the reward model itself — addressing reward-model staleness and drift as policies move away from the preference-data distribution.

cs.LG RLHF reward modeling

#42

Seeing isn't knowing: do VLMs know when not to answer spatial questions?

Multimodal 2026-05-28 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 5.5 5.5/5.6/5.4

This study probes whether VLMs can abstain on spatial questions they cannot actually answer from the image, rather than confabulating — a calibration-and-honesty test for spatial reasoning that matters for embodied and safety-critical use.

cs.CV VLM calibration spatial reasoning

#43

MAAT: multi-phase adapter-aware targeted unlearning

Safety, Policy & Regulation 2026-05-28 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 5.5 5.5/5.7/5.3

MAAT proposes a multi-phase, adapter-aware approach to targeted unlearning, removing specific knowledge or behaviors from a model while limiting collateral damage to unrelated capabilities — relevant to privacy, copyright, and safety-driven removal requests.

cs.LG unlearning safety

#44

Erin Brockovich launches a crowdsourced map pushing data-center transparency

Infrastructure 2026-05-31 TechCrunch — AI 5.5 5.3/5.7/5.5

Environmental activist Erin Brockovich launched a crowdsourced US map of data centers to press for transparency on construction and community impact, saying an April call for reports drew nearly 4,000 submissions in a month. The single most common concern across noise, water use, and rising utility bills, she wrote, is transparency itself — projects announced only after permits are secured, unresponsive developers, and officials signing NDAs before neighbors are informed. She stresses it is not a blanket anti-data-center or anti-AI stance.

data centers environment transparency energy

#45

The 2026 browser wars: AI-first challengers pile up against Chrome and Safari

Agents & Tool Use 2026-05-30 TechCrunch — AI 5.5 5.4/5.2/5.9

TechCrunch rounds up the agentic and privacy browsers challenging Chrome and Safari: AI-first entrants include Perplexity's Comet (gated behind a $200/month plan), The Browser Company's Dia, Opera Neon, OpenAI's Atlas (macOS, with 'agent mode'), and YC-backed Aside; privacy options include Brave, DuckDuckGo, and the from-scratch non-Chromium Ladybird (2026 alpha). The throughline is that the browser is becoming the battleground for agents that read your logged-in sites and act on your behalf.

browsers agents Comet Atlas

#46

Lumos-Nexus: efficient frequency bridging in a homogeneous latent space for video

Generative Media 2026-05-29 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 5.4 5.5/5.3/5.4

Lumos-Nexus introduces frequency bridging within a homogeneous latent space to improve video generation efficiency and quality, part of the steady stream of latent-space-design work aimed at cheaper, sharper video diffusion.

cs.CV video generation diffusion

#47

Light Interaction: training-free inference acceleration for interactive video generation

Efficiency 2026-05-29 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 5.4 5.6/5.2/5.4

Light Interaction speeds up interactive video generation without retraining, a training-free inference-acceleration method that lowers the latency barrier for real-time and interactive video diffusion.

cs.CV efficiency inference video

#48

Count Anything: generalizable visual counting

Multimodal 2026-05-29 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 5.4 5.5/5.2/5.5

Count Anything pursues open-ended visual counting — tallying arbitrary object categories in an image without per-class training — extending the 'segment/detect anything' generalization trend into counting.

cs.CV counting open-vocabulary

#49

The good, the bad, and the ugly of Markov-boundary methods for tabular prediction

Research 2026-05-28 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 5.4 5.4/5.5/5.3

A critical analysis of Markov-boundary feature-selection methods for tabular prediction, cataloguing where they help, where they hurt, and where they fail — a measured contribution to the perennial tabular-ML methodology debate.

cs.LG tabular feature selection

#50

FRAPPE: full-input, residual-output autoencoding with projection-pursuit encoding

Research 2026-05-27 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 5.3 5.4/5.4/5.1

FRAPPE proposes an autoencoding scheme combining full-input encoding with residual outputs and projection-pursuit-style encoding, a representation-learning method aimed at better-structured latent codes.

cs.LG representation learning autoencoder

#51

Frequency-guided action diffusion via sub-frequency manifold traversal

Robotic Autonomy 2026-05-27 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 5.3 5.4/5.3/5.2

This work structures action-diffusion policies by traversing sub-frequency manifolds, a frequency-domain take on generating smooth, dynamically-consistent robot action sequences from diffusion models.

cs.RO diffusion policy action generation

#52

DRIFT: decoupled rollouts and importance-weighted fine-tuning for efficient multi-turn RL

Reinforcement Learning 2026-05-29 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 5.3 5.4/5.4/5.1

DRIFT decouples rollout collection from policy updates and uses importance-weighted fine-tuning to make multi-turn RL more sample- and compute-efficient — an optimization-side contribution to training agents over long interaction horizons.

cs.LG RL multi-turn efficiency

#53

AnyMo: scaling any-modality conditional motion generation with masked modeling

Generative Media 2026-05-28 AK (@_akhaliq) Daily PapersHugging Face Daily Papers 5.3 5.4/5.2/5.3

AnyMo unifies conditional human-motion generation across modalities (text, audio, and more) using masked modeling, aiming for a single backbone that scales across the fragmented landscape of motion-generation conditioning signals.

cs.CV motion generation masked modeling