The $30 Billion Breakout

New Model Releases & Benchmarks

The model landscape this week is less about flashy new releases and more about the infrastructure arms race that will determine who can actually serve them. Anthropic's revelation that its revenue run rate has tripled to $30 billion in three months is the headline, but the story underneath is about compute: a 3.5-gigawatt TPU supply deal with Google and Broadcom signals that the constraint on frontier AI has shifted decisively from algorithms to power and silicon. Meanwhile, Meta confirmed plans to open-source its next-generation models, Google quietly shipped an on-device dictation app powered by Gemma, and MiniMax's self-evolving M2.7 continues gaining traction. The message is clear: the big labs are building for a world where models are everywhere, at every scale, and the bottleneck is who can run them fastest.

Update: Anthropic Revenue Surges Past $30B, Secures 3.5GW TPU Deal

Anthropic, reported at a $19B run rate just days ago, has now disclosed a $30 billion annualized revenue run rate. That is up from $9B at the end of 2025, a jump of more than 3x in roughly three months. The company also revealed that more than 1,000 enterprise customers now spend over $1M annually, double the February figure of 500. Simultaneously, Broadcom confirmed a supply deal under which Anthropic will access approximately 3.5 gigawatts of next-generation TPU-based compute starting in 2027, with the arrangement running through 2031. Anthropic announced the partnership alongside plans to consume compute across AWS Trainium, Google TPUs, and Nvidia GPUs.

Why it matters: This is the fastest revenue scaling in enterprise tech history, and the multi-cloud compute strategy signals Anthropic is building redundancy against any single hardware supplier. The Broadcom filing's cautionary note that the deal depends on "Anthropic's continued commercial success" underscores how tightly the supplier's fortunes are now tied to its customer's.

Meta Confirms Open-Source Plans for "Avocado" and "Mango" Models

Axios reported that Meta is developing two new proprietary frontier models: an LLM codenamed "Avocado" and a multimedia generator codenamed "Mango." Under the continued hybrid strategy led by Chief AI Officer Alexandr Wang, Meta plans to eventually release open-source versions of both, though the largest configurations will remain proprietary. SiliconAngle confirmed the reports.

Why it matters: Meta's commitment to open-weight releases continues to set the floor for what the community can run locally. The "Mango" multimedia model could be the first open-weight competitor to proprietary video generation systems at frontier scale.

Google Quietly Launches AI Edge Eloquent Dictation App

Google released "AI Edge Eloquent" on the iOS App Store on April 6 with no press release. It is a free, offline-first voice dictation app powered by on-device Gemma models that transcribes speech in real time, strips filler words, and outputs polished text. An optional cloud mode uses Gemini for text cleanup, and personal vocabulary can be imported from Gmail. As 9to5Google noted, the Android version is listed but not yet live.

Why it matters: This is the clearest consumer-facing demonstration of Gemma's on-device capabilities and signals Google's strategy to use open models as a wedge into privacy-sensitive use cases where cloud APIs face resistance.

MiniMax M2.7: Self-Evolving Model Gains Momentum

MiniMax's M2.7, which launched March 18, continues to gain adoption and community attention. The model uses short-term memory, self-feedback, and self-optimization to improve across agentic task iterations. It scores 56.22% on the SWE-Pro benchmark, matching GPT-5.3-Codex, and is now available on Ollama. An update posted to its HuggingFace page within the last 24 hours has the LocalLLaMA community watching closely.

Why it matters: Self-evolving architectures that improve during deployment, not just during training, represent a paradigm shift. If the approach scales, it could make static model releases obsolete.

Claude Code v2.1.92 Ships Ultraplan

Anthropic released Ultraplan in Claude Code v2.1.92, a planning mode triggered via /ultraplan that spins up a dedicated Opus 4.6 session with up to 30 minutes of compute in Anthropic's Cloud Container Runtime. It generates a structured implementation map covering file dependencies, execution order, and edge cases before any code is written. Plans can be reviewed in-browser and executed remotely.

Why it matters: This pushes Claude Code toward a cloud-first, plan-then-execute workflow that could fundamentally change how developers interact with coding agents, moving from reactive chat to structured engineering sessions.


Research Papers & Breakthroughs

The research front this week is dominated by a remarkable result from CMU and collaborators: a 4-billion-parameter model matching GPT-OSS-120B on mathematical theorem proving. That alone would make headlines, but the broader pattern is just as significant. Papers on hallucination mechanisms, theory-of-mind emergence in game-playing agents, and selective forgetting for reasoning models all point toward a maturing field that is moving beyond "make the model bigger" toward understanding and controlling what models actually do. On the vision side, Meta released an impressively compact perception encoder that distills frontier-model knowledge into sub-100M-parameter packages. Edge deployment keeps winning.

QED-Nano: A 4B Model Matches GPT-OSS-120B on Theorem Proving

A collaboration between CMU, ETH Zurich, Numina, and Hugging Face produced QED-Nano, a 4-billion-parameter model that achieves 40% on the challenging IMO-ProofBench, matching OpenAI's 120B-parameter GPT-OSS model. The approach combines supervised fine-tuning with long-horizon reinforcement learning using a "reasoning cache." With inference-time scaling to 1M+ tokens per problem, the model approaches Gemini-3-Pro performance. Weights are available on Hugging Face.

Why it matters: A 30x parameter efficiency gap on a frontier benchmark suggests that domain-specific training and clever inference strategies can substitute for raw scale, at least in formal reasoning domains. This has major implications for democratizing mathematical AI.

Hallucination Mechanisms Explained as Graph Path Reuse and Compression

A new arXiv paper from Xinnan Dai, Kai Yang, and collaborators models next-token prediction as graph search, identifying two distinct hallucination mechanisms. "Path Reuse" occurs when memorized knowledge overrides context early in generation, while "Path Compression" emerges from shortcut formation in late-stage training. The framework provides a structural explanation for why hallucinations persist despite scaling.

Why it matters: Moving from "models hallucinate" to "here's the specific graph-theoretic mechanism" is a prerequisite for targeted fixes. If the framework holds up, it could guide training interventions rather than post-hoc detection.

Readable Minds: Theory-of-Mind Emerges in LLM Poker Agents

Researchers Hsieh-Ting Lin and Tsung-Yu Hou demonstrated that LLM poker agents autonomously develop Theory-of-Mind up to Level 3-5 when given persistent memory. Memory was shown to be both necessary and sufficient for ToM emergence (Cliff's delta = 1.0), and the agents learned to exploit opponents in patterns mirroring human expert play.

Why it matters: This is among the strongest evidence that ToM-like reasoning can emerge from architecture and memory design alone, without explicit training for social cognition. It has implications for multi-agent systems and AI safety.
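
For readers unfamiliar with the statistic quoted above, Cliff's delta is a non-parametric effect size: the share of cross-group pairs where the first group scores higher, minus the share where it scores lower. A value of 1.0 means the two groups do not overlap at all. The sketch below is a minimal illustration, not the authors' analysis code; the function name and the sample scores are invented for the example.

```python
from itertools import product

def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    if not xs or not ys:
        raise ValueError("both groups must be non-empty")
    greater = sum(1 for x, y in product(xs, ys) if x > y)
    less = sum(1 for x, y in product(xs, ys) if x < y)
    return (greater - less) / (len(xs) * len(ys))

# Hypothetical ToM-level scores, invented purely for illustration.
with_memory = [3, 4, 5, 3, 4]
without_memory = [0, 1, 0, 1, 2]

# Every memory score exceeds every no-memory score, so delta is 1.0.
delta_separated = cliffs_delta(with_memory, without_memory)

# Overlapping groups give a value strictly between -1 and 1.
delta_overlap = cliffs_delta([1, 2, 3], [2, 3, 4])

print(delta_separated, delta_overlap)
```

Because the statistic only counts orderings, it says nothing about how large the gap is in absolute terms, only that the separation is complete.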

Meta EUPE: Frontier Vision in Under 100M Parameters

Meta FAIR released EUPE (Efficient Universal Perception Encoder), a compact vision encoder family under 100M parameters that rivals specialist models across image understanding and dense prediction tasks. The approach uses a novel three-stage distillation pipeline: multiple domain-expert teachers feed a 1.9B proxy teacher, which then distills into tiny students as small as 6M parameters. Code and weights are released under the FAIR Research License.

Why it matters: Sub-100M vision encoders that match specialist models open the door to sophisticated visual AI on phones and IoT devices. The three-stage distillation approach could become a standard recipe for efficient model creation.

Selective Forgetting for Large Reasoning Models

Tuan Le, Wei Qian, and Mengdi Huai proposed a framework for selectively removing sensitive reasoning components from large models while preserving general capability. The approach uses multiple LLMs combined with RAG to identify forget-relevant reasoning segments and replace them with benign placeholders.

Why it matters: As reasoning models are deployed in regulated environments, the ability to surgically remove specific knowledge (compliance data, proprietary methods) without retraining becomes essential. This appears to be one of the first serious frameworks for reasoning-aware unlearning.
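
The segment-replacement step described above can be pictured with a toy sketch. This is an assumption-heavy illustration, not the paper's method: the keyword detector stands in for the multi-LLM plus RAG identification stage, and the placeholder string, function names, and sample trace are all hypothetical. It covers only the replacement of flagged steps, not how the edited traces are later used.

```python
from typing import Callable, List

PLACEHOLDER = "[step withheld]"  # hypothetical placeholder text

def keyword_detector(forget_terms: List[str]) -> Callable[[str], bool]:
    """Stand-in detector: flags a step that mentions any forget term.

    The paper reportedly uses multiple LLMs with RAG for this stage;
    simple keyword matching is an assumption made for illustration.
    """
    lowered = [term.lower() for term in forget_terms]

    def is_forget_relevant(step: str) -> bool:
        text = step.lower()
        return any(term in text for term in lowered)

    return is_forget_relevant

def redact_trace(steps: List[str],
                 is_forget_relevant: Callable[[str], bool],
                 placeholder: str = PLACEHOLDER) -> List[str]:
    """Replace flagged reasoning steps with a benign placeholder."""
    return [placeholder if is_forget_relevant(s) else s for s in steps]

# Invented reasoning trace; "Project Falcon" is a made-up sensitive topic.
trace = [
    "The question asks for the total cost.",
    "Recall the Project Falcon pricing formula: base times 1.7.",
    "Multiply the base price by the quantity.",
]
detector = keyword_detector(["project falcon"])
redacted = redact_trace(trace, detector)

for step in redacted:
    print(step)
```

Keeping the detector as a pluggable callable means the same replacement logic would work unchanged if a stronger identification stage were swapped in.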

FeynmanBench: Multimodal LLMs Fail at Physics Diagrams

FeynmanBench, a new benchmark from Zeyu Wang and collaborators, tests multimodal LLMs on 2,000+ diagrammatic physics reasoning tasks spanning electromagnetic, weak, and strong interactions via Feynman diagrams. The results reveal systematic failures across all state-of-the-art models, including "unstable enforcement of physical constraints" and violations of global topological conditions.

Why it matters: Physics diagram understanding sits at the intersection of visual reasoning and domain knowledge. The systematic failures suggest current multimodal architectures have fundamental gaps in spatial-relational reasoning that simple scaling may not fix.


Industry News & Business Moves

The business story of the day is simple: Anthropic just lapped OpenAI. The $30B run rate, the TPU mega-deal, the PE venture, and the DOE Genesis Mission participation paint a picture of a company that has gone from scrappy challenger to market leader in revenue, enterprise adoption, and government access in under six months. But the broader canvas is just as consequential. Sam Altman's "Superintelligence New Deal" policy paper injects AI economics into mainstream political discourse. The DOE's Genesis Mission formalizes the U.S. government-AI industry relationship with 24 signed partnerships. And the funding machine keeps running: Shield AI's $1.5B defense round and Rhoda AI's $450M stealth debut show capital flowing into physical-world AI applications, not just chatbots.

Sam Altman's "Superintelligence New Deal" Policy Blueprint

OpenAI published a 13-page policy document calling for sweeping economic reforms to prepare for what it describes as approaching superintelligence. Key proposals include a robot/automated-labor tax, a nationally managed public wealth fund giving every American a direct stake in AI growth, a pilot 32-hour working week, and automatic safety-net triggers that activate when AI displacement metrics hit preset thresholds. Altman told Axios that superintelligence "is so close, so mind-bending, so disruptive that America needs a new social contract." Fortune reported that critics called the blueprint a cover for "regulatory nihilism."

Why it matters: Whether you view this as genuine policy leadership or strategic positioning, an AI CEO calling for robot taxes and wealth redistribution is a milestone moment. It shifts the Overton window on AI economic policy and forces other labs to take public positions.

DOE Launches Genesis Mission with 24 AI Partners

The Department of Energy launched the Genesis Mission Consortium with signed collaboration agreements from Microsoft, Google, AWS, IBM, Nvidia, Intel, AMD, OpenAI, Anthropic, Accenture, Cerebras, CoreWeave, Dell, Groq, HPE, Oracle, Palantir, xAI, and XPRIZE, among others. The mission aims to double U.S. scientific productivity within a decade using AI, with focus areas spanning energy, discovery science, and national security.

Why it matters: This is the most significant formal U.S. government-AI industry partnership to date, and it includes essentially every major American AI company. It establishes a framework for government-directed AI deployment in science that could define the sector's relationship with Washington for years.

Shield AI Raises $1.5B at $12.7B Valuation

Shield AI secured $1.5 billion in Series G funding as part of a $2.25B capital package, valuing the defense AI company at $12.7 billion, a 140% increase in one year. The raise also funds the acquisition of Aechelon Technology.

Why it matters: Defense AI is now attracting capital at a pace that rivals consumer AI. The 140% valuation jump in 12 months reflects growing government demand for autonomous systems.

Rhoda AI Emerges from Stealth with $450M for Robotic Intelligence

After 18 months in stealth, Rhoda AI publicly launched with $450 million in Series A funding, unveiling its FutureVision platform, a robotic intelligence system built on video-predictive control.

Why it matters: A $450M Series A for a robotics-focused AI startup signals that investors see embodied AI as the next major application frontier, not just a research curiosity.

OpenAI Alumni Launch $100M "Zero Shot" VC Fund

Former OpenAI staff, including head of applied engineering Evan Morikawa and original prompt engineer Andrew Mayne, launched "Zero Shot", a $100M venture fund targeting what they call "post-AGI-era startups." Early investments include Worktrace AI, Foundry Robotics, and one stealth company.

Why it matters: The brain drain from frontier labs into the VC ecosystem creates a new feedback loop: people who built the foundation models are now betting on what gets built on top of them.

UnitedHealth Group Bets $3B on AI Transformation

UnitedHealth Group is deploying $3 billion in AI investments across operations, with 22,000 software engineers on staff, over 80% of whom now use AI for code generation or agent building. The push spans claims processing, clinical decision support, and operational automation.

Why it matters: Healthcare is the largest sector of the U.S. economy. When the largest health insurer goes all-in on AI at this scale, it signals that AI deployment in regulated industries has crossed from experimentation to enterprise-wide transformation.

Neuralink Plans Hearing Restoration via N1 Implant

Neuralink announced plans to restore hearing by directly stimulating the auditory cortex via its N1 brain implant, bypassing the ear entirely. Clinical trials now include 21 participants globally with zero serious adverse device events reported. The company's speech restoration work continues to use ElevenLabs voice cloning to give patients their pre-illness voices back.

Why it matters: Expanding from motor control to sensory restoration represents a fundamental broadening of Neuralink's clinical ambitions and potential addressable market.


Reddit Community Highlights

The community mood this week is Gemma 4 everywhere. Nearly every subreddit is buzzing with benchmarks, quantization experiments, and real-world testing of the new open models. There is palpable excitement about running frontier-competitive models locally, mixed with the usual frustration over quantization quirks and context window issues. The Claude community, meanwhile, is split between genuine productivity gains and existential humor about AI dependency.

r/LocalLLaMA

PokeClaw: Gemma 4 Autonomously Controls an Android Phone

A developer pulled two all-nighters after Gemma 4's launch to build what appears to be the first working app using the model to autonomously control an Android phone, entirely on-device with no cloud dependency. Named PokeClaw (PocketClaw), the app demonstrates Gemma 4's agentic capabilities in a consumer-hardware context, handling screen reading, navigation, and action execution. The project drew immediate comparisons to OpenClaw but with the critical distinction of running fully locally.

Reddit thread: PokeClaw: First working app that uses Gemma 4 to autonomously control an Android phone. Fully on-device, no cloud.

Gemma 4 26B A3B: "Mindblowingly Good, If Configured Right"

A user reports achieving 80-110 tokens per second on an RTX 3090 with Gemma 4 26B A3B, with consistent speed even at high context lengths. The key finding: tool calling works reliably when properly configured, unlike other models that fall into infinite loops. The thread became a practical troubleshooting resource, with users sharing optimal LM Studio settings and discussing quantization tradeoffs for the MoE variant.

Reddit thread: Gemma 4 26b A3B is mindblowingly good, if configured right

Meta to Open-Source Next AI Models

The Axios report on Meta's plans to open-source its upcoming "Avocado" and "Mango" models generated significant discussion about Meta's hybrid strategy of maintaining proprietary versions alongside open-weight releases. Community sentiment is cautiously optimistic, with many users noting that Meta's continued commitment to open weights keeps competitive pressure on closed-source providers and ensures local LLM development remains viable.

Reddit thread: Meta to open source versions of its next AI models

r/ClaudeAI

Claude Code Ultraplan Launches

The announcement of Ultraplan in Claude Code v2.1.92 drew substantial community attention. Users are testing the cloud-first planning workflow, which drafts implementation plans using Opus 4.6 in Anthropic's cloud runtime and allows browser-based review before execution. Early feedback is positive, with users describing it as a step toward making AI coding assistants feel like collaborative engineering sessions rather than chat windows.

Reddit thread: Claude Code v2.1.92 introduces Ultraplan — draft plans in the cloud, review in your browser, execute anywhere

Anthropic Signs Multi-Gigawatt TPU Deal

The Anthropic-Google-Broadcom compute deal sparked intense discussion about the scale of infrastructure investment required for frontier AI. Users noted the $30B revenue run rate alongside the compute deal as evidence that Anthropic has overtaken OpenAI in commercial traction. Several commenters highlighted the Broadcom filing's risk disclosure as an unusually frank acknowledgment of the dependency chain.

Reddit thread: Anthropic have signed a deal for multiple gigawatts of next generation TPUs

"I'm the Bottleneck"

A highly upvoted post captured the growing sentiment that AI coding tools have become fast enough that the human operator is now the limiting factor. The thread became a frank discussion about workflow optimization, prompt engineering, and the psychological adjustment of being outpaced by your own tools. It resonated widely as a distillation of the "vibe coding" era's central tension.

Reddit thread: I'm the bottleneck

r/LocalLLM

Gemma 4 26B vs 31B on MacBook Pro 48GB

A practical head-to-head comparison on Apple Silicon found that the 26B MoE variant is dramatically more practical than the 31B dense model on consumer hardware. The 31B model took 49 minutes for a security audit that the 26B completed in 2 minutes with comparable results. The thread became a go-to reference for users deciding which Gemma 4 variant to run locally, with the consensus heavily favoring the MoE model for most use cases.

Reddit thread: MacBook Pro 48GB RAM - Gemma 4: 26b vs 31b

Local PII Masking Model Released

A developer released micro-f1-mask, a small model specifically designed to detect and mask PII before data is sent to cloud AI services. Positioned as a pre-processor for agentic workflows, the tool addresses the persistent tension between leveraging frontier cloud models and protecting sensitive data. The thread drew interest from enterprise users building compliant AI pipelines.

Reddit thread: Stop sending your raw PII to Big Tech. Just open-sourced a tiny model for local masking.
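
To make the pre-processor pattern concrete, here is a minimal sketch of mask-before-send with local restoration afterwards. It does not use micro-f1-mask, whose interface is not described in the thread summary; plain regular expressions are swapped in as a stand-in for the model, and the patterns, token format, and function names are assumptions for illustration only.

```python
import re

# Illustrative patterns only; a learned masking model would replace these.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def mask_pii(text):
    """Replace each match with a typed token; return (masked_text, mapping)."""
    mapping = {}
    counter = 0
    for label, pattern in PII_PATTERNS.items():
        def _sub(match, label=label):
            nonlocal counter
            counter += 1
            token = f"<{label}_{counter}>"
            mapping[token] = match.group(0)  # kept locally, never sent out
            return token
        text = pattern.sub(_sub, text)
    return text, mapping

def unmask(text, mapping):
    """Restore original values in a response that echoes the tokens."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text

prompt = "Email jane.doe@example.com or call 555-123-4567 about the claim."
masked, mapping = mask_pii(prompt)   # this is what would go to the cloud
restored = unmask(masked, mapping)   # applied locally to the reply

print(masked)
```

The mapping stays on the local machine, so the cloud service only ever sees the typed tokens. Regexes miss names, addresses, and free-form identifiers, which is the gap a dedicated model is meant to close.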

r/huggingface

OmniForge: CLI for Simplified Fine-Tuning

OmniForge is a newly announced CLI tool designed to simplify Hugging Face model fine-tuning across local environments, Kaggle, and Colab. The tool aims to reduce the boilerplate required for training workflows, offering versatile training configurations out of the box. Community reception was positive, particularly from users who find existing fine-tuning setups overly complex.

Reddit thread: OmniForge: A CLI Tool That Makes Fine-Tuning AI Models Stupidly Simple

r/accelerate

Anthropic Passes OpenAI in Revenue

The $30B revenue disclosure dominated discussion, with users drawing dramatic comparisons: the combined annual revenue of Snowflake, Datadog, Cloudflare, MongoDB, and HubSpot ($15.4B) is still only about half of Anthropic's current run rate. The thread crystallized community sentiment that the AI revenue race has a new leader, with Anthropic's growth from $1B in December 2024 described as unprecedented in tech history.

Reddit thread: Anthropic just passed OpenAI in revenue run rate.

Sam Altman: Superintelligence Needs a New Social Contract

Altman's Axios interview and OpenAI's policy paper generated heated debate. The proposal for robot taxes and a public wealth fund was met with a mix of genuine interest and skepticism about OpenAI's motivations. Several commenters noted the irony of the company most aggressively building toward superintelligence also being the one calling for economic safeguards.

Reddit thread: Sam Altman Told Axios That Superintelligence Is So Close & So Disruptive That America Needs A New Social Contract.

r/unsloth

Free Training and Inference in Unsloth Notebooks

Unsloth announced that users can now train and run 500+ models for free using its notebook on Google Colab's Tesla T4 GPUs. Unsloth Studio promises 2x faster training with 70% less VRAM. While the T4 GPUs limit speed, the zero-cost entry point drew significant community interest, especially from users who lack dedicated hardware.

Reddit thread: You can now train and run models for free in our notebook!

Gemma 4 LoRA Training: Surprisingly Few Trainable Parameters

A user flagged that LoRA training on Gemma 4 26B A4B yields only 0.91% trainable parameters (237M of 26B) even at rank 128 on all linear layers, far lower than expected. The thread developed into a technical discussion about MoE architecture implications for fine-tuning, with users noting that the sparse activation pattern means fewer parameters are accessible to LoRA adapters than in dense models of similar total size.

Reddit thread: Very little trainable parameters on Gemma 4 26B A4B MOE LoRA training?
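
A little arithmetic shows why the number surprised the poster. A LoRA adapter on a linear layer of shape d_in x d_out adds rank x (d_in + d_out) parameters, so at rank 128 each adapted layer gains a few percent of its own size. The sketch below checks the reported figure and works one example; the 4096 x 4096 layer is a hypothetical shape, not taken from Gemma 4's actual configuration.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)

# Sanity check on the figure quoted in the thread: 237M of 26B.
reported_pct = 100 * 237e6 / 26e9

# Hypothetical square projection layer, chosen only for illustration.
d_in = d_out = 4096
rank = 128
single = lora_params(d_in, d_out, rank)
base = d_in * d_out
ratio = single / base  # adapter size relative to the layer it adapts

print(f"reported share: {reported_pct:.2f}%")
print(f"adapter on one {d_in}x{d_out} layer: {single:,} params ({ratio:.2%} of the layer)")
```

Under these assumed shapes, a rank-128 adapter is several percent of the layer it wraps, so an overall share below 1% is consistent with the thread's reading that much of the model's weight mass is not being reached by the adapters.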