Archive

38 editions and counting.

April 2026

Apr 30

Europe's Flagship Moment

The model release cycle shows no signs of cooling. Mistral finally ships its long-teased flagship as open weights, IBM drops a full-stack enterprise family under Apache 2.0, and Ant Group quietly enters the trillion-parameter club.

Apr 29

Mistral Teases, Agents Hallucinate, Claude Goes Creative

The model release calendar is quieter today after last week's DeepSeek V4 and GPT-5.5 blitz, but three signals stand out: NVIDIA dropped a genuinely impressive multimodal MoE that runs on consumer hardware, DeepSeek teased a dedicated vision model, and Mistral appears to be readying a "Medium" tier comeback. Meanwhile, the community is stress-testing Qwen 3.6 quantizations with a rigor that suggests it's becoming the default local workhorse.

Apr 28

When Borders Close Around AI

The model layer is increasingly commoditized, and this week's action is less about frontier breakthroughs and more about squeezing existing architectures into tighter hardware envelopes. Microsoft's entry into open-source 3D generation and community-driven inference optimizations for Qwen3.6-27B tell the same story: the real competition has shifted from "who trains the biggest model" to "who makes it run where it matters." Meanwhile, the first community INT4 quant for DeepSeek V4 Flash Base signals that the open-weights ecosystem is maturing fast enough to ship usable quantizations within days of a model drop.

Apr 27

The End of Anonymous Writing

The model story this week isn't about a flashy new release: it's about trust. The confirmation that SWE-bench Verified has been systematically gamed forces the community to reckon with what "state of the art" actually means when the yardstick is broken.

Apr 26

Weights on the Table

The Chinese labs keep closing the gap, and this week's data makes the trend impossible to ignore. Xiaomi's MiMo V2.5 Pro just tied Kimi K2.6 at the top of the open-weights Intelligence Index, and Xiaomi says they're releasing the weights.

Apr 25

The $65 Billion Week

The model wars are entering a strange new equilibrium. DeepSeek V4 dropped as the largest open-weight model ever, GPT-5.5 "Spud" went GA, and yet the real story might be happening at the edges: community benchmarks are revealing that KV cache quantization behavior varies wildly between model families, which matters far more for local deployment than any leaderboard score.

Apr 24

The Double Drop

April 24 is a day for the record books. Two major frontier releases dropped within hours of each other: OpenAI finally shipped GPT-5.5 "Spud," and DeepSeek answered with the fully open-weight V4 family.

Apr 23

Dense Is the New Frontier

The big release this week is a 27-billion parameter model that shouldn't be able to do what it does. Alibaba's Qwen team dropped Qwen3.6-27B, a dense model that beats its own 397B MoE sibling on coding benchmarks, and the local inference community is losing its collective mind.

Apr 22

The $60 Billion Option

A quiet cycle on the model front today, as the industry holds its breath for Google Cloud Next 2026, which kicks off in Las Vegas this morning. The real action is at the edges: community members are extracting surprisingly capable Gemma 4 variants from Android, and Open WebUI shipped a desktop app that bundles llama.cpp for true zero-config local inference.

Apr 21

The $25 Billion Bet

The model landscape keeps compressing. Kimi K2.6 drops as open-weight and immediately challenges the best proprietary offerings.

Apr 20

Robots Run Faster Than Humans Now

The local AI scene continues to eat away at the frontier moat. This week's dominant story isn't a new model launch from a major lab, but the community's relentless optimization of what already exists.

Apr 19

The Great Convergence

The frontier model race has hit an inflection point: the top three providers are now statistically indistinguishable on composite benchmarks. That's the headline, but the real story this week is happening at the edges.

Apr 18

Claude Draws First Blood

The big story this week isn't a new model: it's what happens when models start shipping as products. Opus 4.7 officially launched and immediately powered Claude Design, turning a model release into a market-moving event.

Apr 17

Opus Drops, Qwen Fires Back

The model wars intensified this week with two flagship releases landing within hours of each other. Anthropic shipped Opus 4.7, its long-anticipated upgrade, while Alibaba quietly dropped Qwen3.6-35B-A3B, a sparse MoE model that immediately captured the local AI community's imagination.

Apr 16

The Machines Look Inward

The model release cadence continues to accelerate, but the real story this week is not about who's biggest. It's about who's smallest.

Apr 15

Agents Start Running the Lab

The model landscape just got more interesting, and not because of a big new base model. OpenAI went niche with a cybersecurity-specific GPT-5.4 variant.

Apr 14

The Memo Heard Round the Valley

The model landscape is in a strange holding pattern this week. Everyone is watching for OpenAI's "Spud" to drop, possibly today, while the rest of the industry jockeys for position in an increasingly crowded field.

Apr 13

When the Cache Breaks

The local AI ecosystem is having a moment. While the big labs jostle for position on benchmarks and ARR, this week's real action is at the infrastructure layer: llama.cpp just unlocked audio processing for Gemma 4, speculative decoding is delivering 50% speedups on consumer GPUs, and a solo developer claims to have cracked 1M-token context windows on a $600 graphics card.

Apr 12

The Guardrails Paradox

The model news today is less about flagship launches and more about the expanding frontier of efficiency: tiny vision models that run on edge devices, speculative decoding hitting practical speeds on consumer hardware, and the continued march of Gemma 4 into every corner of the ecosystem. MiniMax M2.7's official release confirms what was telegraphed days ago, but the real story is a landscape where useful AI is getting smaller, faster, and cheaper at an accelerating rate.

Apr 11

Reality Bites Back

The open-source race is heating up on multiple fronts this week. GLM-5.1 continues to climb benchmark leaderboards, now dominating code arena rankings and showing surprisingly strong agentic performance.

Apr 10

The Cyber Arms Race Escalates

The model race this week isn't about who can score highest on a leaderboard. It's about who gets to wield the most dangerous capabilities, and under what terms.

Apr 9

Goodbye Llama, Hello Agents

The model landscape shifted dramatically in the past 24 hours. Meta broke from its open-source identity with a proprietary model debut, Anthropic pivoted hard toward production agent infrastructure, and the open-weight community scrambled to keep up with Gemma 4's rapid llama.cpp fixes.

Apr 8

The Model Too Dangerous to Ship

The biggest story today isn't just a new model: it's a new paradigm for how frontier capabilities get deployed. Anthropic broke with every convention in the playbook by unveiling Claude Mythos Preview and then immediately refusing to release it to the public, instead funneling it through a hand-picked consortium of tech giants for defensive cybersecurity work.

Apr 7

The $30 Billion Breakout

The model landscape this week is less about flashy new releases and more about the infrastructure arms race that will determine who can actually serve them. Anthropic's revelation that its revenue run rate has tripled to $30 billion in three months is the headline, but the story underneath is about compute: a 3.5-gigawatt TPU supply deal with Google and Broadcom signals that the constraint on frontier AI has shifted decisively from algorithms to power and silicon.

Apr 6

The Geopolitical AI

The Gemma 4 aftershocks continue to ripple through the ecosystem. Four days after launch, the 31B dense model is proving to be far more disruptive than its modest parameter count suggested, with community benchmarks showing it punching well above its weight class against frontier APIs at a fraction of the cost.

Apr 5

The Infrastructure Wall

The Gemma 4 era is in full swing, with Google's latest open model dominating community conversation as users push it across exotic hardware and demanding benchmarks. Meanwhile, the revenue race between OpenAI and Anthropic is tightening fast, with Anthropic's annualized run rate now at $19B and closing the gap.

Apr 4

Mythos Looms, Tape Leaks

Gemma 4 dominated the last cycle's headlines, but the real story this week is what's lurking just offstage. Anthropic's leaked Mythos model is drawing CNN coverage and government briefings.

Apr 3

Gemma 4 Lands, Claude Gets Feelings

The open-weights race took a decisive turn this week. Google officially shipped Gemma 4 under Apache 2.0, and the license change may matter more than any benchmark number.

Apr 2

One-Bit Ambitions

Today's model landscape is defined by a striking paradox: models are simultaneously getting much bigger and much smaller. Qwen 3.6 Plus officially graduates from preview to full release with a 1M-token context window and agentic-first design, while PrismML's Bonsai proves you can compress an 8B model into 1.15GB and still match benchmarks.

Apr 1

The Source Code Spills

A quieter day on the model front, overshadowed by the Claude Code leak dominating every feed. The most interesting release is CoPaw-Flash-9B, Alibaba's bet that small agentic models can punch above their weight when fine-tuned on real agent trajectories.

March 2026

Mar 31

When Agents Break Free

The model landscape keeps fracturing. Alibaba dropped two major releases in 24 hours: Qwen3.5-Omni, their most capable multimodal model to date, and the surprise appearance of Qwen 3.6 Plus Preview on OpenRouter.

Mar 30

When Machines Hunt Their Own Bugs

The headline this week isn't a single model drop: it's the accelerating convergence of inference optimization, hardware democratization, and the quiet emergence of models that improve themselves in production. Cursor's Composer 2 is rewriting itself every five hours via real-time RL.

Mar 29

The Sycophancy Problem

The big story in model releases this week is not a single blockbuster launch but a constellation of incremental advances: Google's Gemini API gets a coding boost, GPT-5.4 casually aces the hardest math competition in the country, and the TurboQuant frenzy continues as implementations land on every platform. The pattern is clear: raw model scale is yielding diminishing returns, and the action has shifted to inference-time techniques, agent harnesses, and quantization tricks that extract more from what we already have.

Mar 28

The Mythos Meltdown

The model release calendar continues its relentless pace, but today's standout isn't a frontier lab release: it's a Chinese open-weight challenger punching far above its weight class, and a trillion-parameter scientific model that signals the arrival of domain-specific foundation models at unprecedented scale. Meanwhile, TurboQuant continues to dominate the local inference conversation, with real users now running frontier-class context windows on consumer hardware.

Mar 27

The Mythos Leak

The quantization wars are heating up fast. TurboQuant has barely had time to settle into llama.cpp, and a Clifford algebra challenger has already arrived, claiming 10-19x speedups.

Mar 26

Silicon Meets the State

The big story today isn't a new model: it's a new way to measure them. ARC-AGI-3's launch redefines what we mean by "intelligence benchmark," shifting from static puzzles to interactive exploration, and the results are humbling.

Mar 25

The Supply Chain Reckoning

The model landscape today is defined less by frontier leaps and more by infrastructure plays. Sber's GigaChat 3.1 drops open weights for a 702B MoE and a tiny 10B MoE under MIT license, challenging the assumption that open-weight heavyweights only come from US and Chinese labs.

Mar 24

Agents Take the Desktop

The big story this cycle isn't a single model launch but a convergence signal: the top of SWE-bench Verified is now a three-way tie between Anthropic, Google, and OpenAI at ~80%, and Cursor just proved you can fine-tune a Chinese open-source model to rival them at 86% lower cost. Meanwhile, FlashAttention-4 essentially closes the gap between attention and raw matmul speed on Blackwell GPUs, meaning the infrastructure layer is catching up to the model layer.