Robots Run Faster Than Humans Now

New Model Releases & Benchmarks

The local AI scene continues to eat away at the frontier moat. This week's dominant story isn't a new model launch from a major lab, but the community's relentless optimization of what already exists. Qwen 3.6-35B-A3B is being distilled, benchmarked, and squeezed onto every Mac in sight. Meanwhile, llama.cpp just got meaningfully faster for certain workloads, and a mysterious new entrant called WOZCODE appeared on Terminal-Bench 2.0. The frontier providers are quiet for once; the action is at the edges.

Opus 4.7 Reasoning Distilled into Open 35B MoE

A developer known as "hesamation" fine-tuned Qwen3.6-35B-A3B using chain-of-thought traces from Claude Opus 4.7, producing a model that imitates Opus's structured reasoning style while running on a single A100 or high-end consumer hardware. This follows a broader wave of Opus reasoning distillations by Jackrong, who released a v2 trained on 14,000 Claude 4.6 Opus-style reasoning samples to reduce verbose over-analysis on easy problems. The models aim to teach smaller architectures to "think" more economically by adopting structured reasoning patterns rather than brute-force chain-of-thought.

Why it matters: This is exactly the kind of adversarial distillation that US frontier labs are trying to stop Chinese competitors from performing, yet it's happening openly on HuggingFace. The line between legitimate fine-tuning and unauthorized capability extraction grows blurrier by the day.

llama.cpp Merges Speculative Checkpointing

Pull Request #19493 landed in llama.cpp on April 18, adding speculative checkpointing to the server. The feature creates intermediate checkpoints during prompt processing, allowing speculative decoding to work with recurrent and hybrid architectures. For repetitive tasks like code generation, users report 0-50% speedups using ngram-based speculation. The tradeoff: partially accepted drafts require rolling back to checkpoints, so gains depend heavily on task type and draft acceptance rates.

Why it matters: Speculative decoding has been llama.cpp's most impactful performance feature, and this extension to hybrid architectures (like Qwen's MoE models) means the local inference stack keeps closing the gap with cloud API latency.
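
To make the rollback tradeoff concrete, here is a minimal toy sketch of ngram-based speculation with checkpoint and restore. It is not llama.cpp's implementation: `ToyRecurrentModel`, `ngram_draft`, and `generate` are hypothetical names invented for illustration, and the "model" simply repeats a fixed token pattern. The point it tries to show is that a recurrent state cannot be partially undone, so a partially accepted draft means restoring the saved state and replaying only the accepted tokens.

```python
from typing import List, Sequence, Tuple


class ToyRecurrentModel:
    """Hypothetical stand-in for a recurrent model: once a token is fed,
    the state cannot be partially undone, only restored from a checkpoint."""

    def __init__(self, pattern: Sequence[int]) -> None:
        self.pattern = list(pattern)
        self.pos = 0  # the entire "recurrent state" of this toy model

    def feed(self, token: int) -> None:
        self.pos += 1  # the toy state only counts consumed tokens

    def predict(self) -> int:
        return self.pattern[self.pos % len(self.pattern)]

    def checkpoint(self) -> int:
        return self.pos

    def restore(self, saved: int) -> None:
        self.pos = saved


def ngram_draft(tokens: Sequence[int], n: int = 2, max_draft: int = 4) -> List[int]:
    """Find the latest earlier occurrence of the last n tokens and copy what followed."""
    if len(tokens) <= n:
        return []
    key = list(tokens[-n:])
    for start in range(len(tokens) - n - 1, -1, -1):
        if list(tokens[start:start + n]) == key:
            return list(tokens[start + n:start + n + max_draft])
    return []


def generate(model: ToyRecurrentModel, prompt: Sequence[int], steps: int,
             n: int = 2, max_draft: int = 4) -> Tuple[List[int], int, int]:
    """Return (tokens, verification passes, rollbacks)."""
    tokens = list(prompt)
    for t in tokens:
        model.feed(t)
    target = len(prompt) + steps
    passes = rollbacks = 0
    while len(tokens) < target:
        passes += 1
        draft = ngram_draft(tokens, n, max_draft)
        saved = model.checkpoint()
        # Verify the whole draft in one pass, the way a batched forward would:
        # record the model's own prediction at each position, then feed the draft token.
        preds = []
        for d in draft:
            preds.append(model.predict())
            model.feed(d)
        n_ok = 0
        while n_ok < len(draft) and preds[n_ok] == draft[n_ok]:
            n_ok += 1
        if n_ok < len(draft):
            # Partial acceptance: the state now includes rejected tokens,
            # so restore the checkpoint and replay only the accepted prefix.
            rollbacks += 1
            model.restore(saved)
            for d in draft[:n_ok]:
                model.feed(d)
        tokens.extend(draft[:n_ok])
        nxt = model.predict()  # always emit one token chosen by the model itself
        model.feed(nxt)
        tokens.append(nxt)
    return tokens[:target], passes, rollbacks


out, passes, rollbacks = generate(ToyRecurrentModel([7, 8, 9]), [7, 8, 9, 7, 8], steps=6)
print(out, passes, rollbacks)
```

On a highly repetitive sequence the draft is accepted wholesale and several tokens land per pass; on a sequence that breaks its own pattern, drafts get rejected and each rejection costs a restore, which is consistent with the task-dependent gains users describe.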

WOZCODE Appears on Terminal-Bench 2.0

A previously unknown entry called "WOZCODE" surfaced on the Terminal-Bench 2.0 leaderboard on HuggingFace, drawing attention from the community. Terminal-Bench 2.0 is one of the more respected agentic coding benchmarks, testing AI agents across 89 real-world terminal tasks including debugging async code, assembling proteins, and resolving security vulnerabilities. Details about WOZCODE's architecture and provenance remain scarce at time of writing.

Why it matters: New anonymous benchmark entries often precede major model announcements. The community is watching to see if this is a stealth drop from an established lab or a dark horse from a smaller team.

Update: Qwen 3.6-35B-A3B as Daily Coding Driver

The conversation around Qwen 3.6-35B-A3B has shifted from "impressive benchmark" to "can I actually use this?" Simon Willison showed the model, running on a MacBook Pro M5, drawing better SVG pelicans than Opus 4.7. Community members on r/LocalLLaMA are now reporting mixed results running it as a coding agent with 32K context windows on 32GB Macs, finding memory constraints a real bottleneck for sustained agentic sessions. Its 73.4% on SWE-bench Verified is remarkable for an Apache 2.0 model activating only 3B parameters.

Why it matters: The gap between "beats benchmarks" and "replaces my API subscription" remains significant. Real-world coding agents need sustained context, tool use, and error recovery, not just single-shot task completion.


Research Papers & Breakthroughs

The research highlights this cycle are refreshingly diverse: one team is building digital petri dishes where neural agents fight for survival, another argues LLMs think in shapes rather than words, and a Cell paper introduces what amounts to a remote control for genetics. The common thread is researchers pushing beyond narrow optimization toward understanding emergent behavior, both in silicon and in biology.

Sakana AI: Petri Dish Neural Cellular Automata

Sakana AI published Petri Dish Neural Cellular Automata (PD-NCA), a differentiable Artificial Life substrate where up to 15 independent neural agents coexist, compete for territory, and adapt through continuous gradient-based optimization on a shared 2D grid. Unlike conventional Neural Cellular Automata where behavior unfolds deterministically from pre-trained rules, PD-NCA's "learning-in-the-loop" design enables open-ended adaptation within a single differentiable simulation. Agents develop attack and defense channels, specialize into ecological niches, and exhibit emergent complexity that conventional training paradigms cannot produce. The code is open-sourced on GitHub.

Why it matters: This is a serious step toward open-ended AI systems that evolve rather than converge. If you believe the path to more capable AI runs through ecological dynamics rather than scale alone, PD-NCA is one of the most concrete implementations of that thesis.

LLM Neuroanatomy III: LLMs Think in Geometry, Not Language

Researcher David Noel Ng published the third installment of his LLM Neuroanatomy series, presenting evidence that LLMs organize internal representations geometrically rather than linguistically. The key finding: in middle transformer layers, sentences about the same topic cluster more tightly than sentences in the same language about different topics, suggesting a universal "thinking space" where semantic relationships are encoded as distances and directions. This aligns with recent ICLR 2026 work showing that different computational stages within a single transformer block map to anatomically distinct brain systems.

Why it matters: If LLMs genuinely compute in a language-agnostic geometric space, it challenges the Sapir-Whorf hypothesis for machines and has practical implications for multilingual deployment, interpretability research, and understanding what "reasoning" actually means inside a transformer.
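
One way to picture the kind of measurement behind this claim is to compare the average cosine similarity of same-topic, cross-language sentence pairs against same-language, cross-topic pairs. The sketch below is illustrative only: the vectors are hand-made toy values (not real hidden states or results from the series), and `cosine` and `mean_pair_similarity` are hypothetical helper names. With real data, each `vec` would be a pooled middle-layer activation for one sentence.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_pair_similarity(items, keep_pair):
    """Average cosine similarity over all pairs that satisfy keep_pair."""
    sims = [cosine(a["vec"], b["vec"])
            for a, b in combinations(items, 2) if keep_pair(a, b)]
    return sum(sims) / len(sims)


# Toy vectors, constructed by hand so that the first two dimensions carry
# topic and the third carries a weak language signal. These are NOT measurements.
items = [
    {"topic": "weather", "lang": "en", "vec": [1.0, 0.0, 0.2]},
    {"topic": "weather", "lang": "de", "vec": [1.0, 0.0, -0.2]},
    {"topic": "cooking", "lang": "en", "vec": [0.0, 1.0, 0.2]},
    {"topic": "cooking", "lang": "de", "vec": [0.0, 1.0, -0.2]},
]

same_topic_cross_lang = mean_pair_similarity(
    items, lambda a, b: a["topic"] == b["topic"] and a["lang"] != b["lang"])
same_lang_cross_topic = mean_pair_similarity(
    items, lambda a, b: a["lang"] == b["lang"] and a["topic"] != b["topic"])

print(f"same topic, different language: {same_topic_cross_lang:.3f}")
print(f"same language, different topic: {same_lang_cross_topic:.3f}")
```

If the finding holds for a given model and layer, the first number should exceed the second when the same comparison is run on real activations.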

Mechanogenetics: Ultrasound as a Remote Control for Cells

A paper published in Cell describes a major advancement in synthetic biology: using focused ultrasound rather than drugs or light to remotely control gene expression deep inside the body. The technique mechanically induces localized protein expression within targeted cell populations, creating what researchers describe as cellular "training centres" that can activate therapeutic responses. Early applications focus on engineering CAR-T cells for cancer immunotherapy, where ultrasound can prime tumors for treatment without systemic side effects.

Why it matters: Light-based optogenetics revolutionized neuroscience but couldn't penetrate deep tissue. Ultrasound-based mechanogenetics solves the depth problem, potentially enabling non-invasive control of engineered cells anywhere in the body, a genuine platform technology for next-generation therapeutics.


Industry News & Business Moves

The biggest story isn't about AI at all, or maybe it's the biggest AI story of the year. A humanoid robot just beat the human half-marathon record by nearly seven minutes, one year after robots could barely finish the course. Meanwhile, the geopolitical battle over open-source AI heats up with a WSJ editorial arguing America needs to embrace it, and the data center backlash has moved from NIMBYism to actual violence. The industry is moving so fast that even the backlash is accelerating.

Humanoid Robot Smashes Human Half-Marathon Record in Beijing

Honor's "Lightning" H1 humanoid robot completed the Beijing Half-Marathon on April 19 in 50 minutes and 26 seconds, beating all 12,000 human competitors and surpassing Jacob Kiplimo's human world record of 57:20 by nearly seven minutes. Over 100 humanoid robots from 76 institutions competed, with Honor teams sweeping the top three spots, all using fully autonomous navigation. Last year's inaugural race saw the fastest robot finish in 2 hours 40 minutes, making this year's improvement staggering. The winning robot featured 90-95cm legs designed to mimic elite human biomechanics and liquid-cooling technology borrowed from smartphones.

Why it matters: A 3x performance improvement in one year is the kind of exponential curve that makes robotics investors salivate and labor economists nervous. This isn't a lab demo; it's a public race with 112 teams, signaling that bipedal locomotion has crossed from research curiosity to engineering competition.
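
The headline figures can be checked with a few lines of arithmetic. This is a small sketch using only the times quoted above plus the standard half-marathon distance of 21.0975 km; the helper names `to_seconds` and `speed_kmh` are made up for illustration.

```python
HALF_MARATHON_KM = 21.0975  # standard half-marathon distance


def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert a finishing time to total seconds."""
    return hours * 3600 + minutes * 60 + seconds


def speed_kmh(total_seconds):
    """Average speed over a half-marathon for a given finishing time."""
    return HALF_MARATHON_KM / (total_seconds / 3600)


robot_this_year = to_seconds(minutes=50, seconds=26)
robot_last_year = to_seconds(hours=2, minutes=40)
human_record = to_seconds(minutes=57, seconds=20)

speedup = robot_last_year / robot_this_year      # year-over-year improvement factor
margin_s = human_record - robot_this_year        # gap to the human record, in seconds

print(f"year-over-year speedup: {speedup:.2f}x")
print(f"margin over human record: {margin_s // 60}m{margin_s % 60:02d}s")
print(f"robot average speed: {speed_kmh(robot_this_year):.1f} km/h")
print(f"human record average speed: {speed_kmh(human_record):.1f} km/h")
```

The ratio works out to a little over 3x and the margin to 6 minutes 54 seconds, which matches the "3x" and "nearly seven minutes" framing.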

WSJ: To Beat China, Embrace Open-Source AI

The Wall Street Journal published an editorial arguing the US should embrace open-source AI to maintain its competitive edge against China. The piece arrives at a pivotal moment: Meta shipped Muse Spark on April 8 as its first closed-source model, abandoning its long-standing open-weight strategy just as Chinese labs have embraced openness as an industrial strategy. An MIT study now shows China's share of global open AI model downloads exceeds that of the US, and Bloomberg reported on why China can't quit open AI.

Why it matters: The open-source AI debate has become a national security argument. With Meta defecting to closed-source and Chinese labs filling the vacuum, the US risks ceding the open ecosystem that drives adoption, talent development, and interoperability across allied nations.

Data Center Backlash Escalates to Violence

The nationwide pushback against AI data centers has turned dangerous. Fortune reports that a 20-year-old allegedly threw a Molotov cocktail at Sam Altman's San Francisco home on April 11, followed by a separate shooting at the residence two days later. An Indianapolis councilman's home was shot at 13 times with a note reading "no data centers." At least 70 communities have imposed restrictions or rejected projects since 2021, with $18 billion in projects blocked and $46 billion delayed. Communities cite utility bill increases of up to 267% near facilities.

Why it matters: This is no longer a zoning dispute. When opposition moves from town halls to firearms, the industry faces a legitimacy crisis that no amount of "jobs and tax revenue" messaging can solve. The infrastructure buildout that every AI company depends on may face serious political headwinds.


Reddit Community Highlights

The mood across AI subreddits this weekend is practical and slightly skeptical. r/LocalLLaMA is deep in the weeds of actually running Qwen 3.6 and Gemma 4 as daily drivers, and the results are more sobering than the benchmarks suggest. r/ClaudeAI is experiencing an Opus 4.7 backlash wave while simultaneously marveling at Claude Design. Meanwhile, r/accelerate is grappling with the physical-world implications of AI, from robots outrunning humans to violence against data centers.

r/LocalLLaMA

Switching from Opus 4.7 to Qwen-35B-A3B

A growing number of users are seriously evaluating Qwen3.6-35B-A3B as a replacement for their Claude Opus 4.7 API subscriptions for daily coding work. The thread reveals a community split: some report near-parity on straightforward coding tasks, while others note Opus still holds an edge on complex multi-step reasoning. The discussion highlights how quickly local models have moved from novelty to genuine contenders for professional workflows.

Reddit thread: Switching from Opus 4.7 to Qwen-35B-A3B

llama.cpp Speculative Checkpointing Was Merged

The community is testing the newly merged speculative checkpointing feature (PR #19493), which enables speculative decoding to work with hybrid architectures. Early reports show task-dependent results: coding workloads see 0-50% speedups with ngram-based speculation, while other prompts show minimal gains. The thread includes detailed parameter tuning discussion that's valuable for anyone running local inference.

Reddit thread: llama.cpp speculative checkpointing was merged

"Browser OS" Implemented by Qwen 3.6 35B

A user showcased what they called the best result they've ever gotten from a local model: Qwen 3.6 35B generating a complete "Browser OS" implementation. The post demonstrates the model's ability to handle ambitious, multi-component coding tasks that would have been unthinkable for local models even six months ago.

Reddit thread: "Browser OS" implemented by Qwen 3.6 35B: The best result I ever got from a local model

r/ClaudeAI

Enterprise Admins Can See All Messages, Including "Incognito" Chats

A highly discussed post warns that Claude Enterprise's Compliance API gives admins full access to all conversations, including those in incognito mode. The feature takes about 5 minutes to enable and pulls complete chat history. The thread sparked debate about whether this is standard enterprise tooling or an unexpected privacy gap, with many users surprised that "incognito" doesn't mean what they assumed.

Reddit thread: YSK: If you use Claude on your company's Enterprise plan, your employer can access every message you've ever sent, including "incognito" chats

If You're Unsatisfied with Opus 4.7, Switch to 4.6

The Opus 4.7 backlash continues, with users reporting it as the first time they've voluntarily downgraded to a previous model version. The post pleads with the community to stop flooding the subreddit with complaints and simply switch back to 4.6, which remains available. The sentiment suggests Anthropic's latest model may have optimized for benchmarks at the expense of the subjective "feel" that made Claude distinctive.

Reddit thread: If you are unsatisfied with Opus 4.7, PLEASE simply switch to 4.6

The Technical vs Non-Technical AI Gap Is Huge

A thoughtful post observes that non-technical users still treat LLMs as better search engines, unaware of features like thinking effort selection or model choice. The author argues the gap in value extracted from AI tools is widening dramatically, with technical users building agentic workflows while others barely scratch the surface. The thread resonated widely as a snapshot of where AI adoption actually stands.

Reddit thread: The gap between what technical and non-technical people get from AI is huge now

r/LocalLLM

Gemma 4 vs Qwen 3.5/3.6 on Localhost

A user reports that Gemma 4 outperforms Qwen 3.5/3.6 models on their local setup, specifically noting that none of the Qwen models up to 122B could fix a bug introduced by the 122B model itself. The post highlights a practical reality: benchmark rankings don't always predict which model will handle your specific debugging task. Gemma 4's multimodal architecture and efficient MoE design are winning converts.

Reddit thread: For me Gemma4 > Qwen3.5 / 3.6 on localhost

Gemma 4 26B on a 5090: Deployment Breakdown

A detailed technical post shares benchmarks running Gemma 4 26B on an RTX 5090 via vLLM: ~196 tok/s decode speed, 96K context, and 1-3s warm TTFT. The author discusses quant format tradeoffs on Blackwell architecture and notes the 5090 at $0.86/hr offers roughly one-third the cost of an H100 for this workload. This is the kind of concrete deployment data the community craves.

Reddit thread: Anyone else testing Gemma 4 26B on a 5090? Here is my deployment and optimization breakdown.
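
For a rough sense of what those numbers mean per token, the quoted hourly price and decode speed can be turned into a cost per million generated tokens. This is a back-of-the-envelope sketch that assumes a single request stream decoding continuously at the quoted ~196 tok/s with no idle time or batching, which real deployments will not match exactly; `cost_per_million_tokens` is a hypothetical helper name.

```python
def cost_per_million_tokens(price_per_hour, tokens_per_second):
    """Dollars per one million decoded tokens at a steady decode rate."""
    tokens_per_hour = tokens_per_second * 3600
    return price_per_hour / tokens_per_hour * 1_000_000


# Figures quoted in the post: $0.86/hr rental price, ~196 tok/s decode speed.
rtx5090_cost = cost_per_million_tokens(price_per_hour=0.86, tokens_per_second=196)
print(f"RTX 5090: ${rtx5090_cost:.2f} per million decoded tokens")
```

Batching several concurrent requests would push the effective cost lower, while idle GPU time would push it higher.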

r/huggingface

Distilled Opus 4.7 Reasoning into Open 35B MoE

The post highlights the growing trend of distilling frontier model reasoning into open-weight models, with a developer fine-tuning Qwen3.6-35B-A3B on Claude Opus 4.7's chain-of-thought traces. The community discussion centers on whether this constitutes legitimate research or ToS-violating extraction, a question that has no clean answer as frontier labs simultaneously fight distillation from Chinese competitors while their own outputs are used for the same purpose domestically.

Reddit thread: Someone distilled Claude Opus 4.7's chain-of-thought into an open 35B MoE model and it runs on a single A100

r/accelerate

Robot Breaks Human Half-Marathon Record

The community celebrated Honor's humanoid robot completing the Beijing Half-Marathon in 50m26s, obliterating the human record of 57m20s. Discussion focused on the jaw-dropping year-over-year improvement (from 2h40m to sub-51 minutes) and what this exponential curve implies for physical AI capabilities. The acceleration from "can barely walk" to "faster than any human" in roughly two years has become a totem for the subreddit's thesis.

Reddit thread: 50m26s, the human half-marathon record (57m20s) was broken by a robot today

Data Center Misinformation Reaches Conspiracy-Theory Levels

A post links to a detailed rebuttal of YouTuber Benn Jordan's viral video claiming data centers act as "acoustic weapons." The blog argues Jordan's infrasound claims are contradicted by decades of research and represent "highbrow misinformation." The discussion reflects a community increasingly frustrated with what they see as anti-technology narratives gaining mainstream traction despite weak evidence.

Reddit thread: We've reached conspiracy-theory levels of misinformation regarding data centres

Observations on AI Impacts at a Large R&D Institution

A portfolio lead at one of the largest non-university R&D institutions in the US shared first-hand observations on AI adoption. Key takeaways: coding productivity gains are real but uneven, AI is making experienced researchers dramatically more productive while providing less lift for juniors, and organizational resistance remains the primary bottleneck rather than technical capability. The post resonated as a rare grounded perspective from inside a major institution.

Reddit thread: Observations on AI Impacts (so far) at a Large R&D Institution

r/unsloth

KV Cache Offloading and Integration Challenges

Both top posts in r/unsloth this cycle reflect users hitting practical walls: one asking about offloading KV cache to system RAM on an RTX 3090 with 64GB system memory, the other struggling with integrating Unsloth Studio with Claude Code on Windows. The pattern suggests Unsloth's user base is growing beyond power users into a broader audience that needs better documentation and out-of-the-box workflows.