
The Week Anthropic Changed the Game — Twice: AI Briefing, April 12, 2026

By ML Team · 7 min read
Industry News · Foundation Models · Safety · Infrastructure · MCP


In a single briefing cycle, Anthropic unveiled a model capable enough to expose decades-old OS vulnerabilities — and chose not to release it. Simultaneously, the company crossed $30 billion in annualized revenue, surpassing OpenAI for the first time. Meanwhile, Claude Opus 4.6 topped every major benchmark, DeepSeek cut Western pricing by 70%, and the three largest US AI labs quietly began sharing intelligence. The frontier is moving fast on every axis at once.

  • Withheld: Anthropic Mythos model
  • $30B: Anthropic ARR (Apr 2026)
  • 65.3%: Opus 4.6 SWE-bench Verified
  • 97M: MCP installs

Anthropic Unveils Mythos — Then Locks It Away

Anthropic previewed Mythos, a next-generation Claude model with a striking and unsettling capability: it can identify thousands of previously unknown vulnerabilities across major operating systems and browsers, including flaws that have lain dormant for decades. But because the same competence that finds vulnerabilities can be used to exploit them, Anthropic decided to keep Mythos off the market entirely, one of the most significant acts of deliberate capability withholding in the industry's history.

The decision is a concrete instantiation of responsible deployment principles that labs have long discussed in the abstract. Mythos is not a prototype or a failure — it is a capable, finished model that Anthropic judged too dangerous to deploy. That judgment, and its public announcement, sets a precedent the rest of the industry will need to reckon with.

Why This Matters

Until now, "responsible AI" has largely meant adding safety guardrails to models before release. Mythos marks the first widely reported case of a frontier lab concluding that no guardrail is sufficient, and that non-release is the only responsible path. If this becomes a template, it will reshape how the field thinks about dual-use capability thresholds.

Anthropic Overtakes OpenAI in Revenue

Anthropic’s annualized run-rate revenue crossed $30 billion in April 2026 — surpassing OpenAI’s $25 billion ARR and marking the first time Anthropic has led the industry’s revenue race. The milestone arrives alongside a wave of compute investment: expanded chip deals with Google and Broadcom (announced April 6–7) are cementing Anthropic’s infrastructure position as it scales to meet demand.

The revenue reversal is significant not just as a competitive data point but as a signal of enterprise confidence. Anthropic's safety-first positioning, long viewed as a commercial handicap against more permissive competitors, appears to be emerging as an advantage in regulated industries and risk-conscious enterprise procurement.

Claude Opus 4.6 Sets New Benchmarks

Anthropic’s Claude Opus 4.6 is now the top-rated model on the LMSYS Chatbot Arena, surpassing GPT-5.4 and Gemini 3.1 Pro. On SWE-bench Verified — the agentic software engineering benchmark that measures a model’s ability to resolve real GitHub issues — Opus 4.6 hit 65.3%, a record for the task. The model’s hybrid architecture combines standard transformer layers with sparse Mixture-of-Experts routing for reasoning-intensive operations, suggesting the efficiency wins from MoE routing are maturing into production-grade systems.
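Opus 4.6's internals are not public, so the details of its routing are unknown; but the general idea behind sparse Mixture-of-Experts is simple to sketch. A gating network scores each expert per token, and only the top-k experts actually run, which is where the efficiency win comes from. A minimal illustration in plain Python (expert counts and logits are made up for the example):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(gate_logits, k=2):
    """Keep only the k highest-scoring experts and renormalize their
    weights, so just k expert networks execute per token rather than all."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

# One token's gate scores over 4 hypothetical experts; only 2 activate.
routes = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
```

With k=2 of 4 experts active, roughly half the feed-forward compute is skipped per token while the model retains the capacity of all four experts.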

Benchmark Context

  • LMSYS Chatbot Arena — crowd-sourced human preference ranking across thousands of blind comparisons; widely considered the most robust real-world quality signal.
  • SWE-bench Verified — tasks sampled from real GitHub issues requiring multi-file edits, test execution, and iterative debugging; a direct proxy for autonomous coding capability.

DeepSeek R2: Frontier Performance at Fraction of the Cost

Chinese AI lab DeepSeek released R2, its latest reasoning model, scoring 92.7% on AIME 2025 and 89.4% on MATH-500 — figures competitive with the top Western models — while pricing it approximately 70% lower than comparable US offerings. The release continues a pattern that has become a defining pressure on the industry: competitive capability at structurally lower cost, repeatedly.

For practitioners and organizations running large inference workloads, DeepSeek R2 changes the economics of what is buildable. The gap between capability and cost that once made frontier reasoning inaccessible to smaller teams is narrowing with each generation.
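The scale of that narrowing is easy to quantify. The prices below are placeholders for illustration only, not published rates; the only figure taken from the source is the roughly 70% discount:

```python
def monthly_cost(tokens_per_month, price_per_mtok):
    """Inference spend for a monthly token volume, priced in $/M tokens."""
    return tokens_per_month / 1_000_000 * price_per_mtok

# Hypothetical prices for illustration -- not actual list prices.
western_price = 10.0                  # $/M tokens, placeholder frontier rate
r2_price = western_price * 0.30       # ~70% lower, per the reported gap

volume = 5_000_000_000                # a 5B-token/month workload
saving = monthly_cost(volume, western_price) - monthly_cost(volume, r2_price)
```

At this placeholder rate, a 5B-token monthly workload drops from $50,000 to $15,000, which is the kind of delta that moves reasoning workloads from "pilot" to "production" for smaller teams.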

Infrastructure and Cooperation

Anthropic’s Model Context Protocol (MCP) crossed 97 million installs in March 2026, with every major AI provider now shipping MCP-compatible tooling. What began as an experimental Anthropic standard has matured into foundational infrastructure for the agentic era — the plumbing layer that lets agents connect to external tools, data sources, and services in a standardized way. Its transition to Linux Foundation governance underscores that MCP is now a community asset rather than a vendor play.
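Under the hood, MCP messages are JSON-RPC 2.0, which is part of why so many providers could adopt it quickly. A simplified sketch of the request shape used to invoke a tool (real clients also handle initialization, capability negotiation, and transport framing; "search_docs" is a hypothetical tool name):

```python
import json

def mcp_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request in the shape MCP uses to invoke a
    tool on a server. Simplified: omits the initialize handshake and
    transport-level framing that real MCP clients perform."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

# Hypothetical tool invocation for illustration.
msg = mcp_tool_call(1, "search_docs", {"query": "MoE routing"})
```

Because any client that speaks this wire format can talk to any MCP server, the same tool integration works across providers without per-vendor adapters.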

Separately, OpenAI, Anthropic, and Google announced on April 6–7 that they are sharing intelligence through the Frontier Model Forum to prevent Chinese AI companies from extracting frontier capability via adversarial distillation — a technique where a weaker model is trained to mimic a stronger one by observing its outputs at scale. This rare moment of direct competitive cooperation is driven by national security concerns and signals that the geopolitical dimension of AI competition has moved from background context to active operational reality.

What is Adversarial Distillation?

Knowledge distillation normally trains a smaller “student” model to replicate a larger “teacher” model’s outputs. Adversarial distillation applies the same technique without permission — feeding a model billions of queries and using its responses as training signal to produce a functionally similar model at far lower development cost. The Frontier Model Forum initiative aims to detect and disrupt this pattern before it neutralizes the compute and research investments of Western labs.
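The training signal itself is the same whether the distillation is sanctioned or adversarial: the student is pushed to match the teacher's output distribution, typically via a KL-divergence loss over temperature-softened probabilities. A minimal sketch of that loss in plain Python (toy logits, no real model involved):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax; higher temperature flattens the
    distribution, exposing more of the teacher's relative preferences."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over softened distributions -- the core
    objective in knowledge distillation, benign or adversarial."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Matching logits give zero loss; divergent logits give positive loss.
same = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
diff = distillation_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Scaled up to billions of queried outputs, minimizing this loss is what lets a student model absorb a teacher's behavior at a fraction of the original training cost — which is precisely the pattern the Frontier Model Forum effort aims to detect.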

Looking Ahead

The April 12 briefing captures an industry in genuine transition. Capability is no longer the bottleneck — the question of what to do with it is. Anthropic withholding Mythos while simultaneously leading on revenue is not a contradiction; it may be the thesis that the next phase of AI competition is fought on trust and governance as much as on benchmarks.

For researchers and practitioners, the near-term priorities are clear: agentic evaluation (SWE-bench, OSWorld), MCP-native tooling, and understanding the cost curves that DeepSeek R2 is reshaping. The infrastructure decisions made in this window — which protocols to build on, which models to fine-tune, which providers to anchor to — will carry significant lock-in implications as the ecosystem continues to consolidate.
