AI digest: Small models punch above their weight

This week shows that the gap between model size and capability keeps shrinking, even as the biggest labs fight over GPUs.

Zyphra’s 8B model embarrasses much larger ones

Zyphra released ZAYA1-8B, a reasoning model with only 760M active parameters that somehow outperforms models many times its size on maths and coding. It beats Claude 4.5 Sonnet on some benchmarks and gets close to DeepSeek-V3.2. The real kicker? It was trained entirely on AMD hardware and released under Apache 2.0.

Google speeds up Gemma 4 with clever drafting

Google’s new multi-token prediction drafters give Gemma 4 a 3x speedup without losing quality. A smaller model suggests multiple tokens at once, then the main model validates them in one pass. It’s a smart approach to the inference speed problem, and it doesn’t require a fundamentally different architecture.
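For readers who want to see the draft-then-verify idea in code, here is a minimal toy sketch. It is not Google's implementation and does not use Gemma: `toy_target` and `toy_draft` are hypothetical stand-ins for the main and drafter models, the drafter proposes its tokens one by one rather than through a multi-token head, and the verification loop stands in for what a real system would do in a single batched forward pass. It only covers the greedy case, where the output is guaranteed to match what the main model would have produced alone.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # maps a context to the next token


def greedy_decode(target: NextToken, prompt: List[Token], n: int) -> List[Token]:
    """Baseline: the main model generates one token per step."""
    tokens = list(prompt)
    for _ in range(n):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], n: int, k: int = 4) -> List[Token]:
    """Greedy draft-then-verify decoding (toy version)."""
    tokens = list(prompt)
    goal = len(prompt) + n
    while len(tokens) < goal:
        # 1. The cheap drafter proposes k tokens ahead.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The main model checks every proposed position.
        #    (A real system scores all k positions in one forward pass.)
        accepted: List[Token] = []
        for guess in proposal:
            expected = target(tokens + accepted)
            if guess == expected:
                accepted.append(guess)
            else:
                # First mismatch: keep the main model's token, drop the rest.
                accepted.append(expected)
                break
        else:
            # Every guess was right, so the main model adds one more token.
            accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
    return tokens[:goal]


# Hypothetical toy models: the drafter agrees with the target most of the time.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] * 3 + 1) % 17


def toy_draft(ctx: List[Token]) -> Token:
    guess = toy_target(ctx)
    return guess if ctx[-1] % 5 else (guess + 1) % 17  # occasionally wrong
```

The speedup comes from step 2: whenever the drafter guesses well, the expensive model confirms several tokens for the price of one pass, and a wrong guess costs nothing in quality because the main model's own token replaces it.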

Anthropic grabs SpaceX’s entire GPU cluster

Anthropic is taking over SpaceX’s Colossus-1 data centre, gaining access to 220,000 NVIDIA GPUs. That’s more than 300 megawatts of compute coming online within a month. Meanwhile, DeepSeek is reportedly raising money at a $45B valuation. The compute arms race is getting expensive, which makes those efficient small models look even more appealing.

Legal hallucinations get expensive

A Latham & Watkins filing in a case against Anthropic contained citations hallucinated by Claude. The irony is thick: a firm that represents Anthropic filed court documents with fabricated citations from its own client’s AI. This will likely accelerate professional liability discussions around AI tool usage.
