AI digest: Infrastructure gets serious

This week’s standout theme is infrastructure maturing fast. We’re seeing real solutions to the practical problems holding AI back.

Together AI cracks the memory problem with 2-bit quantisation

Together AI’s OSCAR uses attention-aware compression to squeeze KV caches down to 2.28 bits per element whilst keeping accuracy decent. This matters because memory bandwidth is the real bottleneck for long-context models, not compute. Smart move making it open source.
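To put the 2.28-bit figure in perspective, here is a rough back-of-envelope sketch of KV cache size at 16 bits versus 2.28 bits per element. The model shape (32 layers, 8 KV heads, head dimension 128, 128k-token context) is an assumption picked for illustration, not something from the OSCAR release, and the sum counts only the quoted per-element bits.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bits_per_element):
    """Approximate KV cache size in bytes for a single sequence."""
    # Factor of 2: one key tensor and one value tensor per layer.
    elements = 2 * layers * kv_heads * head_dim * seq_len
    return elements * bits_per_element / 8

GIB = 2**30

# Hypothetical model shape, chosen only to make the arithmetic concrete.
cfg = dict(layers=32, kv_heads=8, head_dim=128, seq_len=128 * 1024)

fp16 = kv_cache_bytes(**cfg, bits_per_element=16)
low = kv_cache_bytes(**cfg, bits_per_element=2.28)

print(f"16-bit cache:   {fp16 / GIB:.2f} GiB")
print(f"2.28-bit cache: {low / GIB:.2f} GiB")
print(f"reduction:      {fp16 / low:.1f}x")
```

Under these assumptions the cache drops from 16 GiB to roughly 2.3 GiB per sequence, about a 7x cut in the bytes that have to move through memory for every generated token.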

WorkOS tackles the agent authentication mess

The new auth.md protocol lets AI agents register with web apps properly, instead of the current bodge of humans filling in forms on their behalf. It’s just a Markdown file that apps publish to tell agents which OAuth flows they support. Simple but essential as agents move into real workflows.

Microsoft’s web agent ditches clicking for code

Webwright generates reusable Playwright scripts instead of clicking around like a human. It hit 60.1% on the Odysseys benchmark, up from 33.5% with basic GPT-5.4. The terminal-native approach feels right, even if 1,000 lines of code sounds optimistic for real websites.
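For a feel of what "a reusable script instead of clicks" means, here is a minimal sketch using Playwright's Python sync API. It is not Webwright's output: the function name, the example.com URL and the CSS selectors are all hypothetical, standing in for whatever a real site would need.

```python
from typing import Any, List

def search_and_collect(page: Any, base_url: str, query: str) -> List[str]:
    """Run a site search and return the result titles.

    `page` is expected to be a Playwright Page. The URL path and the
    selectors below are hypothetical placeholders, not a real site's markup.
    """
    page.goto(f"{base_url}/search")
    page.fill("input[name='q']", query)        # type the query into the search box
    page.press("input[name='q']", "Enter")     # submit the form
    page.wait_for_selector(".result-title")    # wait for results to render
    return page.locator(".result-title").all_inner_texts()

def main() -> None:
    # Imported here so the helper above can be reused or tested without a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        try:
            print(search_and_collect(page, "https://example.com", "kv cache"))
        finally:
            browser.close()
```

Calling `main()` runs the task once in a headless browser. The point of the approach is that the same function can be rerun, parameterised and version-controlled, which a one-off sequence of clicks cannot.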

The intelligence argument heats up

Hassabis reckons we’re “in the foothills of the singularity” whilst LeCun says current AI isn’t actually intelligent. Meanwhile, George Hotz warns coding agents will be “one of the most costly mistakes” because they create subtle bugs. The industry’s still working out what these systems can actually do reliably.
