AI digest: Memory breakthroughs and agent advances
Google's TurboQuant compresses AI memory by 6x whilst new web agents navigate with just screenshots.
Big week for making AI models faster and smarter with less compute.
Google’s TurboQuant cuts AI memory usage by 6x
Google’s TurboQuant compression algorithm shrinks the key-value cache that bottlenecks long-context inference, delivering up to 8x speedup with zero accuracy loss. The internet’s calling it “Pied Piper” for obvious reasons, but this actually solves a real problem. Memory bandwidth between HBM and SRAM is becoming the limiting factor for scaling models, so compression that works without hurting performance is genuinely useful.
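The core idea of KV-cache compression can be sketched with a toy per-channel 4-bit quantizer. To be clear, this is not Google's TurboQuant algorithm, whose details differ; it's a minimal illustration of why quantizing the cache shrinks the memory traffic that bottlenecks long-context inference:

```python
import numpy as np

# Toy KV-cache quantization sketch -- NOT TurboQuant itself, just the
# general idea: store the cache in 4-bit ints instead of fp16, trading a
# small reconstruction error for ~4x less memory traffic.

rng = np.random.default_rng(0)

# A fake key cache: (num_heads, seq_len, head_dim) in float16.
kv = rng.standard_normal((8, 1024, 64)).astype(np.float16)

def quantize_int4(x):
    """Per-channel absmax quantization to 4-bit ints, packed two per byte."""
    scale = np.abs(x).max(axis=1, keepdims=True) / 7.0  # int4 range: -7..7
    q = np.clip(np.round(x / scale), -7, 7).astype(np.int8)
    # Pack pairs of 4-bit values along the sequence axis into single bytes.
    lo = (q[:, 0::2] & 0x0F).astype(np.uint8)
    hi = (q[:, 1::2] & 0x0F).astype(np.uint8)
    return lo | (hi << 4), scale

def dequantize_int4(packed, scale):
    lo = (packed & 0x0F).astype(np.int8)
    hi = ((packed >> 4) & 0x0F).astype(np.int8)
    # Sign-extend the 4-bit two's-complement values.
    lo = np.where(lo > 7, lo - 16, lo)
    hi = np.where(hi > 7, hi - 16, hi)
    q = np.empty((packed.shape[0], packed.shape[1] * 2, packed.shape[2]),
                 dtype=np.int8)
    q[:, 0::2], q[:, 1::2] = lo, hi
    return q.astype(np.float32) * scale

packed, scale = quantize_int4(kv.astype(np.float32))
restored = dequantize_int4(packed, scale)

ratio = kv.nbytes / (packed.nbytes + scale.nbytes)
err = np.abs(restored - kv.astype(np.float32)).mean()
print(f"compression vs fp16: {ratio:.1f}x, mean abs error: {err:.3f}")
```

This naive scheme only reaches ~4x versus fp16; production methods layer on finer tricks (grouping, outlier handling, mixed precision) to hit higher ratios with less accuracy loss.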
AI2 releases fully open web agent that uses only screenshots
MolmoWeb from AI2 navigates websites from screenshots alone, with no HTML or DOM parsing. Despite weighing in at just 4B and 8B parameters, it beats larger proprietary systems on benchmarks. This is clever because it mirrors how humans actually see websites, and being fully open means we can inspect how it works. Much more interesting than another closed API.
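The architecture boils down to a screenshot-in, action-out loop. Here is a hypothetical sketch of that loop; MolmoWeb's real interface may look quite different, and `vlm_policy` and `FakeBrowser` below are stand-ins invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""

def vlm_policy(screenshot_png: bytes, goal: str) -> Action:
    # Stand-in for the vision-language model: a real agent would run
    # the model on raw pixels here and decode an action from its output.
    return Action(kind="done")

def run_agent(browser, goal: str, max_steps: int = 20) -> bool:
    """Generic screenshot-only agent loop: pixels in, actions out,
    no HTML or DOM inspection anywhere."""
    for _ in range(max_steps):
        shot = browser.screenshot()      # raw pixels are the only input
        action = vlm_policy(shot, goal)
        if action.kind == "done":
            return True
        elif action.kind == "click":
            browser.click(action.x, action.y)
        elif action.kind == "type":
            browser.type(action.text)
    return False

class FakeBrowser:
    """Minimal stub so the loop runs without a real browser."""
    def screenshot(self) -> bytes: return b""
    def click(self, x: int, y: int) -> None: pass
    def type(self, text: str) -> None: pass

done = run_agent(FakeBrowser(), "find the pricing page")
```

The appeal of this design is that the browser side needs only three generic primitives (screenshot, click, type), so the same agent works on any site regardless of how its markup is structured.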
NVIDIA’s PivotRL trains agents 4x faster
NVIDIA’s PivotRL framework achieves high accuracy with 4x fewer rollout turns for training long-horizon agents. This tackles the efficiency problem with reinforcement learning for complex tasks like software engineering and web browsing. Smart approach that bridges the gap between cheap supervised fine-tuning and expensive end-to-end RL.
OpenAI reportedly finishes training next major model
Sam Altman is apparently telling people internally about a “very strong” model codenamed “Spud” that can “really accelerate the economy”. Standard Altman hyperbole or genuine breakthrough? We’ll find out soon enough, but the timing suggests they’re gearing up for something big after the Sora setback.