AI digest: Memory breakthroughs and agent advances
Google's TurboQuant compresses AI memory by 6x whilst new web agents navigate with just screenshots.
Big week for making AI models faster and smarter with less compute.
Google’s TurboQuant cuts AI memory usage by 6x
Google’s TurboQuant compression algorithm shrinks the key-value cache that bottlenecks long-context inference, delivering up to 8x speedup with zero accuracy loss. The internet’s calling it “Pied Piper” for obvious reasons, but this actually solves a real problem. Memory bandwidth between HBM and SRAM is becoming the limiting factor for scaling models, so compression that works without hurting performance is genuinely useful.
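The core idea of KV-cache compression can be sketched with a toy per-channel 4-bit quantizer. To be clear, this is not Google's TurboQuant algorithm, whose details differ; it's a minimal illustration of why quantizing the cache shrinks the memory traffic that bottlenecks long-context inference:

```python
import numpy as np

# Toy KV-cache quantization sketch -- NOT TurboQuant itself, just the
# general idea: store the cache in 4-bit ints instead of fp16, trading a
# small reconstruction error for ~4x less memory traffic.

rng = np.random.default_rng(0)

# A fake key cache: (num_heads, seq_len, head_dim) in float16.
kv = rng.standard_normal((8, 1024, 64)).astype(np.float16)

def quantize_int4(x):
    """Per-channel absmax quantization to 4-bit ints, packed two per byte."""
    scale = np.abs(x).max(axis=1, keepdims=True) / 7.0  # int4 range: -7..7
    q = np.clip(np.round(x / scale), -7, 7).astype(np.int8)
    # Pack pairs of 4-bit values along the sequence axis into single bytes.
    lo = (q[:, 0::2] & 0x0F).astype(np.uint8)
    hi = (q[:, 1::2] & 0x0F).astype(np.uint8)
    return lo | (hi << 4), scale

def dequantize_int4(packed, scale):
    lo = (packed & 0x0F).astype(np.int8)
    hi = ((packed >> 4) & 0x0F).astype(np.int8)
    # Sign-extend the 4-bit two's-complement values.
    lo = np.where(lo > 7, lo - 16, lo)
    hi = np.where(hi > 7, hi - 16, hi)
    q = np.empty((packed.shape[0], packed.shape[1] * 2, packed.shape[2]),
                 dtype=np.int8)
    q[:, 0::2], q[:, 1::2] = lo, hi
    return q.astype(np.float32) * scale

packed, scale = quantize_int4(kv.astype(np.float32))
restored = dequantize_int4(packed, scale)

ratio = kv.nbytes / (packed.nbytes + scale.nbytes)
err = np.abs(restored - kv.astype(np.float32)).mean()
print(f"compression vs fp16: {ratio:.1f}x, mean abs error: {err:.3f}")
```

This naive scheme only reaches ~4x versus fp16; production methods layer on finer tricks (grouping, outlier handling, mixed precision) to hit higher ratios with less accuracy loss.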
AI2 releases fully open web agent that uses only screenshots
MolmoWeb from AI2 navigates websites from screenshots alone, with no HTML or DOM parsing. Despite weighing in at just 4B and 8B parameters, it beats larger proprietary systems on benchmarks. This is clever because it mirrors how humans actually see websites, and being fully open means we can inspect how it works. Much more interesting than another closed API.
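The architecture boils down to a screenshot-in, action-out loop. Here is a hypothetical sketch of that loop; MolmoWeb's real interface may look quite different, and `vlm_policy` and `FakeBrowser` below are stand-ins invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""

def vlm_policy(screenshot_png: bytes, goal: str) -> Action:
    # Stand-in for the vision-language model: a real agent would run
    # the model on raw pixels here and decode an action from its output.
    return Action(kind="done")

def run_agent(browser, goal: str, max_steps: int = 20) -> bool:
    """Generic screenshot-only agent loop: pixels in, actions out,
    no HTML or DOM inspection anywhere."""
    for _ in range(max_steps):
        shot = browser.screenshot()      # raw pixels are the only input
        action = vlm_policy(shot, goal)
        if action.kind == "done":
            return True
        elif action.kind == "click":
            browser.click(action.x, action.y)
        elif action.kind == "type":
            browser.type(action.text)
    return False

class FakeBrowser:
    """Minimal stub so the loop runs without a real browser."""
    def screenshot(self) -> bytes: return b""
    def click(self, x: int, y: int) -> None: pass
    def type(self, text: str) -> None: pass

done = run_agent(FakeBrowser(), "find the pricing page")
```

The appeal of this design is that the browser side needs only three generic primitives (screenshot, click, type), so the same agent works on any site regardless of how its markup is structured.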
NVIDIA’s PivotRL trains agents 4x faster
NVIDIA’s PivotRL framework achieves high accuracy with 4x fewer rollout turns for training long-horizon agents. This tackles the efficiency problem with reinforcement learning for complex tasks like software engineering and web browsing. Smart approach that bridges the gap between cheap supervised fine-tuning and expensive end-to-end RL.
OpenAI reportedly finishes training next major model
Sam Altman is apparently telling people internally about a “very strong” model codenamed “Spud” that can “really accelerate the economy”. Standard Altman hyperbole or genuine breakthrough? We’ll find out soon enough, but the timing suggests they’re gearing up for something big after the Sora setback.