AI digest: efficiency breakthroughs and infrastructure reality checks
Major advances in model efficiency clash with infrastructure growing pains as the AI boom hits practical limits.
This week brought impressive technical advances alongside some sobering reality checks about AI’s practical deployment.
TinyLoRA proves less can be more
Researchers from Meta, Cornell, and CMU have shown that LLMs can learn to reason with just 13 trainable parameters, hitting 91.8% on GSM8K with Qwen2.5-7B. This is genuinely impressive because it suggests we’re massively over-parameterising fine-tuning. The implications for edge deployment and training costs could be substantial.
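The paper's exact 13-parameter recipe goes beyond what's summarised here, but the core idea behind LoRA-style fine-tuning — freeze the pretrained weights and train only a tiny low-rank update — can be sketched in a few lines of NumPy. All shapes, the rank-1 setting, and the function names below are illustrative, not the paper's actual configuration:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """Apply a frozen weight W plus a trainable low-rank update B @ A.

    W: (d_out, d_in) frozen pretrained weight (never updated).
    A: (r, d_in) and B: (d_out, r) are the ONLY trainable matrices.
    """
    return x @ (W + alpha * (B @ A)).T

d_in, d_out, r = 16, 16, 1            # toy sizes; rank-1 adapter
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))     # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01  # small random init
B = np.zeros((d_out, r))               # standard LoRA init: update starts at zero

x = rng.normal(size=(4, d_in))
# With B = 0 the adapter is a no-op, so outputs match the frozen model exactly.
assert np.allclose(lora_forward(x, W, A, B), x @ W.T)

trainable = A.size + B.size
print(f"trainable parameters: {trainable} of {W.size + trainable} total")
# → trainable parameters: 32 of 288 total
```

Even this toy rank-1 adapter trains roughly a tenth of the layer's parameters; the research result pushes the same freeze-and-adapt idea to an extreme.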
Paged attention tackles the memory wall
Paged Attention is solving LLM inference’s biggest bottleneck by treating GPU memory like virtual memory instead of reserving huge fixed blocks per request. This matters because memory, not compute, is what actually limits how many requests you can handle simultaneously. Smart systems thinking applied to the KV cache problem.
OpenAI’s Sora experiment dies quietly
OpenAI is shutting down its Sora social feed app despite the underlying video generation tech being “scarily impressive”. Turns out nobody wanted an AI-only social feed, which isn’t surprising. Good tech doesn’t automatically mean good product, and this felt like a solution looking for a problem.
Anthropic gives Claude more autonomy
Claude Code’s new auto mode lets the AI execute tasks with fewer human approvals whilst keeping safety guardrails. This reflects the ongoing tension between making AI agents genuinely useful and keeping them controllable. The “leash” metaphor in the headline feels about right.
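One way to picture that balance — purely illustrative, and in no way Anthropic's actual policy or API — is an approval gate that lets routine commands run unprompted while anything matching a guardrail pattern still waits for a human:

```python
import re

# Commands matching any of these patterns always require a human in the loop,
# even in auto mode. The patterns are made-up examples, not Anthropic's rules.
REQUIRE_APPROVAL = [r"\brm\b", r"\bgit\s+push\b", r"\bsudo\b"]

def needs_human_approval(command, auto_mode=True):
    """Return True if a human must confirm this command before it runs."""
    if not auto_mode:
        return True   # manual mode: every action is confirmed by the user
    return any(re.search(p, command) for p in REQUIRE_APPROVAL)

assert not needs_human_approval("pytest -q")          # routine: runs unprompted
assert needs_human_approval("rm -rf build/")          # destructive: stays gated
assert needs_human_approval("git push origin main")   # irreversible: stays gated
```

The design point is that "auto" shortens the leash without dropping it: the default flips from deny to allow, but a deny-list of consequential actions keeps the human in the loop where it matters.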