AI digest: efficiency breakthroughs and infrastructure reality checks
Major advances in model efficiency clash with infrastructure growing pains as the AI boom hits practical limits.
This week brought impressive technical advances alongside some sobering reality checks about AI’s practical deployment.
TinyLoRA proves less can be more
Researchers from Meta, Cornell, and CMU have shown that LLMs can learn to reason with just 13 trainable parameters, hitting 91.8% on GSM8K with Qwen2.5-7B. This is genuinely impressive because it suggests we’re massively over-parameterising fine-tuning. The implications for edge deployment and training costs could be substantial.
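The paper's exact 13-parameter recipe goes beyond what's summarised here, but the core idea behind LoRA-style fine-tuning — freeze the pretrained weights and train only a tiny low-rank update — can be sketched in a few lines of NumPy. All shapes, the rank-1 setting, and the function names below are illustrative, not the paper's actual configuration:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """Apply a frozen weight W plus a trainable low-rank update B @ A.

    W: (d_out, d_in) frozen pretrained weight (never updated).
    A: (r, d_in) and B: (d_out, r) are the ONLY trainable matrices.
    """
    return x @ (W + alpha * (B @ A)).T

d_in, d_out, r = 16, 16, 1            # toy sizes; rank-1 adapter
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))     # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01  # small random init
B = np.zeros((d_out, r))               # standard LoRA init: update starts at zero

x = rng.normal(size=(4, d_in))
# With B = 0 the adapter is a no-op, so outputs match the frozen model exactly.
assert np.allclose(lora_forward(x, W, A, B), x @ W.T)

trainable = A.size + B.size
print(f"trainable parameters: {trainable} of {W.size + trainable} total")
# → trainable parameters: 32 of 288 total
```

Even this toy rank-1 adapter trains roughly a tenth of the layer's parameters; the research result pushes the same freeze-and-adapt idea to an extreme.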
Paged attention tackles the memory wall
Paged Attention is solving LLM inference’s biggest bottleneck by treating GPU memory like virtual memory instead of reserving huge fixed blocks per request. This matters because memory, not compute, is what actually limits how many requests you can handle simultaneously. Smart systems thinking applied to the KV cache problem.
OpenAI’s Sora experiment dies quietly
OpenAI is shutting down its Sora social feed app despite the underlying video generation tech being “scarily impressive”. Turns out nobody wanted an AI-only social feed, which isn’t surprising. Good tech doesn’t automatically mean good product, and this felt like a solution looking for a problem.
Anthropic gives Claude more autonomy
Claude Code’s new auto mode lets the AI execute tasks with fewer human approvals whilst keeping safety guardrails. This reflects the ongoing tension between making AI agents genuinely useful and keeping them controllable. The “leash” metaphor in the headline feels about right.
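One way to picture that balance — purely illustrative, and in no way Anthropic's actual policy or API — is an approval gate that lets routine commands run unprompted while anything matching a guardrail pattern still waits for a human:

```python
import re

# Commands matching any of these patterns always require a human in the loop,
# even in auto mode. The patterns are made-up examples, not Anthropic's rules.
REQUIRE_APPROVAL = [r"\brm\b", r"\bgit\s+push\b", r"\bsudo\b"]

def needs_human_approval(command, auto_mode=True):
    """Return True if a human must confirm this command before it runs."""
    if not auto_mode:
        return True   # manual mode: every action is confirmed by the user
    return any(re.search(p, command) for p in REQUIRE_APPROVAL)

assert not needs_human_approval("pytest -q")          # routine: runs unprompted
assert needs_human_approval("rm -rf build/")          # destructive: stays gated
assert needs_human_approval("git push origin main")   # irreversible: stays gated
```

The design point is that "auto" shortens the leash without dropping it: the default flips from deny to allow, but a deny-list of consequential actions keeps the human in the loop where it matters.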