AI digest: Agents go autonomous
Karpathy releases autonomous ML research tools whilst AI models tackle real production challenges.
Autonomous AI agents are moving from hype to hands-on reality. This week brought practical tools and sobering reality checks.
Karpathy drops autoresearch for solo GPU experiments
Andrej Karpathy open-sourced autoresearch, a 630-line Python tool that lets AI agents run their own machine learning experiments on single GPUs. It’s a stripped-down version of his nanochat training core, designed for autonomous iteration. This feels like the kind of tool that’ll quietly spawn dozens of interesting projects over the next few months.
Claude finds 100+ Firefox bugs humans missed
Anthropic’s Claude AI uncovered over 100 security vulnerabilities in Firefox, including issues that decades of human testing hadn’t caught. Mozilla’s partnership with Anthropic shows AI security auditing moving beyond proof-of-concept into proper production use. Catching flaws that human experts missed suggests we’re hitting a genuine capability threshold.
AI agent benchmarks miss 92% of actual work
Researchers found that AI agent benchmarks obsess over coding whilst ignoring 92% of the US labour market. Most agent development focuses on programming tasks, leaving vast swathes of real-world work unmeasured. This explains why agent demos look impressive but feel disconnected from what most people actually do for work.
Feature bloat breaks models in production
A new study shows that excessive features make regression models fragile in production, even when they improve accuracy. Every additional feature is another dependency that can fail. Worth remembering as we pile more data into AI systems and wonder why they break in unexpected ways.
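The intuition is easy to sketch. A toy model (not from the study): treat each feature as an independent upstream pipeline that is healthy some fraction of the time; a prediction can only be served if every feature arrives, so availability shrinks multiplicatively as features pile up. The numbers below are illustrative assumptions, not figures from the paper.

```python
def availability(n_features: int, p_feature_up: float) -> float:
    """Probability that all n independent feature pipelines are
    healthy at inference time (illustrative assumption: each pipeline
    is up with probability p_feature_up, independently)."""
    return p_feature_up ** n_features

# With each feature pipeline healthy 99% of the time:
small = availability(5, 0.99)     # ~0.95 — lean model, rarely blocked
bloated = availability(50, 0.99)  # ~0.61 — bloated model fails ~4x as often
```

Even if the 50-feature model scores slightly better offline, in this toy setup it fails to serve a prediction nearly 40% of the time, which is the fragility-versus-accuracy trade the study points at.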