AI digest: Models stumble while agents advance
Big tech models hit roadblocks whilst AI agents push into professional research and real-world applications.
The big players are struggling to keep up whilst specialised AI agents are quietly solving harder problems.
DeepMind’s Aletheia bridges competition maths to real research
Google DeepMind launched Aletheia, an AI agent designed to move beyond competition-level maths into proper research. While models have reached gold-medal standard at olympiads, research requires navigating vast literature and constructing long-horizon proofs. This feels like the first serious attempt to automate the boring bits of academic research whilst keeping the creative parts human.
Musk admits xAI “wasn’t built right” and starts over
Elon Musk’s xAI is undergoing a complete restructuring after he admitted the company “was not built right first time around”. The AI lab is revamping its coding tool efforts and has brought in two executives from Cursor. Given Grok’s track record, this might be the most honest thing Musk has said about an AI project.
Meta delays Avocado model after falling behind rivals
Meta is postponing its next AI model, “Avocado”, because internal tests show it can’t compete with Google’s and OpenAI’s latest offerings. This suggests the AI race isn’t just about who ships first anymore. Quality gaps are becoming harder to ignore, even for a company with Meta’s resources.
AI chatbot safety concerns escalate to mass-casualty cases
A lawyer handling AI psychosis cases is warning that chatbots already linked to suicides are now appearing in mass-casualty incidents too. The technology is moving faster than safety measures can keep pace with. Once the legal system is getting involved at this scale, these can no longer be dismissed as edge cases.