AI digest: Military contracts and model decay
OpenAI signs Pentagon deals while Anthropic fights back, plus new research shows even frontier models decay in long conversations.
Big week for AI politics and some sobering technical findings.
OpenAI embraces military contracts as Anthropic fights Pentagon ban
OpenAI signed a deal with the Pentagon for classified AI networks just hours after Anthropic was banned from federal agencies. Anthropic got labelled a “supply chain risk” after refusing to build autonomous weapons and surveillance tools, and they’re taking it to court. The timing feels deliberate, and it’s fascinating to watch these companies take such different stances on military applications.
Even GPT-5 gets worse the longer you chat
New research shows frontier models lose up to 33% accuracy in extended conversations, including GPT-5.2 and Claude 4.6. This isn't just about context windows; the models genuinely degrade as chats go on. Anyone who's had a long coding session with Claude will recognise this, but seeing it quantified across the latest models is sobering.
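The paper's exact methodology isn't described here, but the general shape of this kind of measurement is simple: ask questions one per turn, carry the full history forward, and track correctness by turn index. A minimal sketch, with a stubbed model standing in for a real chat API (the stub and its cutoff are purely illustrative, not the study's setup):

```python
# Hedged sketch: measuring per-turn accuracy in a long conversation.
# answer_fn is a stand-in for a chat-model call that receives the
# growing message history; here it is a toy stub.

def run_long_conversation_eval(questions, answer_fn):
    """Ask questions one per turn, carrying the full history forward,
    and record whether each turn's answer was correct."""
    history = []
    per_turn_correct = []
    for question, expected in questions:
        history.append({"role": "user", "content": question})
        answer = answer_fn(history)  # real setup: send history to a model
        history.append({"role": "assistant", "content": answer})
        per_turn_correct.append(answer == expected)
    return per_turn_correct

# Toy stub: answers correctly until the history grows past a cutoff,
# mimicking the degradation pattern the research describes.
def stub_model(history):
    question = history[-1]["content"]
    return question.upper() if len(history) < 6 else "???"

questions = [(f"q{i}", f"Q{i}") for i in range(5)]
print(run_long_conversation_eval(questions, stub_model))
```

Plotting that list of booleans against turn index is all it takes to see whether accuracy falls off as the conversation lengthens.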
DeepCoder claims o3-mini performance at 14B parameters
A new open-source coding model called DeepCoder supposedly matches o3-mini's performance with just 14 billion parameters. If true, this could be huge for running capable coding assistants locally. The claims need proper verification, but the trend toward smaller, more efficient models continues.
Banks test agentic AI for trade surveillance
Goldman Sachs and Deutsche Bank are piloting AI agents that reason through trading patterns in real time rather than just following preset rules. This feels like where agentic AI might actually prove its worth, handling complex pattern recognition in regulated environments where the stakes matter.