AI digest: Big tech fights get messy

The gloves are coming off in AI land. Data theft accusations are flying, benchmarks are under fire, and even voice assistants can’t agree on basic facts.

Anthropic accuses Chinese labs of systematic data theft

Anthropic claims DeepSeek, Moonshot, and MiniMax used 16 million queries to systematically extract Claude’s capabilities for their own models. This is the kind of accusation that makes lawyers rich and engineers nervous. If true, it shows how easy it is to distill knowledge from API calls at scale.

OpenAI wants to kill the coding benchmark everyone uses

OpenAI is pushing to retire SWE-bench Verified, claiming the popular coding benchmark is broken and models have seen the answers during training. Translation: we’ve all been measuring memorization, not actual coding ability. This feels like sour grapes, but they’re probably right about benchmark contamination being a real problem.

Voice bots happily spread false information

Researchers found that ChatGPT Voice and Gemini Live repeated false claims up to 50% of the time when asked. Meanwhile, Amazon’s Alexa refused to spread any false information. Turns out being stubborn and unhelpful might actually be a safety feature.

DeepCoder claims o3-mini performance at 14B parameters

A new open-source coding model called DeepCoder allegedly matches o3-mini performance with just 14 billion parameters. If the benchmarks hold up (see story above), this could be a real breakthrough in efficient coding models. Big if, though.
