AI digest: Big tech fights get messy
Chinese labs accused of data theft, OpenAI declares war on benchmarks, and voice bots spread lies with confidence.
The gloves are coming off in AI land. Data theft accusations are flying, benchmarks are under fire, and even voice assistants can’t agree on basic facts.
Anthropic accuses Chinese labs of systematic data theft
Anthropic claims DeepSeek, Moonshot, and MiniMax used 16 million queries to systematically extract Claude’s capabilities for their own models. This is the kind of accusation that makes lawyers rich and engineers nervous. If true, it shows how easy it is to distill a model’s capabilities from API responses at scale.
OpenAI wants to kill the coding benchmark everyone uses
OpenAI is pushing to retire SWE-bench Verified, claiming the popular coding benchmark is broken and models have seen the answers during training. Translation: we’ve all been measuring memorization, not actual coding ability. This feels like sour grapes, but they’re probably right about benchmark contamination being a real problem.
Voice bots happily spread false information
Researchers found that ChatGPT Voice and Gemini Live repeated false claims up to 50% of the time when prompted with them. Amazon’s Alexa, by contrast, refused to repeat any of the false information. Turns out being stubborn and unhelpful might actually be a safety feature.
DeepCoder claims o3-mini performance at 14B parameters
A new open-source coding model called DeepCoder reportedly matches o3-mini on coding benchmarks with just 14 billion parameters. If the benchmarks hold up (see story above), this could be a real breakthrough in efficient coding models. Big if, though.