AI digest: agents getting smarter, models getting smaller

Perplexity's self-improving agent memory, a tiny reasoning model punching above its weight, Anthropic's government ban saga, and Cisco's automated prompt optimiser.

A busy week for agent infrastructure and model efficiency. Here are the stories worth paying attention to.

Perplexity Brain teaches agents to learn from their own mistakes

Perplexity has launched Brain, a memory system for its Computer agent that tracks what worked, what failed, and what corrections were made. It builds a context graph of the agent’s past work and reviews it overnight. This is the right direction for agent development: not just doing tasks, but getting better at them without human babysitting.

VibeThinker-3B matches frontier models at a fraction of the size

VibeThinker-3B is a 3-billion-parameter reasoning model built on Qwen2.5-Coder that reportedly matches DeepSeek V3.2 and Kimi K2.5 on verifiable benchmarks. It is MIT-licensed and built with the Spectrum-to-Signal post-training pipeline. A 3B model competing with much larger ones is genuinely impressive, and the open licence makes it worth experimenting with.

Cisco open-sources FAPO, an automated prompt optimiser for multi-step pipelines

FAPO runs on Claude Code and automatically improves prompts across a full LLM pipeline, attributing failures at the step level rather than blaming the whole chain. It beat GEPA on 15 of 18 model-benchmark comparisons in Cisco’s own tests. Prompt optimisation tooling has been messy and manual until now, so a structured, automated approach here is useful.
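
To make the idea of step-level attribution concrete, here is a minimal Python sketch. It is not FAPO's code or API: the Step class, the per-step check functions and the two toy steps are all hypothetical, invented purely for illustration. The point is only that each failed run is charged to the first step whose output fails its own check, rather than to the pipeline as a whole.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One pipeline stage (hypothetical structure, not FAPO's)."""
    name: str
    run: Callable[[str], str]      # transforms the previous step's output
    check: Callable[[str], bool]   # returns True if this step's output looks valid


def first_failing_step(steps: List[Step], pipeline_input: str) -> Optional[str]:
    """Run the steps in order and return the name of the first one whose check fails."""
    value = pipeline_input
    for step in steps:
        value = step.run(value)
        if not step.check(value):
            return step.name
    return None  # every step passed its check


def tally_failures(steps: List[Step], inputs: List[str]) -> Dict[str, int]:
    """Count, per step, how many inputs first failed at that step."""
    counts = {step.name: 0 for step in steps}
    for item in inputs:
        failed = first_failing_step(steps, item)
        if failed is not None:
            counts[failed] += 1
    return counts


# Toy stand-ins for LLM calls: plain string functions with simple checks.
steps = [
    Step("extract", lambda text: text.strip(), lambda out: len(out) > 0),
    Step("summarise", lambda text: text[:20], lambda out: out.endswith(".")),
]

failure_counts = tally_failures(
    steps,
    [
        "  Short one.  ",
        "   ",
        "A sentence that is much longer than twenty characters.",
    ],
)
print(failure_counts)
```

With a tally like this, an optimiser would know which step's prompt to revise first instead of rewriting every prompt in the chain. How FAPO actually attributes failures and rewrites prompts is not described here, so treat this only as an illustration of the general idea.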

The US government banned Anthropic’s Fable 5, and it might have backfired

The US government forced Anthropic to pull Fable 5 and Mythos 5 over national security concerns after Amazon researchers found a guardrail bypass. Cybersecurity researchers signed an open letter calling the ban counterproductive, and Anthropic pointed out the same jailbreaks exist in other models. The ban looks more like precedent-setting than genuine risk management, and the Streisand effect appears to be doing Anthropic’s marketing for it.
