AI digest: agents get serious upgrades

This week brought some proper advances in AI agents and models. The tooling is getting better and the benchmarks are finally shifting.

Hermes Agent fixes the MCP context mess

Nous Research shipped Tool Search for their Hermes Agent, tackling the context bloat problem in Model Context Protocol (MCP) implementations. Using BM25 progressive schema disclosure, which ranks tools against the request and loads only the matching schemas into context, they report 49% to 74% accuracy gains on Claude Opus 4. This matters because MCP has been promising but clunky in practice, and context management is still a real bottleneck for agent workflows.
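To make the idea concrete, here is a minimal sketch of BM25-based tool search with progressive schema disclosure. It is not Hermes Agent's code: the tool catalogue, the `search_tools` helper and the `k1`/`b` values are all assumptions chosen for illustration. The point is only that the agent sees the schemas of the top-ranked tools rather than every schema at once.

```python
import math
import re
from collections import Counter

# Hypothetical tool catalogue: names, descriptions and schemas are invented.
TOOLS = {
    "github_create_issue": {
        "description": "Create a new issue in a GitHub repository",
        "schema": {"repo": "string", "title": "string", "body": "string"},
    },
    "slack_send_message": {
        "description": "Send a message to a Slack channel",
        "schema": {"channel": "string", "text": "string"},
    },
    "calendar_create_event": {
        "description": "Create a calendar event with attendees",
        "schema": {"title": "string", "start": "string", "attendees": "array"},
    },
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Rank documents (name -> text) against a query with Okapi BM25."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for t in tokens.values() for term in set(t))
    scores = {}
    for name, toks in tokens.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)


def search_tools(query, top_k=1):
    """Return full schemas only for the top_k tools matching the query."""
    docs = {
        name: f"{name.replace('_', ' ')} {tool['description']}"
        for name, tool in TOOLS.items()
    }
    ranked = bm25_rank(query, docs)[:top_k]
    return {name: TOOLS[name]["schema"] for name in ranked}


print(search_tools("send a slack message"))
```

With a real catalogue of hundreds of tools, the saving comes from keeping the unselected schemas out of the prompt until a search surfaces them.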

Claude Opus 4.8 overtakes GPT-5.5

Anthropic released Claude Opus 4.8, which they’re calling a “modest but tangible improvement” that beats GPT-5.5 and Gemini 3.1 Pro across most benchmarks. The model catches its own coding errors four times more often than its predecessor. Alongside this, Anthropic pulled in a massive $65 billion Series H at a $965 billion valuation, which suggests the market thinks these “modest” improvements are worth quite a lot.

Hexo Labs open-sources self-improving agents

Hexo Labs released SIA under the MIT licence. It combines scaffold updates with LoRA weight modifications in a single self-improving agent loop, and it beat scaffold-only approaches on LawBench and other benchmarks. The interesting bit is that it updates both the execution environment and the model weights simultaneously, which feels like a more complete approach to agent improvement.
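For readers unfamiliar with LoRA, the weight side of that loop rests on a low-rank update: the base weight W stays frozen and a small adapter contributes W + (alpha / r) * B @ A. The sketch below shows only that standard formula in plain Python. It says nothing about how SIA trains or schedules its adapters, and the matrices and the `lora_merge` helper are made up for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_merge(w, a, b, alpha, r):
    """Return W + (alpha / r) * B @ A without modifying W."""
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example: a 2x2 base weight with a rank-1 adapter (values are arbitrary).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]  # shape (2, r)
A = [[0.5, 0.0]]    # shape (r, 2)

merged = lora_merge(W, A, B, alpha=2, r=1)
print(merged)
```

Because only A and B change, an agent loop can swap or retrain adapters cheaply while the scaffold around the model is edited separately.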

StepFun ships 198B MoE with vision

StepFun’s Step 3.7 Flash is a 198B-parameter mixture-of-experts (MoE) vision-language model with a 256k-token context window. The “Advisor Mode” sounds promising for coding workflows, though we’ll need to see how it performs against established players.
