#testing
5 posts tagged testing.
Thoughts
Agent benchmarks are just unit tests for unpredictable systems
We're measuring agent performance like it's deterministic software when the whole point is emergent behaviour.
Testing frameworks just became AI's weakest link
While everyone obsesses over model capabilities, we're shipping AI systems with testing practices from 2015.
Production deployment is the new frontier research
While researchers obsess over benchmarks, the real breakthroughs are happening in production environments where models meet reality.
Tools & Experiments
Google Auto-Diagnose
An LLM-powered system that automatically reads integration test failure logs and identifies the actual bugs amongst thousands of lines of output.
Auto-Diagnose
Google's LLM-powered system that automatically reads integration test failure logs and pinpoints the actual bugs.