AI digest: Voice gets real, agents get organised

Voice tech is having a moment. Two new models show the industry moving past robotic speech towards something that might actually sound human.

Inworld’s voice model listens to how you speak

Inworld AI launched Realtime TTS-2, a voice model that conditions on full audio context rather than just text transcripts. This is clever architecture. Most TTS systems read words but miss the music of conversation - the pauses, emphasis, and rhythm that make speech feel natural.

Mistral tackles the expressivity problem

Mistral’s Voxtral TTS combines autoregressive and flow-matching architectures to solve voice cloning’s biggest weakness - maintaining emotional consistency. The hybrid approach is promising, though we’ll need to hear it in action to judge whether it actually closes that expressivity gap.

OpenAI ships GPT-5.5 Instant with fewer hallucinations

GPT-5.5 Instant is now ChatGPT’s default model, promising 52.5% fewer hallucinations on sensitive topics like medicine and law. OpenAI also added “memory sources” so you can see what stored context influences responses. Smart move - transparency builds trust, especially when the model is making claims about your health.

Agent frameworks go modular

This Python tutorial shows how to build modular agent systems with dynamic tool routing. The OS-like approach to agent capabilities makes sense - reusable skills with proper schemas and central registries. We’re seeing patterns emerge for how to structure these systems properly.

Google adds webhooks to Gemini API

Google’s new webhook system eliminates polling for long-running jobs like batch processing and video generation. About time - event-driven architectures are table stakes for production AI workflows.

AI digest: Models get faster, companies get desperate 11 Jun AI digest: Speech models and code tooling hit production 10 Jun AI digest: agents get serious, speed breaks records 9 Jun

Inworld’s voice model listens to how you speak

Mistral tackles the expressivity problem

OpenAI ships GPT-5.5 Instant with fewer hallucinations

Agent frameworks go modular

Google adds webhooks to Gemini API

Related