Real-time voice models just made chatbots obsolete

The shift to live audio processing isn't an upgrade to existing chat interfaces - it's their complete replacement.

Google’s Gemini Flash Live and Tencent’s Covo-Audio aren’t iterative improvements on voice assistants. They’re the death knell for text-based AI interaction. When you can have a proper conversation with latency measured in milliseconds, typing prompts feels like sending telegrams.

Bandwidth wins every time

Text interfaces were always a compromise. We built chatbots because processing audio was too expensive and too slow. That constraint is gone: real-time voice models ingest continuous audio streams and respond within the sub-second gaps of natural conversational turn-taking. The cognitive overhead of translating thoughts into written prompts is now unnecessary friction.
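The latency difference comes down to architecture, not model quality. A toy sketch of the two flows (all function names, frame sizes, and timings here are illustrative, not any vendor's API): a request-response interface can't answer until the whole utterance has been captured, while a streaming interface sees each audio frame as it arrives and can respond after the first one.

```python
import time

CHUNK_SECONDS = 0.05  # pretend each audio frame is 50 ms (illustrative)

def reply(text):
    """Stand-in for a speech model; a real system would call a model here."""
    return f"ack:{text}"

def request_response(frames):
    """Classic chatbot flow: wait for the whole utterance, answer once."""
    start = time.monotonic()
    for _ in frames:
        time.sleep(CHUNK_SECONDS)          # user is still speaking
    answer = reply(" ".join(frames))       # model only runs at the end
    return time.monotonic() - start, answer

def streaming(frames):
    """Live flow: process each frame as it arrives, so a (partial)
    response is available long before the utterance finishes."""
    start = time.monotonic()
    first_latency, partial = None, None
    for frame in frames:
        time.sleep(CHUNK_SECONDS)          # frame arrives in real time
        partial = reply(frame)             # incremental processing
        if first_latency is None:
            first_latency = time.monotonic() - start
    return first_latency, partial

frames = ["what's", "the", "weather", "like", "today"]
full_latency, _ = request_response(frames)
first_latency, _ = streaming(frames)
assert first_latency < full_latency        # streaming responds sooner
```

The point of the sketch: streaming's time-to-first-response scales with frame size, while request-response scales with utterance length, so the gap widens the longer you talk.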

The interface is the intelligence

Voice changes everything about how we use AI. With text, you plan your prompt, edit it, submit it, wait. With voice, you think out loud. You interrupt yourself. You course-correct mid-sentence. The AI becomes a conversation partner, not a query processor. This isn’t about making chatbots talk - it’s about fundamentally different interaction patterns.
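The clearest example of those new patterns is barge-in: the user starts talking while the model is still speaking, and the model has to stop mid-response and yield the floor. A minimal sketch of that control flow (the token-by-token playback loop and its names are assumptions for illustration, not any product's implementation):

```python
def run_turn(model_tokens, interrupt_at=None):
    """Play the model's response token by token, but stop the moment
    the user barges in (simulated here by an interrupt index)."""
    spoken = []
    for i, token in enumerate(model_tokens):
        if interrupt_at is not None and i == interrupt_at:
            # Discard the rest of the response and hand the floor back.
            return spoken, "interrupted"
        spoken.append(token)               # token reaches the speaker
    return spoken, "finished"

# An uninterrupted turn plays out in full...
print(run_turn(["the", "weather", "is", "sunny"]))
# ...but a barge-in after two tokens cuts the response short.
print(run_turn(["the", "weather", "is", "sunny"], interrupt_at=2))
```

A text chatbot has no equivalent of this loop: its response is an atomic message, so there is nothing to interrupt.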

The typing tax is dead

Every company still building text-first AI interfaces is optimising for yesterday’s constraints. When users can speak naturally and get immediate audio responses, the extra step of reading and typing becomes a tax on human attention. We’re watching the same transition that killed command lines for consumer software, except faster.

Text had its run. The future sounds different.
