AI digest: infrastructure gets real

The honeymoon period is over. Companies are hitting AI cost walls and scrambling for solutions while mobile deployment finally gets the tools it needs.

Google pays SpaceX nearly a billion monthly for compute

Google is paying SpaceX $920 million per month for compute power. A Google rep blamed “unexpected demand” for recently launched AI products. Either that’s a massive miscalculation, or AI usage is growing faster than anyone anticipated.

Gemma 4 gets proper mobile optimisation

Google DeepMind released Gemma 4 QAT checkpoints with Q4_0 quantisation and a new mobile format that cuts on-device memory. Finally, someone’s taking edge deployment seriously instead of just shrinking cloud models. The mobile format is the interesting bit here.
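For a rough sense of what Q4_0 buys you, here is a back-of-envelope sketch. It assumes the usual GGML Q4_0 layout (blocks of 32 weights, each with one fp16 scale plus packed 4-bit values), and the 4-billion parameter count is purely illustrative, not a Gemma 4 figure. It covers weights only and says nothing about the new mobile format, whose details aren't given here.

```python
# Back-of-envelope weight memory for Q4_0, assuming the GGML block layout:
# 32 weights per block, one fp16 scale (2 bytes) + 16 bytes of packed 4-bit values.
BLOCK_SIZE = 32
SCALE_BYTES = 2
PACKED_BYTES = BLOCK_SIZE // 2  # two 4-bit values fit in each byte


def bits_per_weight() -> float:
    """Effective bits per weight, including the per-block scale."""
    return 8 * (SCALE_BYTES + PACKED_BYTES) / BLOCK_SIZE


def q4_0_bytes(n_weights: int) -> int:
    """Bytes needed to store n_weights in Q4_0 (last block padded)."""
    n_blocks = -(-n_weights // BLOCK_SIZE)  # ceiling division
    return n_blocks * (SCALE_BYTES + PACKED_BYTES)


def bf16_bytes(n_weights: int) -> int:
    """Bytes needed to store the same weights in bfloat16."""
    return n_weights * 2


if __name__ == "__main__":
    n = 4_000_000_000  # hypothetical parameter count, for illustration only
    print(f"{bits_per_weight()} bits per weight")
    print(f"bf16: {bf16_bytes(n) / 1e9:.2f} GB, Q4_0: {q4_0_bytes(n) / 1e9:.2f} GB")
```

The per-block scale is why Q4_0 works out to 4.5 bits per weight rather than a flat 4. Activations and the KV cache come on top of this.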

NVIDIA tackles inference startup times

NVIDIA released Dynamo Snapshot, a CRIU-based system for fast AI inference startup on Kubernetes. It checkpoints and restores vLLM workers instead of cold starting them. Smart move when every second of downtime costs real money.

Perplexity builds hybrid inference routing

Perplexity announced a hybrid local-server inference orchestrator that automatically routes tasks between on-device and cloud models. This is the logical next step after everyone realised running everything in the cloud is expensive. Proper task routing could be a game changer for cost management.
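Perplexity hasn't published how its orchestrator decides, so treat the following as a toy sketch of the general idea rather than their design. The `Task` fields, the `LOCAL_CONTEXT_LIMIT` threshold and the routing rules are all made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold: the longest prompt the on-device model handles well.
LOCAL_CONTEXT_LIMIT = 4096


@dataclass
class Task:
    prompt_tokens: int
    needs_web: bool = False  # e.g. the answer depends on fresh search results


def route(task: Task, battery_saver: bool = False) -> str:
    """Return "local" or "cloud" for a task, using simple illustrative rules."""
    if task.needs_web:
        return "cloud"  # the device has no live index to query
    if task.prompt_tokens > LOCAL_CONTEXT_LIMIT:
        return "cloud"  # too long for the small on-device model
    if battery_saver:
        return "cloud"  # spare the device when power is tight
    return "local"      # cheap default: no cloud tokens spent


if __name__ == "__main__":
    print(route(Task(prompt_tokens=200)))
    print(route(Task(prompt_tokens=200, needs_web=True)))
```

The point of defaulting to local is cost: every task that stays on the device is one that never shows up on the cloud bill.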

Token costs finally hit reality

TechCrunch reports on AI’s runaway costs, with one source noting the shift from “tokenmaxxing and go fast” to “we need guardrails, how do we control this?” The party’s over and finance teams are asking awkward questions.
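What a guardrail might look like in practice: a minimal sketch of a monthly spend cap that refuses requests once the budget is gone. The `TokenBudget` class, the limit and the per-token price are all hypothetical, and this isn't taken from the TechCrunch piece.

```python
class TokenBudget:
    """Tracks spend against a monthly cap and rejects requests that would exceed it."""

    def __init__(self, monthly_limit_usd: float, usd_per_million_tokens: float):
        self.monthly_limit_usd = monthly_limit_usd
        self.usd_per_million_tokens = usd_per_million_tokens
        self.spent_usd = 0.0

    def cost(self, tokens: int) -> float:
        """Dollar cost of a request of the given size."""
        return tokens / 1_000_000 * self.usd_per_million_tokens

    def try_spend(self, tokens: int) -> bool:
        """Record the spend and return True, or return False if it would bust the cap."""
        new_total = self.spent_usd + self.cost(tokens)
        if new_total > self.monthly_limit_usd:
            return False
        self.spent_usd = new_total
        return True


if __name__ == "__main__":
    # Hypothetical numbers: $100 monthly cap at $2 per million tokens.
    budget = TokenBudget(monthly_limit_usd=100.0, usd_per_million_tokens=2.0)
    print(budget.try_spend(10_000_000), budget.spent_usd)
```

A real deployment would want per-team budgets, alerts before the hard stop and a monthly reset, but the core check is this simple.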
