Scale killed the LLM star

The race for bigger models is over, and efficiency just won.

Google’s Nano-Banana 2 does 4K image synthesis in under a second on your phone. Meanwhile, we’re still arguing about whether the next GPT needs a trillion parameters. The writing is on the wall: scale is dead, efficiency is king.

The Great Downsizing

We spent years in a dick-measuring contest over parameter counts. Bigger meant better, and better meant more zeros on the end of your model spec. But while everyone was building data-centre-sized models, a quiet revolution happened: teams figured out how to compress all that capability into something that fits in your pocket.

Nano-Banana 2 isn’t just fast. It’s embarrassingly fast for something that good. Sub-second 4K generation used to require a server farm; now it runs on the same chip that plays your music. That’s not incremental progress, that’s a paradigm shift.

Edge cases become the main case

The real kicker? Edge deployment isn’t a nice-to-have anymore. It’s table stakes. Users don’t want to wait for round trips to the cloud. They don’t want their data leaving their device. They want AI that works on the plane, in the basement, or when the internet is down.

The companies still chasing scale are building yesterday’s solution to tomorrow’s problems. We don’t need bigger models. We need smarter ones that do more with less.

Related