#inference-scaling

2 posts tagged inference-scaling.

Thoughts

Embedding multiple model sizes in one checkpoint sounds clever until you realise you've just created the Git submodules of AI.

Google's TurboQuant and the rush to compress KV caches are treating symptoms whilst ignoring the real problem.