1 post tagged inference-scaling.
Google's TurboQuant and the rush to compress KV caches are treating symptoms whilst ignoring the real problem.