#inference-optimisation
2 posts tagged inference-optimisation.
Thoughts
Inference harnesses just turned model improvement into a configuration management problem
We're spending billions training better models when the real gains come from better prompting infrastructure.
Speculative decoding just turned inference into a confidence game
Every speedup technique is just teaching models to guess better, and we're running out of good guesses.