1 post tagged speculative-decoding.
Every speedup technique is just teaching models to guess better, and we're running out of good guesses.