1 post tagged attention-mechanism.
A training-only attention mechanism that delivers a 1.4-1.7x speedup for long-context pretraining.