1 post tagged long-context.
A training-only attention mechanism that delivers a 1.4-1.7x speedup for long-context pretraining.