1 post tagged nous-research.
A training-only attention mechanism that delivers a 1.4-1.7x speedup for long-context pretraining.