#scaling
2 posts tagged scaling.
Thoughts
Rollout infrastructure just became the new model training
Reinforcement learning infrastructure is eating traditional training pipelines and nobody's talking about it.
Residual connections are holding transformers back
Fixed residual mixing creates a structural bottleneck that attention-based residuals can finally solve.