1 post tagged training-techniques.
Teaching smaller models to mimic larger ones isn't optimisation; it's just copying homework with extra steps.