1 post tagged model-distillation.
Teaching smaller models to mimic larger ones isn't optimisation, it's just copying homework with extra steps.