1 post tagged knowledge-distillation.
We're spending months teaching small models to mimic ensemble behaviour instead of just building better single models from the start.
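The setup being critiqued can be sketched minimally: a student trained to match the temperature-softened average of an ensemble's logits via a KL-divergence loss, in the style of standard knowledge distillation. This is an illustrative sketch in plain NumPy, not the post's own code; all names are hypothetical.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax (numerically stable)."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, ensemble_logits, T=4.0):
    """Mean KL(teacher || student) on softened distributions.

    The "teacher" here is the average of the ensemble members'
    logits; T*T rescaling keeps gradients comparable across
    temperatures, as in standard distillation recipes.
    """
    teacher_logits = np.mean(ensemble_logits, axis=0)
    p = softmax(teacher_logits, T)          # soft targets from the ensemble
    q = softmax(student_logits, T)          # student's softened predictions
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return float(kl.mean() * T * T)

# toy example: 3-model ensemble, batch of 2, 4 classes
rng = np.random.default_rng(0)
ensemble = np.stack([rng.normal(size=(2, 4)) for _ in range(3)])
student = rng.normal(size=(2, 4))
loss = distillation_loss(student, ensemble, T=4.0)  # a non-negative scalar
```

A student whose logits already equal the (single) teacher's incurs zero loss, which is the sanity check that makes this a mimicry objective rather than a task objective — the point the post is pushing back on.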