Model distillation just turned training into intellectual property laundering

Teaching smaller models to mimic larger ones isn't optimisation. It's copying homework with extra steps.

Distillation sounds clever until you realise what’s actually happening. We’re teaching smaller models to copy the outputs of bigger ones, then pretending this is different from just stealing the work. It’s like photocopying a textbook and claiming you wrote a summary.

The copying problem everyone ignores

When xAI admits they trained Grok on OpenAI models, that’s just honest distillation. Everyone else does it too; they just don’t say it out loud. You feed your model thousands of prompts paired with GPT-4 or Claude responses, tune it to match those responses, then ship it as your own work.

The maths might be different but the intent is identical. Take someone else’s intelligence, compress it down, rebrand it. We’ve built an entire industry around sophisticated plagiarism.
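To make "the maths might be different" concrete, here is a rough sketch of the two training targets in play. Textbook distillation matches the teacher's whole softened probability distribution, which needs access to its logits. Copying from an API only ever sees the tokens the teacher emitted, so it reduces to ordinary cross-entropy on those tokens. The function names and toy logits below are made up for illustration, and the temperature-squared scaling often applied to the soft-target loss is left out to keep it short.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over a list of logits.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_target_loss(student_logits, teacher_logits, temperature=2.0):
    # Classic distillation: KL(teacher || student) on softened distributions.
    # Requires the teacher's logits, which a hosted API usually does not expose.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def imitation_loss(student_logits, teacher_token):
    # Output imitation: cross-entropy against the single token the teacher
    # actually produced. Only the teacher's text is needed.
    q = softmax(student_logits)
    return -math.log(q[teacher_token])

# Hypothetical three-token vocabulary, purely illustrative numbers.
teacher_logits = [4.0, 1.0, 0.5]
student_logits = [2.0, 1.5, 0.2]
teacher_token = max(range(len(teacher_logits)), key=lambda i: teacher_logits[i])

print("soft-target loss:", soft_target_loss(student_logits, teacher_logits))
print("imitation loss:  ", imitation_loss(student_logits, teacher_token))
```

Both losses shrink as the student's predictions move towards the teacher's, which is the point: different formulas, same direction of travel.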

Training data versus training targets

There’s a difference between learning from the same sources and learning to become the same thing. Reading Wikipedia to understand language is fair game. Reading GPT-4 outputs to become GPT-4 is just copying.

But the line keeps moving. Model providers put their outputs in front of anyone with an API key or a chat window, then act surprised when competitors hoover them up for training. It’s like leaving your homework on the desk and getting annoyed when someone copies it.

The whole ecosystem runs on this weird honour system where everyone knows what’s happening but nobody wants to say it directly. Distillation just gave us academic cover for something we were going to do anyway.