Workshop Log

Testing Kimi k1.5: the reasoning model nobody's talking about


Moonshot AI's Kimi k1.5 quietly dropped and it's genuinely impressive on long reasoning tasks.

The release flew under the radar, and it shouldn’t have: on multi-step reasoning tasks it’s competing with models twice its parameter count.

What we tested

Gave it a few things that trip up most models: multi-hop logic puzzles, long-context code analysis, and a tricky maths proof that Claude and GPT-4 both fumble occasionally.

Kimi handled the logic puzzles cleanly. The code analysis was solid on context windows up to roughly 100k tokens. It got the maths proof about 80% right, which is better than most open-source alternatives we’ve tried.

The catch

It’s slower than you’d want. Inference times are noticeably longer than Claude’s or GPT-4’s, especially on longer prompts. The API documentation is also rough going if you don’t read Chinese.
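If you want to poke at it yourself, Moonshot exposes an OpenAI-compatible REST API. Here’s a minimal sketch of building a chat-completions request; the base URL and model id below are assumptions taken from Moonshot’s general platform conventions, not k1.5-specific values, so check their docs before relying on them.

```python
# Sketch of an OpenAI-compatible chat-completions request to Moonshot's API.
# BASE_URL and MODEL are assumptions -- verify against Moonshot's docs.
import json

BASE_URL = "https://api.moonshot.cn/v1"  # assumed endpoint
MODEL = "moonshot-v1-128k"               # hypothetical model id

def build_request(prompt: str, api_key: str) -> tuple[str, dict, bytes]:
    """Return (url, headers, body) for a chat-completions POST."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.3,  # keep it low for reasoning tasks
    }).encode("utf-8")
    return url, headers, body

url, headers, body = build_request("Prove that sqrt(2) is irrational.", "sk-...")
print(url)
```

Send the tuple with any HTTP client (`requests.post(url, headers=headers, data=body)`); the response shape mirrors OpenAI’s, which is why the rough docs are survivable in practice.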

Worth watching

This is a team that’s iterating fast. k1.5 is a big step up from their earlier models. If they sort out the speed issue, this becomes a serious contender for reasoning-heavy workloads.