#benchmarks
3 posts tagged benchmarks.
News & Updates
DeepSeek R1 vs Claude 3.5: a head-to-head on real tasks
Ran both models through the same set of coding and reasoning tasks. Results were closer than expected.
Testing Kimi k1.5: the reasoning model nobody's talking about
Moonshot AI's Kimi k1.5 quietly dropped, and it's genuinely impressive on long reasoning tasks.