LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?
another benchmark eclipsed by SWE-bench

In this work, we introduce LiveCodeBench Pro, a rigorously curated and contamination-free benchmark designed to evaluate the true algorithmic reasoning capabilities of LLMs in competitive programming.

Right, but have they seen what Excel can do? Regardless, I think the average-cost-per-problem analysis is premature. And the distinction between 'reasoning' and non-reasoning models also feels like a lost cause.

My prediction: comparisons will eventually converge on price per token / problem solve rate.
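To make that metric concrete, here is a minimal sketch of how a "cost per solved problem" comparison could be computed. All prices, token counts, and solve rates below are made-up placeholders for illustration, not figures from the paper.

```python
# Hypothetical comparison: cost per solved problem
#   = (avg tokens per attempt * price per token) / solve rate
# All numbers are invented placeholders, not results from LiveCodeBench Pro.

models = {
    "model_a": {"price_per_1k_tokens": 0.010, "avg_tokens_per_problem": 8_000, "solve_rate": 0.55},
    "model_b": {"price_per_1k_tokens": 0.002, "avg_tokens_per_problem": 20_000, "solve_rate": 0.40},
}

for name, m in models.items():
    # Cost of a single attempt at one problem.
    cost_per_attempt = m["price_per_1k_tokens"] * m["avg_tokens_per_problem"] / 1000
    # Expected spend per *solved* problem: amortize failed attempts by dividing by the solve rate.
    cost_per_solved = cost_per_attempt / m["solve_rate"]
    print(f"{name}: ${cost_per_attempt:.4f} per attempt, ${cost_per_solved:.4f} per solved problem")
```

Under this framing, a cheaper-per-token model can still lose if its solve rate is low enough that failed attempts dominate the expected cost.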


Quote Citation: arXiv, "LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?", 2025-06-13, https://arxiv.org/pdf/2506.11928