LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?
Another benchmark eclipsed by SWE-bench.
In this work, we introduce LiveCodeBench Pro, a rigorously curated and contamination-free benchmark designed to evaluate the true algorithmic reasoning capabilities of LLMs in competitive programming.
Right, but have they seen what Excel can do? Regardless, I think the average-cost-per-problem analysis is premature. And the distinction between 'reasoning' and non-reasoning AIs also feels like a lost cause.
My prediction: convergence will eventually be on price per token relative to problem solve rate.
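To make that concrete, here is a minimal sketch (my own, not from the paper) of the metric I mean: effective cost per solved problem, derived from price per token and solve rate. The function name and the example numbers are assumptions for illustration only.

```python
def cost_per_solved_problem(price_per_million_tokens: float,
                            avg_tokens_per_attempt: float,
                            solve_rate: float) -> float:
    """Expected dollar cost per *solved* problem.

    price_per_million_tokens: provider price in USD per 1M tokens (assumed value)
    avg_tokens_per_attempt:   average tokens generated per attempt (assumed value)
    solve_rate:               fraction of problems solved, in (0, 1]
    """
    cost_per_attempt = price_per_million_tokens * avg_tokens_per_attempt / 1_000_000
    # Amortize the cost of failed attempts over the problems actually solved.
    return cost_per_attempt / solve_rate


# Hypothetical numbers: $10 per 1M tokens, 20k tokens per attempt, 40% solve rate
print(cost_per_solved_problem(10.0, 20_000, 0.4))  # -> 0.5 USD per solved problem
```

On this view, a cheaper model with a lower solve rate and an expensive model with a higher one can land at the same effective cost, which is why I expect the comparison to collapse onto that single number.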
Quote citation: arXiv, "LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?", 2025-06-13, https://arxiv.org/pdf/2506.11928
