Sober Reasoning Leaderboard 🍷

The leaderboard reports Pass@1 accuracy (mean ± std) on six math benchmarks under a standardized evaluation setup. Scores are aggregated over 10 seeds for AIME24, AIME25, and AMC23, and over 3 seeds for MATH500, Minerva, and OlympiadBench.
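
As a rough illustration of how a "mean ± std" Pass@1 entry can be derived from per-seed runs, here is a minimal sketch. The function names (`pass_at_1`, `summarize`) and the toy data are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption; the leaderboard does not state which estimator it uses.

```python
from statistics import mean, stdev


def pass_at_1(correct_flags):
    """Percentage of problems solved in one run (one sampled answer per problem)."""
    return 100.0 * sum(correct_flags) / len(correct_flags)


def summarize(per_seed_flags):
    """Return (mean, std) of Pass@1 across seeds.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    scores = [pass_at_1(flags) for flags in per_seed_flags]
    return mean(scores), stdev(scores)


# Hypothetical toy data: 3 seeds, 4 problems each (True = correct answer).
runs = [
    [True, False, True, True],   # 75.0
    [True, False, False, True],  # 50.0
    [True, True, True, True],    # 100.0
]

avg, std = summarize(runs)
print(f"{avg:.1f} ± {std:.1f}")  # prints "75.0 ± 25.0"
```

With small benchmarks such as AIME (30 problems), a single problem shifts Pass@1 by several points, which is presumably why more seeds are used for the smaller test sets.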

Model Name | Organization | Based on | Paper Link | AIME'24 | AIME'25 | AMC'23 | MATH500 | Minerva | Olympiad | Average