Andreas Hochlehnert*1, Hardik Bhatnagar*1, Vishaal Udandarao1,2, Samuel Albanie, Ameya Prabhu1, Matthias Bethge1
1Tübingen AI Center - University of Tübingen 2University of Cambridge
The leaderboard reports Pass@1 accuracy (mean ± std) on six math benchmarks under a standardized evaluation setup. Scores are computed over 10 seeds for AIME'24, AIME'25, and AMC'23, and over 3 seeds for MATH500, Minerva, and OlympiadBench.
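As a rough illustration of how a "mean ± std" entry can be derived from several seeded runs, the sketch below computes Pass@1 per seed and then aggregates across seeds. The helper names (`pass_at_1`, `summarize`) and the toy data are hypothetical, and the use of the sample standard deviation is an assumption; the source does not state which estimator is used.

```python
from statistics import mean, stdev

def pass_at_1(correct_flags):
    """Pass@1 for one seed: percentage of problems solved with one sample each."""
    return 100.0 * sum(correct_flags) / len(correct_flags)

def summarize(per_seed_flags):
    """Aggregate per-seed Pass@1 scores into (mean, std).

    Assumption: sample standard deviation (n - 1 denominator).
    """
    scores = [pass_at_1(flags) for flags in per_seed_flags]
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread

# Toy data: 3 seeds, 4 problems each (True = final answer judged correct).
runs = [
    [True, False, True, True],
    [True, True, False, False],
    [True, True, True, False],
]

avg, std = summarize(runs)
print(f"{avg:.1f} ± {std:.1f}")
```

With small benchmarks such as AIME (30 problems), a single problem shifts Pass@1 by several points, which is one reason to report the spread over many seeds rather than a single run.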
Model | Organization | Based on | Link | AIME'24 | AIME'25 | AMC'23 | MATH500 | Minerva | OlympiadBench | Average |
---|---|---|---|---|---|---|---|---|---|---|