The evaluation reports Pass@1 accuracy (mean ± std) on six math benchmarks under a standardized evaluation setup. Scores are computed over 10 seeds for AIME24, AIME25, and AMC23, and over 3 seeds for MATH500, Minerva, and OlympiadBench.
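
As a rough illustration of how a "mean ± std" Pass@1 entry can be derived from per-seed results, here is a minimal sketch. The function names (`pass_at_1`, `summarize`) and the toy data are hypothetical, and the use of the sample standard deviation is an assumption; the source does not say which estimator the standardized evaluation uses.

```python
from statistics import mean, stdev


def pass_at_1(correct):
    """Percentage of problems solved in one seed's run (one sample per problem)."""
    return 100.0 * sum(correct) / len(correct)


def summarize(runs):
    """Return (mean, std) of Pass@1 over seeds.

    `runs` holds one list of booleans per seed. The sample standard
    deviation is used here, which is an assumption, not a confirmed detail.
    """
    scores = [pass_at_1(r) for r in runs]
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread


# Hypothetical toy data: 3 seeds, 4 problems each (not real benchmark results).
runs = [
    [True, False, True, True],
    [True, True, False, False],
    [True, False, True, False],
]
m, s = summarize(runs)
print(f"{m:.1f} ± {s:.1f}")  # prints "58.3 ± 14.4"
```

With only 3 seeds the standard deviation is a coarse estimate, which may be worth keeping in mind when comparing the MATH500, Minerva, and OlympiadBench columns.
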
Model Name | Organization | Based on | Paper Link | AIME'24 | AIME'25 | AMC'23 | MATH500 | Minerva | Olympiad | Average |
---|---|---|---|---|---|---|---|---|---|---|