(Thank you for publishing the insights)
Shouldn't the rStar-Math score be taken from Table 6 of the rStar-Math paper, i.e. rStar-Math round 4 (the one with System 2 MCTS), which scores 89.4 on the MATH benchmark? I think you are using the result from Table 7, which is for an SFT of Qwen2.5-Math-7B fine-tuned on a dataset generated by rStar-Math.