Pulse · hkust-nlp/simpleRL-reason · GitHub

January 7, 2025 – February 7, 2025

Overview

1 Active pull request
- 1 Merged pull request
- 0 Open pull requests

29 Active issues
- 7 Closed issues
- 22 New issues

1 Pull request merged by 1 person

Fix the evaluation problem
#29 merged Feb 7, 2025

7 Issues closed by 6 people

The speed of training in one node
#20 closed Feb 5, 2025
training steps in wandb
#24 closed Feb 4, 2025
Setups for returns
#23 closed Feb 2, 2025
A strange error when saving model
#19 closed Jan 31, 2025
training data from MATH
#7 closed Jan 29, 2025
File not Found Error in ray job submit
#14 closed Jan 28, 2025
I encountered a hang issue when loading the critic model.
#6 closed Jan 27, 2025

22 Issues opened by 22 people

Speed of 4 node training
#30 opened Feb 7, 2025
Does rStar-Math use RL data?
#28 opened Feb 6, 2025
can it support Qwen2.5-Math-1.5B model?
#27 opened Feb 6, 2025
Checkpoint Release
#26 opened Feb 5, 2025
worker process died
#25 opened Feb 3, 2025
Why a critic model is needed?
#22 opened Feb 1, 2025
Have you tried to train the model by Lora? It requires less computation resources.
#21 opened Jan 31, 2025
Why use PPO instead of GRPO?
#18 opened Jan 30, 2025
RuntimeError: Connection closed by peer when training on a single node
#17 opened Jan 29, 2025
Where is the long CoT data and preprocess script?
#16 opened Jan 28, 2025
Docker Container for Reproducing
#15 opened Jan 28, 2025
what is used for critic model
#13 opened Jan 27, 2025
rStar-Math-7B row in the first table seems to be inaccurate?
#12 opened Jan 27, 2025
PoT style response?
#11 opened Jan 27, 2025
Could you point to me how do you preprocess the chat message?
#10 opened Jan 27, 2025
Please give a repro on ray cluster setup?
#9 opened Jan 26, 2025
Help reproducing training run
#8 opened Jan 26, 2025
Thanks for your fast movement and open source spirit
#5 opened Jan 26, 2025
Is there a response increase compared to initial length after RL?
#4 opened Jan 26, 2025
the minimum hardware resource configuration required for training?
#3 opened Jan 26, 2025
What is the reward?
#2 opened Jan 26, 2025
A stupid question: Why researchers call such methods as "scaling test-time computation"?
#1 opened Jan 26, 2025