Pulse · hkust-nlp/simpleRL-reason · GitHub

January 31, 2025 – February 7, 2025

Overview

1 Active pull request
- 1 Merged pull request
- 0 Open pull requests

10 Active issues
- 3 Closed issues
- 7 New issues

1 Pull request merged by 1 person

Fix the evaluation problem
#29 merged Feb 7, 2025

3 Issues closed by 3 people

The speed of training in one node
#20 closed Feb 5, 2025
training steps in wandb
#24 closed Feb 4, 2025
Setups for returns
#23 closed Feb 2, 2025

7 Issues opened by 7 people

Speed of 4 node training
#30 opened Feb 7, 2025
Does rStar-Math use RL data?
#28 opened Feb 6, 2025
can it support Qwen2.5-Math-1.5B model?
#27 opened Feb 6, 2025
Checkpoint Release
#26 opened Feb 5, 2025
worker process died
#25 opened Feb 3, 2025
Why a critic model is needed?
#22 opened Feb 1, 2025
Have you tried to train the model by Lora? It requires less computation resources.
#21 opened Jan 31, 2025

2 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

Why use PPO instead of GRPO?
#18 commented on Feb 2, 2025 • 0 new comments
What is the reward?
#2 commented on Feb 4, 2025 • 0 new comments