Insights: hkust-nlp/simpleRL-reason
Overview
- 1 Merged pull request
- 0 Open pull requests
- 3 Closed issues
- 7 New issues
1 Pull request merged by 1 person
- Fix the evaluation problem (#29, merged Feb 7, 2025)
3 Issues closed by 3 people
- The speed of training in one node (#20, closed Feb 5, 2025)
- training steps in wandb (#24, closed Feb 4, 2025)
- Setups for returns (#23, closed Feb 2, 2025)
7 Issues opened by 7 people
- Speed of 4 node training (#30, opened Feb 7, 2025)
- Does rStar-Math use RL data? (#28, opened Feb 6, 2025)
- can it support Qwen2.5-Math-1.5B model? (#27, opened Feb 6, 2025)
- Checkpoint Release (#26, opened Feb 5, 2025)
- worker process died (#25, opened Feb 3, 2025)
- Why a critic model is needed? (#22, opened Feb 1, 2025)
- Have you tried to train the model by Lora? It requires less computation resources. (#21, opened Jan 31, 2025)
2 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Why use PPO instead of GRPO? (#18, commented on Feb 2, 2025, 0 new comments)
- What is the reward? (#2, commented on Feb 4, 2025, 0 new comments)