Insights: hkust-nlp/simpleRL-reason
Overview
- 1 Merged pull request
- 0 Open pull requests
- 7 Closed issues
- 22 New issues
1 Pull request merged by 1 person
- Fix the evaluation problem (#29 merged, Feb 7, 2025)
7 Issues closed by 6 people
- The speed of training in one node (#20 closed, Feb 5, 2025)
- training steps in wandb (#24 closed, Feb 4, 2025)
- Setups for returns (#23 closed, Feb 2, 2025)
- A strange error when saving model (#19 closed, Jan 31, 2025)
- training data from MATH (#7 closed, Jan 29, 2025)
- File not Found Error in ray job submit (#14 closed, Jan 28, 2025)
- I encountered a hang issue when loading the critic model. (#6 closed, Jan 27, 2025)
22 Issues opened by 22 people
- Speed of 4 node training (#30 opened, Feb 7, 2025)
- Does rStar-Math use RL data? (#28 opened, Feb 6, 2025)
- can it support Qwen2.5-Math-1.5B model? (#27 opened, Feb 6, 2025)
- Checkpoint Release (#26 opened, Feb 5, 2025)
- worker process died (#25 opened, Feb 3, 2025)
- Why a critic model is needed? (#22 opened, Feb 1, 2025)
- Have you tried to train the model by Lora? It requires less computation resources. (#21 opened, Jan 31, 2025)
- Why use PPO instead of GRPO? (#18 opened, Jan 30, 2025)
- RuntimeError: Connection closed by peer when training on a single node (#17 opened, Jan 29, 2025)
- Where is the long CoT data and preprocess script? (#16 opened, Jan 28, 2025)
- Docker Container for Reproducing (#15 opened, Jan 28, 2025)
- what is used for critic model (#13 opened, Jan 27, 2025)
- rStar-Math-7B row in the first table seems to be inaccurate? (#12 opened, Jan 27, 2025)
- PoT style response? (#11 opened, Jan 27, 2025)
- Could you point to me how do you preprocess the chat message? (#10 opened, Jan 27, 2025)
- Please give a repro on ray cluster setup? (#9 opened, Jan 26, 2025)
- Help reproducing training run (#8 opened, Jan 26, 2025)
- Thanks for your fast movement and open source spirit (#5 opened, Jan 26, 2025)
- Is there a response increase compared to initial length after RL? (#4 opened, Jan 26, 2025)
- the minimum hardware resource configuration required for training? (#3 opened, Jan 26, 2025)
- What is the reward? (#2 opened, Jan 26, 2025)
- A stupid question: Why researchers call such methods as "scaling test-time computation"? (#1 opened, Jan 26, 2025)