See project explanation here.
Wandb project here.
Feature | Description |
---|---|
🚩 Uncertainty Marking | Flag ambiguous steps for verification |
📝 Progressive Summarization | Maintain intermediate conclusions |
✅ Self Verification | First verify, then answer |
🌐 Multilingual Switching | Chinese reasoning traces with English answers |
Model Output Example | Average Output Length |
conda create -n logic python=3.9
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip3 install vllm==0.6.3 ray
pip3 install flash-attn --no-build-isolation
pip install -e . # For verl integration
pip install wandb IPython matplotlib
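After running the install commands, a quick check of which packages actually landed in the environment can save a failed training launch. The following is a minimal sketch, not part of the repository: the helper name `installed_versions` is hypothetical, and the distribution names are assumed to match the names used in the `pip` commands above.

```python
from importlib import metadata

# Distribution names taken from the install commands above (assumed to match PyPI names).
REQUIRED = ["torch", "vllm", "ray", "flash-attn", "wandb", "ipython", "matplotlib"]


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:12s} {version or 'MISSING'}")
```

Run it inside the `logic` environment; any line reporting `MISSING` points at an install step that needs to be repeated.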
You can use the data in /data directly.
For your own data generation, here's a demo:
python ./examples/data_preprocess/kk.py \
--local_dir {processed_data_path} \
--data_path {raw_data_path}
python ./examples/data_preprocess/kk.py \
--template_type=qwen-instruct \
--local_dir {processed_data_path} \
--data_path {raw_data_path}
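The two commands above differ only in the `--template_type=qwen-instruct` flag; `{processed_data_path}` and `{raw_data_path}` are placeholders to fill in. As a small sketch of how the placeholders map onto a concrete invocation, the hypothetical helper below assembles the argument list using only the flags shown above. The paths `raw/kk` and `data/kk` are made-up examples, not paths from the repository.

```python
import shlex


def build_preprocess_cmd(raw_data_path, processed_data_path, template_type=None):
    """Assemble the kk.py preprocessing command as an argument list (hypothetical helper)."""
    cmd = ["python", "./examples/data_preprocess/kk.py"]
    if template_type is not None:
        # Only added when a template is requested, as in the second command above.
        cmd.append(f"--template_type={template_type}")
    cmd += ["--local_dir", processed_data_path, "--data_path", raw_data_path]
    return cmd


# Example paths are illustrative only.
print(shlex.join(build_preprocess_cmd("raw/kk", "data/kk", "qwen-instruct")))
```

The returned list can be passed to `subprocess.run` as-is, which avoids shell-quoting problems when paths contain spaces.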
conda activate logic
bash main_grpo.sh # 4×A100 80G
Component | Location |
---|---|
🏆 Reward Modeling | verl/utils/reward_score/kk.py |
📚 Data Preprocessing | examples/data_preprocess/kk.py |
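To give a feel for what a rule-based reward of this kind can look like, here is a minimal sketch. It is not the code in `verl/utils/reward_score/kk.py`. It assumes that `kk` refers to knights-and-knaves style puzzles, that responses use `<think>` and `<answer>` tags, and that answers read like "Alice is a knight"; the function names and the reward values (2.0, -0.5, -1.0) are purely illustrative.

```python
import re

# Assumed response layout: reasoning in <think>, final answer in <answer>.
RESPONSE_PATTERN = re.compile(
    r"^\s*<think>(?P<think>.*?)</think>\s*<answer>(?P<answer>.*?)</answer>\s*$",
    re.DOTALL,
)


def parse_roles(answer_text):
    """Extract {name: role} from phrases such as 'Alice is a knight'."""
    pairs = re.findall(r"(\w+) is a (knight|knave)", answer_text, re.IGNORECASE)
    return {name: role.lower() for name, role in pairs}


def sketch_reward(response, expected_roles):
    """Score a response: format first, then correctness (illustrative values)."""
    match = RESPONSE_PATTERN.match(response)
    if match is None:
        return -1.0  # output does not follow the assumed tag layout
    predicted = parse_roles(match.group("answer"))
    if predicted == expected_roles:
        return 2.0  # well formed and every role matches
    return -0.5  # well formed, but the answer is wrong or incomplete
```

Checking the format before the answer means a response that skips the reasoning tags cannot earn the correctness reward, even if it happens to state the right roles.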
@misc{logic-rl,
author = {Tian Xie and Qingnan Ren and Yuqian Hong},
title = {Logic-RL},
howpublished = {https://github.com/Unakar/Logic-RL},
note = {Accessed: 2025-02-03},
year = {2025}
}