Logic-RL

Successfully reproduced DeepSeek R1 Zero on a 2K Tiny Logic Puzzle Dataset.

See project explanation here.

Wandb project here.


✨ Enhanced Features (After Rule-Based RL)

🚩 Uncertainty Marking: Flag ambiguous steps for verification
📝 Progressive Summarization: Maintain intermediate conclusions
✅ Self Verification: First verify, then answer
🌐 Multilingual Switching: Chinese reasoning traces with English answers

📸 Results Preview

[Figure: Model Output Example]
[Figure: Average Output Length]

🛠️ Installation

conda create -n logic python=3.9
conda activate logic  # install the packages below into the new environment
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip3 install vllm==0.6.3 ray
pip3 install flash-attn --no-build-isolation
pip install -e .  # For verl integration
pip install wandb IPython matplotlib
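
To confirm that the pinned versions from the commands above ended up in the active environment, a small check like the following may help. It is a minimal sketch and not part of the repository: the EXPECTED table and the check_versions helper are names made up for illustration, and only the two packages pinned above are checked.

```python
from importlib import metadata

# Pins copied from the install commands above; the other packages are unpinned there.
EXPECTED = {"torch": "2.4.0", "vllm": "0.6.3"}


def check_versions(expected):
    """Return {package: (installed_or_None, expected, ok)} for each pinned package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Local build tags such as "+cu121" are ignored in the comparison.
        ok = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, wanted, ok)
    return report


if __name__ == "__main__":
    for name, (installed, wanted, ok) in check_versions(EXPECTED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: installed={installed} expected={wanted} [{status}]")
```

Run it inside the logic environment; a package that is not installed is reported with installed=None.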

📂 Data Preparation

You can use the data in /data directly.

For your own data generation, here's a demo:

Base Model

python ./examples/data_preprocess/kk.py \
    --local_dir {processed_data_path} \
    --data_path {raw_data_path}

Instruct Model

python ./examples/data_preprocess/kk.py \
    --template_type=qwen-instruct \
    --local_dir {processed_data_path} \
    --data_path {raw_data_path}

🚀 Training Execution

conda activate logic
bash main_grpo.sh  # 4×A100 80G
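
The script name suggests GRPO training. As a rough illustration of the core idea, the sketch below computes group-relative advantages: each response sampled for the same prompt is scored against the mean and standard deviation of its group. This is not the verl implementation; the function name, the eps value, and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of all responses sampled for one prompt.

    Assumption: population standard deviation plus a small eps to avoid
    division by zero when every response receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for four sampled responses to one puzzle.
    print(group_relative_advantages([3.0, -1.0, -1.0, -1.0]))
```

Responses that score above their group's average get a positive advantage and the rest get a negative one, so no separate value model is needed for the baseline.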

⚙️ Implementation Details

Component               Location
🏆 Reward Modeling       verl/utils/reward_score/kk.py
📚 Data Preprocessing    examples/data_preprocess/kk.py
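
For readers new to rule-based rewards, the following sketch shows the general shape such a function can take: one check for output format and one for answer correctness. It does not reproduce verl/utils/reward_score/kk.py. The tag layout, the function name, the score values, and the puzzle text are all hypothetical and chosen only for illustration.

```python
import re

# Assumed layout: reasoning inside <think>, final answer inside <answer>.
PATTERN = re.compile(
    r"^\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*$", re.DOTALL
)


def normalize(text):
    """Collapse whitespace and lowercase so trivial differences do not matter."""
    return " ".join(text.split()).lower()


def rule_based_reward(response, expected_answer,
                      format_reward=1.0, format_penalty=-1.0,
                      answer_reward=2.0, answer_penalty=-1.5):
    """Score one response; all numeric values are illustrative assumptions."""
    match = PATTERN.match(response)
    if match is None:
        # Malformed output: the answer is not inspected at all.
        return format_penalty
    correct = normalize(match.group(2)) == normalize(expected_answer)
    return format_reward + (answer_reward if correct else answer_penalty)


if __name__ == "__main__":
    response = "<think>A lies, so B tells the truth.</think><answer>A: knave, B: knight</answer>"
    print(rule_based_reward(response, "A: knave, B: knight"))
```

Keeping the format check separate from the answer check means a well-formed but wrong response still scores higher than an unparseable one.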


Citation

@misc{logic-rl,
author       = {Tian Xie and Qingnan Ren and Yuqian Hong},
title        = {Logic-RL},
howpublished = {https://github.com/Unakar/Logic-RL},
note         = {Accessed: 2025-02-03},
year         = {2025}
}


🙏 Acknowledgements
