Logic-RL

Successfully reproduced DeepSeek R1 Zero on a tiny 2K logic puzzle dataset.

See project explanation here.

Wandb project here.

✨ Enhanced Features (After Rule-Based RL)

Feature	Description
🚩 Uncertainty Marking	Flag ambiguous steps for verification
📝 Progressive Summarization	Maintain intermediate conclusions
✅ Self Verification	First verify then answer
🌐 Multilingual Switching	Chinese reasoning traces with English answers

📸 Results Preview


Figures: Model Output Example (`response.png`) and Average Output Length (`mean_length.png`).

🛠️ Installation

conda create -n logic python=3.9
conda activate logic  # install the packages into the new environment
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip install vllm==0.6.3 ray
pip install flash-attn --no-build-isolation
pip install -e .  # For verl integration
pip install wandb IPython matplotlib
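After installation, a quick check of the pinned versions can catch a mismatched environment before training starts. The following is a minimal sketch, not part of the repository: the function name `check_environment` and the `EXPECTED` table are illustrative, and the pins are copied from the install commands above.

```python
from importlib import metadata

# Version pins taken from the install commands above; None means "any version".
EXPECTED = {
    "torch": "2.4.0",
    "vllm": "0.6.3",
    "ray": None,
    "flash-attn": None,
    "wandb": None,
}


def check_environment(expected=EXPECTED):
    """Return {package: status}; status is 'ok', 'missing', or a mismatch note."""
    report = {}
    for name, pinned in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Local version suffixes such as "+cu121" are ignored for the comparison.
        base = installed.split("+")[0]
        if pinned is None or base == pinned:
            report[name] = "ok"
        else:
            report[name] = f"found {installed}, expected {pinned}"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package:12s} {status}")
```

Run it inside the `logic` environment; any line that does not say `ok` points at a package to reinstall.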

📂 Data Preparation

You can use the prepared data in `data/` directly.

To generate your own data, run the preprocessing script as in this demo:

Base Model

python ./examples/data_preprocess/kk.py \
    --local_dir {processed_data_path} \
    --data_path {raw_data_path}

Instruct Model

python ./examples/data_preprocess/kk.py \
    --template_type=qwen-instruct \
    --local_dir {processed_data_path} \
    --data_path {raw_data_path}
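If you preprocess data for both model types, it may be convenient to build the two commands above from one helper. This is a small sketch under stated assumptions: `build_preprocess_cmd` is a hypothetical helper, the paths in the demo are placeholders, and only the flags shown in the commands above are used.

```python
import shlex
import sys


def build_preprocess_cmd(local_dir, data_path, template_type=None):
    """Build the argv list for examples/data_preprocess/kk.py."""
    cmd = [sys.executable, "./examples/data_preprocess/kk.py"]
    if template_type is not None:
        # Only the instruct variant passes --template_type in the commands above.
        cmd.append(f"--template_type={template_type}")
    cmd += ["--local_dir", local_dir, "--data_path", data_path]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; replace them with your own directories.
    for template in (None, "qwen-instruct"):
        print(shlex.join(build_preprocess_cmd("data/processed", "data/raw", template)))
```

Passing the returned list to `subprocess.run(cmd, check=True)` from the repository root would execute the same command as the shell form.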

🚀 Training Execution

conda activate logic
bash main_grpo.sh  # 4×A100 80G
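The script name refers to GRPO (Group Relative Policy Optimization). As a rough illustration of its central idea, and not the code used in this repository, the sketch below normalizes the rewards of several responses sampled for the same prompt. The function name and `eps` value are assumptions, and the population standard deviation is used here; implementations differ on this detail.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Center and scale the rewards of one prompt's sampled responses."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    print(group_relative_advantages([3.0, -1.0, -1.0, 3.0]))
```

Responses that score above their group's mean receive a positive advantage and those below receive a negative one, so no separate value model is needed to provide a baseline.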

⚙️ Implementation Details

Component	Location
🏆 Reward Modeling	`verl/utils/reward_score/kk.py`
📚 Data Preprocessing	`examples/data_preprocess/kk.py`
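To give a feel for what a rule-based reward can look like, here is a minimal sketch. It is not the function in `verl/utils/reward_score/kk.py`: the `<think>`/`<answer>` tag layout, the name `rule_based_reward`, and all reward values are assumptions chosen for illustration.

```python
import re

# Assumed output layout: reasoning in <think> tags, final answer in <answer> tags.
RESPONSE_PATTERN = re.compile(
    r"^\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*$", re.DOTALL
)


def rule_based_reward(response, expected_answer,
                      format_bonus=1.0, format_penalty=-1.0,
                      correct_bonus=2.0, wrong_penalty=-1.5):
    """Score a response with format and answer rules only (illustrative values)."""
    match = RESPONSE_PATTERN.match(response)
    if match is None:
        # Malformed output: penalize and skip the answer check.
        return format_penalty
    # Compare answers case-insensitively with whitespace collapsed.
    answer = " ".join(match.group(2).split()).lower()
    target = " ".join(expected_answer.split()).lower()
    answer_score = correct_bonus if answer == target else wrong_penalty
    return format_bonus + answer_score
```

Because the score depends only on string rules, it needs no learned reward model; the trade-off is that the answer comparison must be strict enough that the policy cannot earn reward through formatting alone.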

Citation

@misc{logic-rl,
author       = {Tian Xie and Qingnan Ren and Yuqian Hong},
title        = {Logic-RL},
howpublished = {https://github.com/Unakar/Logic-RL},
note         = {Accessed: 2025-02-03},
year         = {2025}
}

🙏 Acknowledgements

Verl 🔗
TinyZero 🔗
