Logic Rl

🎉 Successfully reproduced DeepSeek R1 Zero on 2K Logic Puzzle Dataset.

📢 Our detailed technical report is coming soon! Stay tuned!

See project explanation : here.

Wandb project : here.

Enhanced Features (After Rule-Based RL)

🚩 Uncertainty Marking	📝 Progressive Summarization	✅ Self Verification	🌐 Multilingual Switching
Flag ambiguous steps for verification	Maintain intermediate conclusions	First verify then answer	Chinese reasoning traces with English answers

📸 Results Preview


Model Output Example	Average Output Length

Benchmark

Model	2ppl	3ppl	4ppl	5ppl	6ppl	7ppl	8ppl
o1-2024-12-17	0.83	0.51	0.38	0.38	0.35	0.30	0.20
GPT-4o	0.68	0.57	0.49	0.32	0.23	0.21	0.11
Deepseek-Math-7b	0.35	0.21	0.08	0.06	0.02	0.00	0.00
Qwen2.5-7B-Instruct-1M	0.49	0.40	0.25	0.11	0.02	0.06	0.01
Qwen2.5-7B-Logic-RL (ours)	0.68	0.59	0.44	0.34	0.22	0.16	0.15

Our model only used 2K training data with 400 training steps. More model benchmarks will be updated later this week.

🛠️ Installation

conda create -n logic python=3.9
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip3 install vllm==0.6.3 ray
pip3 install flash-attn --no-build-isolation
pip install -e .  # For verl integration
pip install wandb IPython matplotlib

Data Preparation

You can directly use /data.

For your own data generation, here's a demo:

Base Model

python ./examples/data_preprocess/kk.py \
    --local_dir {processed_data_path} \
    --data_path {raw_data_path}

Instruct Model

python ./examples/data_preprocess/kk.py \
    --template_type=qwen-instruct \
    --local_dir {processed_data_path} \
    --data_path {raw_data_path}

Training Execution

conda activate logic
bash main_grpo.sh  # 4×A100 80G

⚙️ Implementation Details

Component	Location
Reward Modeling	`verl/utils/reward_score/kk.py`
Data Preprocessing	`examples/data_preprocess/kk.py`

Citation

@misc{logic-rl,
author       = {Tian Xie and Qingnan Ren and Yuqian Hong and Zitian Gao},
title        = {Logic-RL},
howpublished = {https://github.com/Unakar/Logic-RL},
note         = {Accessed: 2025-02-03},
year         = {2025}
}

Acknowledgements

Verl 🔗
TinyZero 🔗
Knights and Knaves (K&K) puzzles dataset 🔗

Name		Name	Last commit message	Last commit date
Latest commit History 47 Commits
data/kk/instruct		data/kk/instruct
docker		docker
docs		docs
examples		examples
patches		patches
scripts		scripts
tests		tests
verl		verl
.gitignore		.gitignore
LICENSE		LICENSE
Notice.txt		Notice.txt
README.md		README.md
main_grpo.sh		main_grpo.sh
mean_length.png		mean_length.png
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
response.png		response.png
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Logic Rl

🎉 Successfully reproduced DeepSeek R1 Zero on 2K Logic Puzzle Dataset.

📢 Our detailed technical report is coming soon! Stay tuned!

Enhanced Features (After Rule-Based RL)

📸 Results Preview

Benchmark

🛠️ Installation

Data Preparation

Base Model

Instruct Model

Training Execution

⚙️ Implementation Details

Citation

Acknowledgements

About

Releases

Packages

Contributors 5

Languages

License

Unakar/Logic-RL

Folders and files

Latest commit

History

Repository files navigation

Logic Rl

🎉 Successfully reproduced DeepSeek R1 Zero on 2K Logic Puzzle Dataset.

📢 Our detailed technical report is coming soon! Stay tuned!

Enhanced Features (After Rule-Based RL)

📸 Results Preview

Benchmark

🛠️ Installation

Data Preparation

Base Model

Instruct Model

Training Execution

⚙️ Implementation Details

Citation

Acknowledgements

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 5

Languages

Packages