Stars
RLHF implementation details of OAI's 2019 codebase
Super-Efficient RLHF Training of LLMs with Parameter Reallocation
A bibliography and survey of the papers surrounding o1
Data for the paper "A Dataset for Learning University STEM Courses at Scale" by Zhang et al., 2022.
veRL: Volcano Engine Reinforcement Learning for LLM
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention)
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
👷‍♂️ Minion is designed to execute any type of query, offering a variety of features that demonstrate its flexibility and intelligence.
O1 Replication Journey: A Strategic Progress Report – Part I
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
A trend that started with "Chain of Thought Prompting Elicits Reasoning in Large Language Models".
The simplest, fastest repository for training/finetuning medium-sized GPTs.
A Python Library for Uncertain Knowledge Graph Reasoning
Implementation of the Quiet-STAR paper (https://arxiv.org/pdf/2403.09629.pdf)
Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto
800,000 step-level correctness labels on LLM solutions to MATH problems
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models
[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning
A flexible and efficient training framework for large-scale alignment tasks
PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
A framework for few-shot evaluation of language models.
LogicBench is a natural language question-answering dataset consisting of 25 different reasoning patterns spanning propositional, first-order, and non-monotonic logics.