zanghyu

eaves zanghyu

PhD student, Reinforcement Learning, Beijing Institute of Technology

13 followers · 29 following

Stars

fchollet / ARC-AGI

The Abstraction and Reasoning Corpus

JavaScript 3,607 596 Updated Aug 4, 2024

vwxyzjn / lm-human-preference-details

RLHF implementation details of OAI's 2019 codebase

Python 157 8 Updated Jan 14, 2024

openpsi-project / ReaLHF

Super-Efficient RLHF Training of LLMs with Parameter Reallocation

Python 137 7 Updated Nov 27, 2024

srush / awesome-o1

A bibliography and survey of the papers surrounding o1

TeX 867 37 Updated Nov 16, 2024

idrori / stemQ

Data for the paper "A Dataset for Learning University STEM Courses at Scale" by Zhang et al., 2022.

11 7 Updated Nov 22, 2022

volcengine / verl

veRL: Volcano Engine Reinforcement Learning for LLM

Python 361 25 Updated Dec 4, 2024

OpenRLHF / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention)

Python 2,849 270 Updated Dec 4, 2024

MARIO-Math-Reasoning / Super_MARIO

Python 264 18 Updated Nov 29, 2024

openreasoner / openr

OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models

Python 1,157 87 Updated Dec 3, 2024

femto / minion

👷‍♂️Minion is designed to execute any type of queries, offering a variety of features that demonstrate its flexibility and intelligence.

Python 8 Updated Nov 21, 2024

GAIR-NLP / O1-Journey

O1 Replication Journey: A Strategic Progress Report – Part I

1,591 46 Updated Nov 30, 2024

idavidrein / gpqa

GPQA: A Graduate-Level Google-Proof Q&A Benchmark

Jupyter Notebook 191 11 Updated Sep 30, 2024

mattian7 / CoT-Papers-Note

Forked from Timothyxxx/Chain-of-ThoughtsPapers

A trend starts from "Chain of Thought Prompting Elicits Reasoning in Large Language Models".

40 3 Updated Jun 9, 2023

codelion / optillm

Optimizing inference proxy for LLMs

Python 1,688 132 Updated Nov 29, 2024

karpathy / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 37,700 6,017 Updated Aug 19, 2024

seucoin / unKR

A Python Library for Uncertain Knowledge Graph Reasoning

Python 30 4 Updated Nov 13, 2024

expz / quiet-star

Implementation of the Quiet-STAR paper (https://arxiv.org/pdf/2403.09629.pdf)

Python 44 2 Updated Aug 8, 2024

ezelikman / quiet-star

Code for Quiet-STaR

Python 664 89 Updated Aug 21, 2024

lucidrains / self-reasoning-tokens-pytorch

Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto

Python 53 4 Updated May 17, 2024

neurallambda / awesome-reasoning

A curated list of data for reasoning AI

115 5 Updated Aug 4, 2024

openai / prm800k

800,000 step-level correctness labels on LLM solutions to MATH problems

Python 1,689 104 Updated Jun 1, 2023

huggingface / datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Python 2,074 152 Updated Dec 3, 2024

reasoning-survey / Awesome-Reasoning-Foundation-Models

✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models

463 49 Updated Jul 10, 2024

TianduoWang / DPO-ST

[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning

Python 31 5 Updated Jul 28, 2024

alibaba / ChatLearn

A flexible and efficient training framework for large-scale alignment tasks

Python 225 17 Updated Dec 4, 2024

sangmichaelxie / doremi

PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets

HTML 309 34 Updated Dec 26, 2023

jaehunjung1 / impossible-distillation

Python 16 Updated Jul 3, 2024

THUDM / AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

Python 2,254 165 Updated Nov 13, 2024

EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.

Python 7,131 1,917 Updated Dec 4, 2024

Mihir3009 / LogicBench

LogicBench is a natural language question-answering dataset consisting of 25 different reasoning patterns spanning over propositional, first-order, and non-monotonic logics.

16 3 Updated May 2, 2024