Stars
RewardBench: the first evaluation tool for reward models.
Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality synthetic data generation pipeline!
A data annotation toolbox that supports image, audio, and video data.
The Open-Source Data Annotation Platform
A high-quality, one-stop open-source data extraction tool for converting PDF to Markdown and JSON.
Evaluate your LLM's responses with Prometheus and GPT-4 💯
An open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks.
[ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues
HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance
An efficient, flexible, and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]
A benchmark for emotional intelligence in large language models
Arena-Hard-Auto: An automatic LLM benchmark.
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, Llama2, Qwen, GLM, Claude, etc.) over 100+ datasets.
[ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset