zhangxuemiao

章清忆 zhangxuemiao

3 followers · 9 following

No. 5 Yiheyuan Road, Haidian District, Beijing

Stars

princeton-nlp / QuRating

[ICML 2024] Selecting High-Quality Data for Training Language Models

Python 157 12 Updated Jun 20, 2024

deepseek-ai / DeepSeek-V3

Python 83,337 13,309 Updated Feb 8, 2025

lixin4ever / Conference-Acceptance-Rate

Acceptance rates for the major AI conferences

Jupyter Notebook 4,367 307 Updated Jan 24, 2025

esbatmop / MNBVC

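MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus that aims to match the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures, including text written in "Martian script" (火星文). It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.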

3,684 256 Updated Feb 6, 2025

deepseek-ai / DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

4,717 470 Updated Sep 25, 2024

liuhuanyong / DomainWordsDict

DomainWordsDict: a Chinese domain dictionary knowledge base covering 68 domains and 9.16 million words in total. It can be used for text classification, knowledge enhancement, domain vocabulary expansion, and other natural language processing applications.

678 114 Updated Aug 30, 2021

CLUEbenchmark / SuperCLUE

SuperCLUE: a comprehensive benchmark for general-purpose Chinese large models | A Benchmark for Foundation Models in Chinese

3,083 100 Updated May 23, 2024

HqWu-HITCS / Awesome-Chinese-LLM

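A curated collection of open-source Chinese large language models, focusing on models that are relatively small, privately deployable, and inexpensive to train. It includes base models, vertical-domain fine-tuning and applications, datasets, and tutorials.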

18,148 1,742 Updated Sep 19, 2024

Hannibal046 / Awesome-LLM

Awesome-LLM: a curated list of Large Language Models

21,370 1,749 Updated Feb 2, 2025

zjunlp / DeepKE

[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction

Python 3,726 701 Updated Jan 11, 2025

nkandpa2 / long_tail_knowledge

Repo for the paper "Large Language Models Struggle to Learn Long-Tail Knowledge"

Python 74 8 Updated Apr 12, 2023

MoRan1607 / BigDataGuide

Big data learning: learn big data from scratch, with learning videos and interview materials for each stage of big data study

2,804 890 Updated Dec 10, 2024

HillZhang1999 / llm-hallucination-survey

Reading list of hallucination in LLMs. Check out our new survey paper: "Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models"

976 52 Updated Nov 21, 2024

MLGroupJLU / LLM-eval-survey

The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".

1,481 92 Updated Jun 3, 2024

lyhue1991 / eat_pyspark_in_10_days

pyspark🍒🥭 is delicious, just eat it!😋😋

Python 792 213 Updated Sep 22, 2022

sylinrl / TruthfulQA

TruthfulQA: Measuring How Models Imitate Human Falsehoods

Jupyter Notebook 664 78 Updated Jan 16, 2025

google-research / deduplicate-text-datasets

Rust 1,170 115 Updated Jul 30, 2024

geekan / HowToLiveLonger

Programmer's longevity guide | A programmer's guide to living longer

30,929 2,156 Updated Jan 30, 2024

stanfordnlp / stanza

Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages

Python 7,366 898 Updated Feb 8, 2025

huggingface / peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Python 17,264 1,727 Updated Feb 11, 2025

Significant-Gravitas / AutoGPT

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

Python 171,377 45,033 Updated Feb 12, 2025

facebookresearch / segment-anything

The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Jupyter Notebook 48,802 5,762 Updated Sep 18, 2024

JaidedAI / EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

Python 25,499 3,237 Updated Sep 24, 2024

fahadshamshad / awesome-transformers-in-medical-imaging

A collection of resources on applications of Transformers in Medical Imaging.

1,216 188 Updated Apr 18, 2024

linhandev / dataset

A list of medical imaging datasets (An Index for Medical Imaging Datasets)

2,844 391 Updated Aug 15, 2024

goto456 / stopwords

Commonly used Chinese stopword lists (Harbin Institute of Technology stopword list, Baidu stopword list, etc.)

4,853 2,229 Updated Jan 25, 2024

jindongwang / transferlearning

Transfer learning / domain adaptation / domain generalization / multi-task learning, etc. Papers, code, datasets, applications, tutorials.

Python 13,684 3,828 Updated Feb 11, 2025

TorchSSL / TorchSSL

A PyTorch-based library for semi-supervised learning (NeurIPS'21)

Python 1,323 186 Updated Aug 28, 2023

TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master/

Python 3,056 407 Updated Jul 25, 2024
