  • No. 5 Yiheyuan Road, Haidian District, Beijing

[ICML 2024] Selecting High-Quality Data for Training Language Models

Python 157 12 Updated Jun 20, 2024

Acceptance rates for the major AI conferences

Jupyter Notebook 4,367 307 Updated Jan 24, 2025

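MNBVC (Massive Never-ending BT Vast Chinese corpus): a very large-scale Chinese corpus, targeting the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.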

3,684 256 Updated Feb 6, 2025

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

4,717 470 Updated Sep 25, 2024

DomainWordsDict: a Chinese domain dictionary knowledge base covering more than 68 domains, with 9.16 million words in total. It can be used for text classification, knowledge enhancement, domain vocabulary expansion, and other natural language processing applications.

678 114 Updated Aug 30, 2021

SuperCLUE: A Comprehensive Benchmark for General-Purpose Foundation Models in Chinese

3,083 100 Updated May 23, 2024

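A curated collection of open-source Chinese large language models, focusing on smaller models that can be deployed privately and are cheaper to train, including base models, vertical-domain fine-tuning and applications, datasets, and tutorials.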

18,148 1,742 Updated Sep 19, 2024

Awesome-LLM: a curated list of Large Language Models

21,370 1,749 Updated Feb 2, 2025

[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction

Python 3,726 701 Updated Jan 11, 2025

Repo for the paper "Large Language Models Struggle to Learn Long-Tail Knowledge"

Python 74 8 Updated Apr 12, 2023

Big data learning: learn big data from scratch, including study videos and interview materials for each stage of learning.

2,804 890 Updated Dec 10, 2024

Reading list of hallucination in LLMs. Check out our new survey paper: "Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models"

976 52 Updated Nov 21, 2024

The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".

1,481 92 Updated Jun 3, 2024

pyspark🍒🥭 is delicious, just eat it!😋😋

Python 792 213 Updated Sep 22, 2022

TruthfulQA: Measuring How Models Imitate Human Falsehoods

Jupyter Notebook 664 78 Updated Jan 16, 2025

A programmer's guide to living longer

30,929 2,156 Updated Jan 30, 2024

Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages

Python 7,366 898 Updated Feb 8, 2025

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Python 17,264 1,727 Updated Feb 11, 2025

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

Python 171,377 45,033 Updated Feb 12, 2025

The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Jupyter Notebook 48,802 5,762 Updated Sep 18, 2024

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

Python 25,499 3,237 Updated Sep 24, 2024

A collection of resources on applications of Transformers in Medical Imaging.

1,216 188 Updated Apr 18, 2024
Python 30 4 Updated Jul 10, 2023

An Index for Medical Imaging Datasets

2,844 391 Updated Aug 15, 2024

Commonly used Chinese stopword lists (HIT stopword list, Baidu stopword list, etc.)

4,853 2,229 Updated Jan 25, 2024

Transfer learning / domain adaptation / domain generalization / multi-task learning, etc. Papers, code, datasets, applications, tutorials.

Python 13,684 3,828 Updated Feb 11, 2025

A PyTorch-based library for semi-supervised learning (NeurIPS'21)

Python 1,323 186 Updated Aug 28, 2023

TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master/

Python 3,056 407 Updated Jul 25, 2024