Starred repositories
Open-source release of the MuCGEC Chinese error-correction dataset and SOTA text-correction models; Code & Data for our NAACL 2022 Paper "MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction"
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
A multimodal chat interface with many tools.
Open-sourced, Fast and Context-aware Action Grounding from GUI Instructions for GUI/Computer-use Agents
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper.
MINT-1T: A one trillion token multimodal interleaved dataset.
Fast Hadamard transform in CUDA, with a PyTorch interface
PyTorch native quantization and sparsity for training and inference
Open Images is a dataset of ~9 million images that have been annotated with image-level labels and bounding boxes spanning thousands of classes.
vinchu / xiaohongshu-3
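Forked from hkcityu/xiaohongshu
Xiaohongshu API: fetches Xiaohongshu post content, comments, and related information
MoreAPI is an unofficial RESTful API platform for video platforms such as Douyin, lemon8, Xiaohongshu, and Kuaishou. Supports Douyin video parsing, Xiaohongshu parsing, Kuaishou parsing, YouTube parsing, and Bilibili parsing.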
The report of a fine-tuned GPT model unifying tables, natural language, and commands.
Robust recipes to align language models with human and AI preferences
Everything about the SmolLM2 and SmolVLM family of models
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
Code repo for the paper "SpinQuant: LLM quantization with learned rotations"
Neural Network Compression Framework for enhanced OpenVINO™ inference
The official implementation for BLIP4CIR with bi-directional training | Bi-directional Training for Composed Image Retrieval via Text Prompt Learning (WACV 2024)
GRIT: Faster and Better Image-captioning Transformer (ECCV 2022)
Code for our ICML'24 paper on multimodal dataset distillation
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.