Stars
MagicPIG: LSH Sampling for Efficient LLM Generation
MoonPalace (月宫) is an API debugging tool provided by Moonshot AI (月之暗面).
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
nanobind: tiny and efficient C++/Python bindings
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
FlashInfer: Kernel Library for LLM Serving
Machine Learning Engineering Open Book
Finetune Llama 3.3, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 70% less memory
Collection of benchmarks to measure basic GPU capabilities
A low-latency & high-throughput serving engine for LLMs
Dynamic Memory Management for Serving LLMs without PagedAttention
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
📖A curated list of Awesome LLM/VLM Inference Papers with codes, such as FlashAttention, PagedAttention, Parallelism, etc. 🎉🎉
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Fast and memory-efficient exact attention
SGLang is a fast serving framework for large language models and vision language models.
Deep Reinforcement Learning: Zero to Hero!
Advanced Python Mastery (course by @dabeaz)
OneDiff: An out-of-the-box acceleration library for diffusion models.
A GPU-driven system framework for scalable AI applications
📚150+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).