Stars
BAML is a language that helps you get structured data from LLMs, with the best DX possible. Works with all languages. Check out the promptfiddle.com playground.
Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.
Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services
High-performance In-browser LLM Inference Engine
Efficient, Flexible and Portable Structured Generation
How to optimize algorithms in CUDA.
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
This project aims to collect the latest "call for reviewers" links from various top CS/ML/AI conferences/journals
Official code repo for the O'Reilly Book - "Hands-On Large Language Models"
Distributed stream processing engine in Rust
A Vector Database Tutorial (over CMU-DB's BusTub system)
Chinese translation of The Rust Programming Language (Book)
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
SGLang is a fast serving framework for large language models and vision language models.
Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
FlashInfer: Kernel Library for LLM Serving
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking multi-modal AI agents.
Sotopia: an Open-ended Social Learning Environment (ICLR 2024 spotlight)
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high performance.
Fast and memory-efficient exact attention
[TMLR 2024] Efficient Large Language Models: A Survey
Ongoing research training transformer models at scale