
Starred repositories
Extended pickling support for Python objects
Header-only C++/Python library for fast approximate nearest neighbors
Apache Spark - A unified analytics engine for large-scale data processing
From scratch implementation of a sparse mixture of experts language model inspired by Andrej Karpathy's makemore :)
An autoregressive character-level language model for making more things
PaSa -- an advanced paper search agent powered by large language models. It can autonomously make a series of decisions, including invoking search tools, reading papers, and selecting relevant refe…
Fully open reproduction of DeepSeek-R1
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
SGLang is a fast serving framework for large language models and vision language models.
aider is AI pair programming in your terminal
Apache Beam is a unified programming model for Batch and Streaming data processing.
TensorZero creates a feedback loop for optimizing LLM applications — turning production data into smarter, faster, and cheaper models.
Zotero is a free, easy-to-use tool to help you collect, organize, annotate, cite, and share your research sources.
Nearest Neighbor Search with Neighborhood Graph and Tree for High-dimensional Data
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Knowhere is a vector search engine, integrating FAISS, HNSW, DiskANN.
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
This is a repo with links to everything you'd ever want to learn about data engineering
eBPF Observability - Distributed Tracing and Profiling
cuVS - a library for vector search and clustering on the GPU
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
Development repository for the Triton language and compiler
Computational geometry and spatial indexing on the sphere
A collection of notebooks/recipes showcasing some fun and effective ways of using Claude.