Starred repositories
A curated list of Large Language Model resources, covering model training, serving, fine-tuning, and building LLM applications.
MLIR tools and dialect for GraphBLAS
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers, with full support for Optimum's hardware optimizations and quantization schemes.
A lightweight, header-only library for porting GPU code between CUDA and HIP
Bringup-Bench is a collection of standalone, minimal-dependence library and system benchmarks useful for bringing up newly designed CPUs, accelerators, compilers, and operating systems.
Making Long-Context LLM Inference 10x Faster and 10x Cheaper
A stand-alone implementation of several NumPy dtype extensions used in machine learning.
A language for constraint-guided and efficient LLM programming.
A guidance language for controlling large language models.
UTPX (Userspace Transparent Paging Extension) is a proof-of-concept LD_PRELOAD library that accelerates HIP managed allocations on systems without XNACK or with XNACK disabled.
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
CLI tool for text to image generation using the FLUX.1 model.
MTEB: Massive Text Embedding Benchmark
A simple screen-parsing tool toward a pure-vision-based GUI agent
📋 A list of open LLMs available for commercial use.
An MLX port of FLUX based on the Hugging Face Diffusers implementation.
Multi-platform high-performance compute language extension for Rust.
MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.
A modern model graph visualizer and debugger
Uses Tensor Cores to compute back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instructions.
Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA cores in the decoding stage of LLM inference.