Organizations

@risingwavelabs

Python 988 30 Updated Nov 21, 2024

BAML is a language that helps you get structured data from LLMs, with the best DX possible. Works with all languages. Check out the promptfiddle.com playground

Rust 1,581 55 Updated Dec 12, 2024

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.

Jupyter Notebook 213 65 Updated Dec 13, 2024

Powering AWS purpose-built machine learning chips. Blazing fast and cost-effective, natively integrated into PyTorch and TensorFlow, and integrated with your favorite AWS services.

Python 475 155 Updated Dec 11, 2024

High-performance In-browser LLM Inference Engine

TypeScript 13,986 906 Updated Dec 9, 2024

Efficient, Flexible and Portable Structured Generation

C++ 468 22 Updated Dec 12, 2024

How to optimize some algorithms in CUDA.

Cuda 1,723 142 Updated Dec 12, 2024

Puzzles for learning Triton

Jupyter Notebook 1,196 92 Updated Nov 18, 2024

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook 35,170 4,341 Updated Nov 18, 2024

This project aims to collect the latest "call for reviewers" links from various top CS/ML/AI conferences/journals

540 13 Updated Dec 12, 2024

Official code repo for the O'Reilly Book - "Hands-On Large Language Models"

Jupyter Notebook 2,960 651 Updated Dec 10, 2024

Distributed stream processing engine in Rust

Rust 3,840 226 Updated Dec 12, 2024

A Vector Database Tutorial (over CMU-DB's BusTub system)

C++ 643 18 Updated Jan 21, 2024

Chinese translation of The Rust Programming Language (Book)

Rust 862 145 Updated Nov 22, 2024

Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton

Python 1,418 72 Updated Dec 12, 2024

SGLang is a fast serving framework for large language models and vision language models.

Python 6,505 577 Updated Dec 12, 2024

Semantic cache for LLMs. Fully integrated with LangChain and llama_index.

Python 7,277 510 Updated Sep 18, 2024

FlashInfer: Kernel Library for LLM Serving

Cuda 1,544 153 Updated Dec 13, 2024

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python 6,713 372 Updated Jul 11, 2024

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python 929 82 Updated Dec 10, 2024

Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents.

Python 509 50 Updated Nov 20, 2024

Sotopia: an Open-ended Social Learning Environment (ICLR 2024 spotlight)

Python 170 22 Updated Dec 13, 2024

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 2,682 215 Updated Dec 13, 2024

Paragraph-by-paragraph close reading of classic and new deep learning papers

27,473 2,467 Updated Nov 17, 2024

Fast and memory-efficient exact attention

Python 14,615 1,370 Updated Dec 12, 2024

[TMLR 2024] Efficient Large Language Models: A Survey

1,044 85 Updated Nov 23, 2024

LLM101n: Let's build a Storyteller

30,487 1,668 Updated Aug 1, 2024

Ongoing research training transformer models at scale

Python 10,799 2,414 Updated Dec 13, 2024