Sponsoring

@asottile

Organizations

@kumo-ai @pyg-team

CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…

MLIR 333 24 Updated Dec 20, 2025

a language for fast, portable data-parallel computation

C++ 6,483 1,096 Updated Dec 24, 2025

MoE training for Me and You and maybe other people

Python 293 25 Updated Dec 17, 2025

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python 1,666 86 Updated Dec 20, 2025

An experimental implementation of compiler-driven automatic sharding of models across a given device mesh.

Python 48 13 Updated Dec 23, 2025

MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI training and inference, such as FP8 row-wise quantization and …

Python 15 9 Updated Dec 25, 2025

The Triton backend for the PyTorch TorchScript models.

C++ 168 64 Updated Dec 22, 2025

A debugging and profiling tool that can trace and visualize python code execution

Python 7,464 467 Updated Dec 24, 2025

torchcomms: a modern PyTorch communications API

C++ 313 49 Updated Dec 24, 2025

Open Source Continuous Inference Benchmarking - GB200 NVL72 vs MI355X vs B200 vs H200 vs MI325X & soon™ TPUv6e/v7/Trainium2/3/GB300 NVL72 - DeepSeek 670B MoE, GPTOSS

Python 403 67 Updated Dec 25, 2025

Minimal yet performant LLM examples in pure JAX

Python 222 28 Updated Dec 4, 2025

A lightweight LLVM python binding for writing JIT compilers

Python 2,189 350 Updated Dec 19, 2025

GitHub mirror of the triton-lang/triton repo.

MLIR 110 31 Updated Dec 25, 2025

Tokamax: A GPU and TPU kernel library.

Python 142 6 Updated Dec 23, 2025

🔥 A minimal training framework for scaling FLA models

Python 324 49 Updated Nov 15, 2025

High-performance Python runtime extensions

Python 53 12 Updated Dec 23, 2025

Cinder is Meta's internal performance-oriented production version of CPython.

Python 3,748 134 Updated Dec 19, 2025

1st Place Team Crane: @aswinkumar1999 @rathull @kyolebu

Jupyter Notebook 29 2 Updated Sep 8, 2025

A Quirky Assortment of CuTe Kernels

Python 719 64 Updated Dec 23, 2025

TensorFlow GNN is a library to build Graph Neural Networks on the TensorFlow platform.

Python 1,506 197 Updated Dec 12, 2025

A zero-to-one guide on scaling modern transformers with n-dimensional parallelism.

Python 105 6 Updated Sep 26, 2025

Efficient in-memory representation for ONNX, in Python

Python 37 17 Updated Dec 24, 2025

🔬 MCP server to query KumoRFM in your agentic flows

Python 26 3 Updated Dec 2, 2025
Jupyter Notebook 3 1 Updated Sep 10, 2025

Application Kernel for Containers

Go 17,399 1,476 Updated Dec 24, 2025

A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.

Python 695 89 Updated Dec 24, 2025

Submission of College Matcher for kumo hackathon. If website is down please make an issue!

Python 2 Updated Aug 18, 2025