- Mountain View, California
- https://www.akihironitta.com
Stars
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…
a language for fast, portable data-parallel computation
MoE training for Me and You and maybe other people
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
An experimental implementation of compiler-driven automatic sharding of models across a given device mesh.
MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI training and inference, such as FP8 row-wise quantization and …
The Triton backend for the PyTorch TorchScript models.
A debugging and profiling tool that can trace and visualize Python code execution
torchcomms: a modern PyTorch communications API
Open Source Continuous Inference Benchmarking - GB200 NVL72 vs MI355X vs B200 vs H200 vs MI325X & soon™ TPUv6e/v7/Trainium2/3/GB300 NVL72 - DeepSeek 670B MoE, GPTOSS
Minimal yet performant LLM examples in pure JAX
A lightweight LLVM Python binding for writing JIT compilers
GitHub mirror of the triton-lang/triton repo.
🔥 A minimal training framework for scaling FLA models
High-performance Python runtime extensions
Cinder is Meta's internal performance-oriented production version of CPython.
1st Place Team Crane: @aswinkumar1999 @rathull @kyolebu
TensorFlow GNN is a library to build Graph Neural Networks on the TensorFlow platform.
A zero-to-one guide on scaling modern transformers with n-dimensional parallelism.
Efficient in-memory representation for ONNX, in Python
🔬 MCP server to query KumoRFM in your agentic flows
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
Submission of College Matcher for the Kumo hackathon. If the website is down, please open an issue!






