Stars
Voxtral: Converts Mistral into an end-to-end SpeechLM. No information bottleneck, preserves prosody, learns interruptions from data. Unlike GPT-4o (closed) or Moshi (complex), it's open, simple, and natural.
A PyTorch implementation of a Bigram Language Model using Transformer architecture for character-level text generation.
Trains small language models; designed for training on the SimpleStories dataset.
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI.
Code for 'Emergent Analogical Reasoning in Large Language Models'
Tools for working with the Abstraction & Reasoning Corpus
Self-contained, minimalistic implementation of diffusion models with PyTorch.
A proof of concept for Calendar integration within a Svelte/MeteorJS app
Sparsify transformers with SAEs and transcoders
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
PyTorch implementation of TimesFM model.
An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
A small language model in PyTorch implementing the Tiny Stories paper.
Large Concept Models: Language modeling in a sentence representation space
[ICML 2024 Best Paper] Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution (https://arxiv.org/abs/2310.16834)
