momoGBG

1 follower · 1 following

Stars

antgroup / ditto-talkinghead

Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis

Python 235 38 Updated Jan 21, 2025

phildougherty / sesame_csm_openai

OpenAI compatible TTS for Sesame CSM:1b - Voice Cloning from File/YT

Python 278 46 Updated Mar 25, 2025

kyutai-labs / moshi-finetune

Python 194 9 Updated Apr 3, 2025

HumanMLLM / R1-Omni

Python 822 48 Updated Mar 24, 2025

open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 8,937 700 Updated Apr 12, 2025

snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Python 5,573 541 Updated Mar 24, 2025

infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

TypeScript 49,029 4,629 Updated Apr 15, 2025

canopyai / Orpheus-TTS

Towards Human-Sounding Speech

Python 4,065 327 Updated Apr 15, 2025

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

Python 13,231 1,518 Updated Apr 16, 2025

Lightricks / LTX-Video

Official repository for LTX-Video

Python 3,316 292 Updated Mar 5, 2025

shivammehta25 / Matcha-TTS

[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching

Jupyter Notebook 970 125 Updated Mar 31, 2025

ali-vilab / ACE_plus

Python 968 59 Updated Mar 22, 2025

SesameAILabs / csm

A Conversational Speech Generation Model

Python 12,522 1,127 Updated Mar 27, 2025

sony / silentcipher

Python 48 25 Updated Jul 22, 2024

modelscope / ClearerVoice-Studio

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 2,587 202 Updated Apr 15, 2025

zhenye234 / X-Codec-2.0

Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 249 29 Updated Mar 12, 2025

modelscope / DiffSynth-Studio

Enjoy the magic of Diffusion models!

Python 8,341 745 Updated Apr 16, 2025

Wan-Video / Wan2.1

Wan: Open and Advanced Large-Scale Video Generative Models

Python 9,976 1,088 Updated Apr 2, 2025

atong01 / conditional-flow-matching

TorchCFM: a Conditional Flow Matching library

Python 1,638 133 Updated Mar 11, 2025

codexu / note-gen

A cross-platform Markdown note-taking application dedicated to using AI to bridge recording and writing, organizing fragmented knowledge into a readable note.

TypeScript 1,347 115 Updated Apr 16, 2025

allenai / olmocr

Toolkit for linearizing PDFs for LLM datasets/training

Python 11,124 754 Updated Apr 15, 2025

baichuan-inc / Baichuan-Audio

Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction

Python 176 12 Updated Feb 28, 2025

Deep-Agent / R1-V

Witness the aha moment of VLM with less than $3.

Python 3,530 275 Updated Mar 1, 2025

unslothai / unsloth

Finetune Llama 4, DeepSeek-R1, Gemma 3 & Reasoning LLMs 2x faster with 70% less memory! 🦥

Python 37,121 2,893 Updated Apr 14, 2025

AeroScripts / leapfusion-hunyuan-image2video

A novel approach to hunyuan image-to-video sampling

Python 296 16 Updated Feb 5, 2025

stepfun-ai / Step-Audio

Python 4,169 338 Updated Mar 12, 2025

astramind-ai / Auralis

A Fast TTS Engine

Python 487 37 Updated Jan 23, 2025

JusperLee / TIGER

TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation

Python 226 39 Updated Apr 8, 2025

Alvin9999 / new-pac

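Firewall circumvention: "scientific internet access", free internet access, free circumvention, fanqiang, YouTube/video download, software, VPN, one-click circumvention browser, one-click scripts/tutorials for setting up a circumvention server on a VPS, free shadowsocks/ss/ssr/v2ray/goflyway accounts/nodes, circumvention "ladders", circumvention for PC, mobile, iOS, Android, Windows, Mac, Linux, and routers, YouTube video download, YouTube mirror/no circumvention needed…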

Python 60,468 9,889 Updated Apr 16, 2025

kohya-ss / musubi-tuner

Python 554 54 Updated Apr 11, 2025
