Stars
Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis
OpenAI compatible TTS for Sesame CSM:1b - Voice Cloning from File/YT
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
SGLang is a fast serving framework for large language models and vision language models.
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
A Conversational Speech Generation Model
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
Enjoy the magic of Diffusion models!
Wan: Open and Advanced Large-Scale Video Generative Models
TorchCFM: a Conditional Flow Matching library
A cross-platform Markdown note-taking application dedicated to using AI to bridge recording and writing, organizing fragmented knowledge into a readable note.
Toolkit for linearizing PDFs for LLM datasets/training
Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction
Witness the aha moment of VLM with less than $3.
Finetune Llama 4, DeepSeek-R1, Gemma 3 & Reasoning LLMs 2x faster with 70% less memory! 🦥
A novel approach to hunyuan image-to-video sampling
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
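Fanqiang (bypassing the firewall): censorship circumvention, free internet access, free circumvention tools, fanqiang, YouTube video download, software, VPN, one-click circumvention browser, scripts/tutorials for one-click setup of a circumvention server on a VPS, free shadowsocks/ss/ssr/v2ray/goflyway accounts/nodes, circumvention "ladders", circumvention on computers, phones, iOS, Android, Windows, Mac, Linux, and routers, YouTube video download, YouTube mirrors/access without circumvention…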