Stars
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
Run ComfyUI workflows on multiple local GPUs/networked machines.
A zero dependency web UI for any LLM backend, including KoboldCpp, OpenAI and AI Horde
Dictionary-based SLOP detector and analyzer for shareGPT JSON and text
LLM-powered multiagent persona simulation for imagination enhancement and business insights.
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
The world's simplest facial recognition api for Python and the command line
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input.
[CVPR 2023] Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation
Code for Pyramidal Flow Matching for Efficient Video Generative Modeling
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Fast inference engine for Transformer models
[SIGGRAPH ASIA 2024 TCS] AnimateLCM: Computation-Efficient Personalized Style Video Generation without Personalized Video Data
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models
Official code for VEnhancer: Generative Space-Time Enhancement for Video Generation
Real-ESRGAN aims to develop practical algorithms for general image/video restoration.
Everything-Reactivity in ComfyUI (audio, MIDI, motion, proximity, and more).