Stars
jingcheng-WU / DiffRhythm
Forked from ASLP-lab/DiffRhythm
Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
This is the official implementation of LiSenNet.
OSUM: Open Speech Understanding Model, open-sourced by ASLP@NPU.
Ultra-low-bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.
Fast algorithm for determined blind source separation that updates the demixing filters with joint adjustment of the remaining sources.
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
[Interspeech 2024] Hold Me Tight: Stable Encoder-Decoder Design for Speech Enhancement
Paderwasn is a collection of methods for acoustic signal processing in wireless acoustic sensor networks (WASNs).
A lightweight library for portable low-level GPU computation using WebGPU.
Reference implementation for DPO (Direct Preference Optimization)
Generate synthetic wind noise signals based on a wind speed profile.
Stable Diffusion web UI
Synthesizes a room impulse response using a ray tracing simulation engine.
Graph Neural Networks for Sound Source Localization
Pitch detection and pitch tracking, voiced/unvoiced detection (VAD), pitch detection (基音检测)
A Python algorithm to change the pitch of the voice in real time
Unofficial pytorch reproduction for the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (arXiv:2401.01498)
An optimized neural network operator library for chips based on the Xuantie CPU.
"Music Style Transfer with Time-Varying Inversion of Diffusion Models"
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
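Chinese personal-name corpus and name generator: Chinese full names, surnames, given names, forms of address, Japanese names, transliterated foreign names, and English names. Can be used for Chinese word segmentation and person-name entity recognition.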