leo19941227

Shu-wen Yang leo19941227

Speech and Audio Foundation Models

112 followers · 89 following

National Taiwan University
Taipei, Taiwan
https://leo19941227.github.io
@leo19941227

Lists (3)

✨ Inspiration

1 repository

save

Tools

1 repository

Stars

kehanlu / DeSTA2

Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"

HTML 44 3 Updated Jan 16, 2025

jishengpeng / WavChat

A Survey of Spoken Dialogue Models (60 pages)

249 16 Updated Nov 28, 2024

neonbjb / tortoise-tts

A multi-voice TTS system trained with an emphasis on quality

Jupyter Notebook 13,535 1,873 Updated Nov 19, 2024

zhenye234 / LLaSA_training

LLaSA: Scaling Train-time and Test-time Compute for LLaMA-based Speech Synthesis

27 Updated Jan 18, 2025

youngsheen / GPST

[ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer

Python 42 2 Updated Nov 1, 2024

LqNoob / Neural-Codec-and-Speech-Language-Models

Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models

Python 94 6 Updated Jan 20, 2025

snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Python 4,807 464 Updated Dec 26, 2024

ShovalMessica / NAST

Official repository for NAST: Noise Aware Speech Tokenization for Speech Language Models (Interspeech 2024) https://arxiv.org/abs/2406.11037

Python 44 2 Updated Jul 2, 2024

Stability-AI / stable-codec

A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.

Python 297 17 Updated Jan 14, 2025

nii-yamagishilab / mos-finetune-ssl

Python 88 21 Updated Jun 14, 2023

AlanBaade / SyllableLM

Official Code for SyllableLM: Learning Coarse Semantic Units for Speech Language Models

Python 41 1 Updated Oct 10, 2024

fakerybakery / utmos

A toolkit to calculate speech audio quality. Not affiliated with the original authors

Python 48 4 Updated Aug 13, 2024

sarulab-speech / UTMOS22

UT-Sarulab MOS prediction system using SSL models

Python 200 14 Updated Apr 11, 2024

VITA-MLLM / Freeze-Omni

✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

Python 256 16 Updated Jan 2, 2025

hubertsiuzdak / snac

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

Python 470 26 Updated Nov 19, 2024

facebookresearch / spdl

Scalable and Performant Data Loading

Python 208 11 Updated Jan 20, 2025

sovrasov / flops-counter.pytorch

Flops counter for convolutional networks in pytorch framework

Python 2,849 307 Updated Jan 20, 2025

Aria-K-Alethia / BigCodec

Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"

Python 115 9 Updated Sep 19, 2024

LTH14 / rcg

PyTorch implementation of RCG https://arxiv.org/abs/2312.03701

Python 897 40 Updated Sep 27, 2024

fixie-ai / ultravox

A fast multimodal LLM for real-time voice

Python 2,994 188 Updated Jan 14, 2025

bytedance / MoMA

MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation

Jupyter Notebook 215 17 Updated Jul 11, 2024

unilight / sheet

Speech Human Evaluation Estimation Toolkit (SHEET)

Python 49 6 Updated Nov 13, 2024

JishengBai / AudioSetCaps

A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline

Python 111 2 Updated Dec 13, 2024

facebookresearch / spiritlm

Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".

Python 864 56 Updated Oct 28, 2024

SonyCSLParis / audio-metrics

Python 23 1 Updated Dec 26, 2024

kyutai-labs / moshi

Python 7,171 563 Updated Jan 14, 2025

hijkzzz / Awesome-LLM-Strawberry

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.

6,281 346 Updated Jan 20, 2025

haoheliu / voicefixer

General Speech Restoration

Python 1,067 132 Updated May 31, 2024

dongzhuoyao / awesome-flow-matching

A summary of related works about flow matching, stochastic interpolants

373 14 Updated Jul 29, 2024

slp-rl / salmon

The official code for the SALMon🍣 benchmark

Python 43 Updated Dec 14, 2024