Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"

HTML 44 3 Updated Jan 16, 2025

A Survey of Spoken Dialogue Models (60 pages)

249 16 Updated Nov 28, 2024

A multi-voice TTS system trained with an emphasis on quality

Jupyter Notebook 13,535 1,873 Updated Nov 19, 2024

LLaSA: Scaling Train-time and Test-time Compute for LLaMA-based Speech Synthesis

27 Updated Jan 18, 2025

[ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer

Python 42 2 Updated Nov 1, 2024

Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models

Python 94 6 Updated Jan 20, 2025

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Python 4,807 464 Updated Dec 26, 2024

Official repository for NAST: Noise Aware Speech Tokenization for Speech Language Models (Interspeech 2024) https://arxiv.org/abs/2406.11037

Python 44 2 Updated Jul 2, 2024

A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.

Python 297 17 Updated Jan 14, 2025

Official Code for SyllableLM: Learning Coarse Semantic Units for Speech Language Models

Python 41 1 Updated Oct 10, 2024

A toolkit to calculate speech audio quality. Not affiliated with the original authors

Python 48 4 Updated Aug 13, 2024

UT-Sarulab MOS prediction system using SSL models

Python 200 14 Updated Apr 11, 2024

✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

Python 256 16 Updated Jan 2, 2025

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

Python 470 26 Updated Nov 19, 2024

Scalable and Performant Data Loading

Python 208 11 Updated Jan 20, 2025

FLOPs counter for convolutional networks in the PyTorch framework

Python 2,849 307 Updated Jan 20, 2025

Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"

Python 115 9 Updated Sep 19, 2024

PyTorch implementation of RCG https://arxiv.org/abs/2312.03701

Python 897 40 Updated Sep 27, 2024

A fast multimodal LLM for real-time voice

Python 2,994 188 Updated Jan 14, 2025

MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation

Jupyter Notebook 215 17 Updated Jul 11, 2024

Speech Human Evaluation Estimation Toolkit (SHEET)

Python 49 6 Updated Nov 13, 2024

A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline

Python 111 2 Updated Dec 13, 2024

Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".

Python 864 56 Updated Oct 28, 2024
Python 23 1 Updated Dec 26, 2024
Python 7,171 563 Updated Jan 14, 2025

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.

6,281 346 Updated Jan 20, 2025

General Speech Restoration

Python 1,067 132 Updated May 31, 2024

A summary of related works about flow matching, stochastic interpolants

373 14 Updated Jul 29, 2024

The official code for the SALMon🍣 benchmark

Python 43 Updated Dec 14, 2024