- Inner Mongolia University, China
- Hohhot
- ttslr.github.io
- @RuiLiu60711141
Starred repositories
Code for ACL 2019 Paper: "COMET: Commonsense Transformers for Automatic Knowledge Graph Construction" https://arxiv.org/abs/1906.05317
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
Implementation of Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt (NAACL'24).
Crowdsourced and Automatic Speech Prominence Estimation
Facestar dataset. High quality audio-visual recordings of human conversational speech.
Official implementation of MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Official implementation for the paper Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition
Hung-yi Lee Deep Learning Tutorial (recommended by Prof. Hung-yi Lee 👍, the "Apple Book" 🍎). PDF download: https://github.com/datawhalechina/leedl-tutorial/releases
MARS5 speech model (TTS) from CAMB.AI
Use OpenAI's Realtime API to chat with your documents
A simple, high-quality voice conversion tool focused on ease of use and performance.
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
[ICASSP 2024] KNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels
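Use ChatGPT to summarize arXiv papers. Speeds up the whole research workflow by using ChatGPT for full-paper summarization, professional translation, polishing, reviewing, and review responses
ChatReviewer: use ChatGPT to analyze a paper's strengths and weaknesses and suggest improvements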
Code for Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction (ACL24)
A comprehensive overview of affective computing research in the era of large language models (LLMs).
SpeechGPT Series: Speech Large Language Models
Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
The source code for the paper titled "Sentiment Knowledge Enhanced Attention Fusion Network (SKEAFN)".
[ICASSP2024] Code for paper "SDIF-DA: A Shallow-to-Deep Interaction Framework with Data Augmentation for Multi-modal Intent Detection"
SiLLM is a Simultaneous Machine Translation (SiMT) Framework. It utilizes a Large Language model as the translation model and employs a traditional SiMT model for policy-decision to achieve SiMT th…
tarepan / vocos-official
Forked from gemelo-ai/vocos
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Multi-task speech classification of the accent and gender of an English speaker on Mozilla's Common Voice dataset
Official release of InternLM2.5 base and chat models. 1M context support
Source code for the paper 'Audio Captioning Transformer'