Shanghai Jiao Tong University
- Shanghai
- UTC +08:00
- hsiangyuzhao.github.io
Stars
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
💻 🤖 A summary on our attempts at using Deep Learning approaches for Emotional Text to Speech 🔈
AudioLDM: Generate speech, sound effects, music and beyond, with text.
TigerBot: A multi-language multi-task LLM
Multimodal Chinese LLaMA & Alpaca large language model (VisualCLA)
Chinese LLaMA-2 & Alpaca-2 LLMs (phase two of the project) with 64K long context models
Chinese LLaMA & Alpaca LLMs with local CPU/GPU training and deployment
Utilities intended for use with Llama models.
The official repository for "One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts"
[ICLR 2024] FairSeg: A Large-Scale Medical Image Segmentation Dataset for Fairness Learning Using Segment Anything Model with Fair Error-Bound Scaling
A collection of datasets for the purpose of emotion recognition/detection in speech.
A collection of 8 English speech datasets for speech emotion recognition (SER)
A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.
Predicting depression from acoustic features of speech using a Convolutional Neural Network.
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.
Permanent Apple Intelligence + Xcode Predictive Code Completion for Chinese-market Mac computers
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
Mental health large model, LLM, The Big Model of Mental Health, Finetune, InternLM2, InternLM2.5, Qwen, ChatGLM, Baichuan, DeepSeek, Mixtral, LLama3, GLM4, Qwen2, LLama3.1
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
✨✨Latest Advances on Multimodal Large Language Models
Awesome speech/audio LLMs, representation learning, and codec models