Hai Carroll wanghua-lei

🏠 Working from home

0 followers · 12 following

Starred repositories

FoundationVision / Infinity

Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

Python 733 20 Updated Jan 6, 2025

shuaijiang / Whisper-Finetune

Forked from yeyupiaoling/Whisper-Finetune

Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deplo…

C 220 14 Updated Dec 16, 2024

huggingface / distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

Python 3,676 305 Updated Oct 28, 2024

microsoft / UniSpeech

UniSpeech - Large Scale Self-Supervised Learning for Speech

Python 444 74 Updated Apr 5, 2024

mlfoundations / open_flamingo

An open-source framework for training large multimodal models.

Python 3,789 289 Updated Aug 31, 2024

kyegomez / PALI

Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"

Python 88 8 Updated Mar 20, 2024

amazon-science / QA-ViT

Python 59 7 Updated Jul 17, 2024

WhisperSpeech / WhisperSpeech

An Open Source text-to-speech system built by inverting Whisper.

Jupyter Notebook 4,062 223 Updated Dec 12, 2024

HumBug-Mosquito / HumBugDB

Acoustic mosquito detection code with Bayesian Neural Networks

Jupyter Notebook 50 16 Updated Oct 4, 2021

seancampos / ComParE2022_VecNet

Jupyter Notebook 2 Updated Dec 16, 2022

MontrealCorpusTools / MFA-reorganization-scripts

Collection of scripts and utilities for reorganizing corpora to use with the Montreal Forced Aligner

Python 44 6 Updated Jun 22, 2021

MorenoLaQuatra / ComParE2022_MED

This repository contains the code to set up the experiments for the ComParE 2022 mosquito event detection sub-challenge.

Python 5 3 Updated Oct 25, 2022

manashpratim / Frame-Level-Classification-of-Speech

Jupyter Notebook 1 Updated May 28, 2020

Audio-WestlakeU / audiossl

A library built for easier audio self-supervised training and downstream task evaluation

Python 110 10 Updated Aug 27, 2024

Audio-WestlakeU / ATST-SED

This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".

Jupyter Notebook 108 13 Updated Oct 15, 2024

facebookresearch / libri-light

Dataset for lightly supervised training using the librivox audio book recordings. https://librivox.org/

Python 484 78 Updated Jul 11, 2023

google-deepmind / librispeech-long

LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation with Spoken Language Models" (arXiv 2024).

43 1 Updated Dec 28, 2024