Y-vector: Multiscale Waveform Encoder for Speaker Embedding

Introduction

In this paper, we apply a modular architecture to raw-waveform speaker embedding: a waveform encoder followed by a deep embedding backbone.
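As an illustration, here is a minimal sketch of this modular composition; the module and class names are placeholders, not the classes defined in this repository:

```python
import torch
import torch.nn as nn

class ModularSpeakerNet(nn.Module):
    """Sketch of the modular design: waveform encoder followed by an embedding backbone."""
    def __init__(self, encoder: nn.Module, backbone: nn.Module):
        super().__init__()
        self.encoder = encoder    # multiscale waveform encoder: raw audio -> frame-level features
        self.backbone = backbone  # deep embedding network: frame-level features -> utterance embedding

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # wav: (batch, num_samples) raw waveform
        feats = self.encoder(wav)
        return self.backbone(feats)
```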

This repository provides the official inference code for Y-vector (https://arxiv.org/abs/2010.12951) and an unofficial implementation of wav2spk (https://www.isca-speech.org/archive/Interspeech_2020/pdfs/1287.pdf).

In our experiments, we train on the VoxCeleb2 dev set and test on VoxCeleb1.

Results of the provided pretrained model (link), reported as EER (%) and minDCF(0.01):

| Metric       | VoxCeleb1-O | VoxCeleb1-E | VoxCeleb1-H |
|--------------|-------------|-------------|-------------|
| EER (%)      | 2.35        | 2.32        | 3.89        |
| minDCF(0.01) | 0.242       | 0.235       | 0.349       |

Requirements

python==3.6
pytorch==1.7.1
numba==0.48 # install before librosa
librosa==0.6.3
tqdm
pandas

Usage

inference.py reproduces the results in the table above; speaker embedding extraction simply loads a wav file and runs a forward pass with the pretrained model.
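A minimal sketch of that extraction step, assuming a 16 kHz mono input and a checkpoint that pickles the full model object (the actual loading logic lives in inference.py, and the checkpoint path here is only an assumption):

```python
import librosa
import torch

# Assumed checkpoint path and format; if the checkpoint stores a state_dict,
# build the model from this repository's code and call load_state_dict instead.
model = torch.load("pretrained/y_vector.pt", map_location="cpu")
model.eval()

wav, _ = librosa.load("example.wav", sr=16000)           # assumed 16 kHz, mono
wav_tensor = torch.from_numpy(wav).float().unsqueeze(0)  # shape (1, num_samples)

with torch.no_grad():
    embedding = model(wav_tensor)                         # utterance-level speaker embedding
print(embedding.shape)
```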

Guidance on reproducing the results: after installing the required packages, download the VoxCeleb1 data, then use extract_wav.py to save the input features into pickle files named spkid-recid-fileid.pkl. Finally, run inference.py to compute embeddings and evaluate EER and minDCF.
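For illustration, a hedged sketch of the saving step performed by extract_wav.py; the helper name, sample rate, and directory layout are assumptions, and only the spkid-recid-fileid.pkl naming comes from this README:

```python
import os
import pickle

import librosa

def save_utterance(wav_path, out_dir, spk_id, rec_id, file_id):
    """Save a waveform as <out_dir>/<spkid>-<recid>-<fileid>.pkl (naming from the README)."""
    wav, _ = librosa.load(wav_path, sr=16000)  # assumed 16 kHz, mono
    out_path = os.path.join(out_dir, "{}-{}-{}.pkl".format(spk_id, rec_id, file_id))
    with open(out_path, "wb") as f:
        pickle.dump(wav, f)

# Example with a VoxCeleb1-style path (illustrative only):
# save_utterance("wav/id10270/5r0dWxy17C8/00001.wav", "features", "id10270", "5r0dWxy17C8", "00001")
```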

To extract embeddings for other datasets, input utterances should be longer than 4 seconds.
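One way to enforce this before inference is to repeat-pad short clips, sketched below under the assumption of 16 kHz audio (skipping short utterances is the other option):

```python
import numpy as np

SAMPLE_RATE = 16000            # assumed sample rate
MIN_SAMPLES = 4 * SAMPLE_RATE  # 4-second minimum from the note above

def ensure_min_length(wav):
    """Repeat-pad a 1-D waveform until it is at least 4 seconds long."""
    if len(wav) >= MIN_SAMPLES:
        return wav
    repeats = int(np.ceil(MIN_SAMPLES / len(wav)))
    return np.tile(wav, repeats)[:MIN_SAMPLES]
```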

Results

Cosine-similarity scoring results on the VoxCeleb1 test sets (EER, %):

| System   | VoxCeleb1-O | VoxCeleb1-E | VoxCeleb1-H |
|----------|-------------|-------------|-------------|
| wav2spk  | 3.00        | 2.78        | 4.56        |
| Y-vector | 2.72        | 2.38        | 3.87        |

(Note that VoxCeleb1-O results can fluctuate considerably in our experimental setting.)
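For reference, a hedged sketch of cosine-similarity trial scoring and EER computation; the trial-list handling and embedding lookup are assumptions, not code from inference.py:

```python
import numpy as np

def cosine_score(emb1, emb2):
    """Cosine similarity between two speaker embeddings."""
    return float(np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2) + 1e-8))

def compute_eer(scores, labels):
    """Approximate EER: operating point where false-accept and false-reject rates meet."""
    order = np.argsort(-scores)                               # high scores accepted first
    labels = labels[order]
    far = np.cumsum(1 - labels) / max((1 - labels).sum(), 1)  # false-accept rate per threshold
    frr = 1.0 - np.cumsum(labels) / max(labels.sum(), 1)      # false-reject rate per threshold
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2)

# scores: numpy array of cosine scores per trial; labels: 1 for same speaker, 0 otherwise.
```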

It is possible to boost performance further by replacing each part with a stronger network, for example swapping the backbone for F-TDNN, E-TDNN, or ECAPA-TDNN.

(Figure: DET curves)
