X-TF-GridNet: A Time-Frequency Domain Target Speaker Extraction Network with Adaptive Speaker Embedding Fusion
This project provides the implementation of X-TF-GridNet, a target speaker extraction (TSE) network in the time-frequency (T-F) domain. The paper has been accepted by Information Fusion. The proposed method has two key extensions: a U2-Net style network extracts robust fixed speaker embeddings, and an adaptive embedding fusion (AEA) mechanism ensures effective use of the target speaker information.
This project is primarily based on the original implementation of SpEx+ and the implementation of TF-GridNet. Note that it only covers the training and inference phases. For specifics on data preparation, please refer to there.
We release the model trained on the WHAMR! dataset there.
# Train the X-TF-GridNet model.
bash train.sh
# Decode the X-TF-GridNet model.
bash decode.sh
# Output score metrics.
bash evalute.sh
We report the PESQ, SDR and SI-SDR results on the WSJ0-2mix dataset for further comparison with other time-domain TSE methods.
Method | Domain | Param. (M) | MACs (G/s) | PESQ | SDR (dB) | SI-SDR (dB) |
---|---|---|---|---|---|---|
Mixture | - | - | - | - | 2.02 | 0.2 |
SpEx | T | 10.79 | 3.55 | - | 16.3 | 15.8 |
SpEx+ | T | 11.14 | 3.76 | 3.43 | 17.2 | 16.9 |
X-DPRNN | T | 6.32 | 63.92 | - | - | 17.4 |
SpEx++ | T | 34.08 | 11.88 | 3.53 | 18.4 | 18.0 |
SpExpc | T | 28.40 | 40.54 | - | 18.8 | 18.6 |
VEVEN | T | 2.63 | 85.11 | 3.66 | 19.2 | 19.0 |
X-SepFormer | T | 26.66 | 61.34 | 3.74 | 19.5 | 18.9 |
X-TF-GridNet | T-F | 7.79 | 68.32 | 3.70 | 20.4 | 19.7 |
X-TF-GridNet (Large) | T-F | 12.68 | 113.24 | 3.77 | 21.7 | 20.7 |
(* More details can be found in the paper.)
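For readers unfamiliar with the SI-SDR column, the following is a minimal sketch of the commonly used scale-invariant SDR definition in plain Python. The function name `si_sdr` and the `eps` parameter are illustrative assumptions; this is not the scoring code used by this repository's evaluation script, which may differ in details such as mean removal or averaging over utterances.

```python
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length 1-D signals.

    Hypothetical helper for illustration only, not the repository's scorer.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    n = len(reference)

    # Remove the mean so that a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [e - est_mean for e in estimate]
    ref = [r - ref_mean for r in reference]

    # Project the estimate onto the reference to obtain the scaled target.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    alpha = dot / (ref_energy + eps)
    target = [alpha * r for r in ref]

    # Everything in the estimate that is not the scaled target counts as error.
    error = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    error_energy = sum(x * x for x in error)

    return 10.0 * math.log10((target_energy + eps) / (error_energy + eps))
```

Higher values are better, and rescaling the estimate by a nonzero constant leaves the score essentially unchanged, which is what distinguishes SI-SDR from plain SDR.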
If you use our code in your research or wish to refer to the baseline results, please use the following BibTeX entry.
@article{hao2024if,
title = {{X-TF-GridNet}: A time–frequency domain target speaker extraction network with adaptive speaker embedding fusion},
journal = {Information Fusion},
volume = {112},
pages = {102550},
year = {2024},
issn = {1566-2535},
doi = {https://doi.org/10.1016/j.inffus.2024.102550},
url = {https://www.sciencedirect.com/science/article/pii/S1566253524003282},
author = {Fengyuan Hao and Xiaodong Li and Chengshi Zheng},
}