A PyTorch implementation of DNN-based source separation.
Method | Reference | Done |
---|---|---|
Permutation invariant training (PIT) | Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks | ✔ |
One-and-rest PIT | Recursive Speech Separation for Unknown Number of Speakers | ✔ |
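
As a rough illustration of the idea behind utterance-level PIT listed above, the sketch below evaluates a loss for every assignment of estimated sources to target sources over the whole utterance and keeps the smallest one. It is written in dependency-free Python for readability and is not this repository's implementation; the names `mse` and `pit_loss` are hypothetical, and mean squared error is assumed only as an example criterion.

```python
from itertools import permutations


def mse(estimate, target):
    """Mean squared error between two equal-length sequences of samples."""
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)


def pit_loss(estimates, targets, criterion=mse):
    """Return (minimum loss, best permutation) over all source orderings.

    best_perm[i] is the index of the estimate assigned to targets[i].
    The permutation is chosen once for the whole utterance.
    """
    n_sources = len(targets)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n_sources)):
        # Average the per-source loss under this assignment.
        loss = sum(
            criterion(estimates[p], targets[i]) for i, p in enumerate(perm)
        ) / n_sources
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy example: the two estimates come out in swapped order.
targets = [[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]]
estimates = [[-0.9, -1.1, -1.0], [1.0, 0.9, 1.1]]
loss, perm = pit_loss(estimates, targets)
print(perm, loss)
```

The exhaustive search visits all N! orderings, so it is only practical for a small number of sources.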
LibriSpeech example using Conv-TasNet
You can check other tutorials in `<REPOSITORY_ROOT>/egs/tutorials/`.
cd <REPOSITORY_ROOT>/egs/tutorials/common/
. ./prepare_librispeech.sh --dataset_root <DATASET_DIR> --n_sources <#SPEAKERS>
cd <REPOSITORY_ROOT>/egs/tutorials/conv-tasnet/
. ./train.sh --exp_dir <OUTPUT_DIR>
If you want to resume training,
. ./train.sh --exp_dir <OUTPUT_DIR> --continue_from <MODEL_PATH>
cd <REPOSITORY_ROOT>/egs/tutorials/conv-tasnet/
. ./test.sh --exp_dir <OUTPUT_DIR>
cd <REPOSITORY_ROOT>/egs/tutorials/conv-tasnet/
. ./demo.sh
- v0.0.0: Initial version. LibriSpeech Conv-TasNet & DPRNN-TasNet examples are included.
- v0.0.1: Dataset is renamed.
- v0.1.0: Dataset structure is changed.
- v0.1.1: DANet is included.
- v0.1.2: Layer name is changed. Input feature for DANet is replaced by log-magnitude.
- v0.1.3: Add scripts for Wall Street Journal 0 (WSJ0) dataset.
- v0.1.4: Add non-negative matrix factorization (NMF).
- v0.2.0: Change the representation of short time Fourier transform (STFT).
- v0.2.1: `conv_tasnet` directory is renamed to `conv-tasnet`. Add one-and-rest PIT (ORPIT).
- v0.3.0: `wsj0` is renamed to `wsj0-mix`. The result is updated.
- v0.3.1: Implement Linear encoder for TasNet.
- v0.3.2: Change the definition of `hidden_channels` in dual-path RNN.
- v0.3.3: Fix trained models due to the update v0.3.2.
- v0.4.0: Fix the network architecture of DPRNN-TasNet.
- v0.4.1: Add DPTNet and GALRNet. Re-fix DPRNN-TasNet.
- v0.4.2: Add training script for GALRNet.
- v0.4.3: Re-fix DPRNN-TasNet.
- v0.5.0: Add `parse_options.sh`.
- v0.5.1: Multichannel support.
- v0.5.2: Add metric learning tutorials.
- v0.5.3: Update network architecture of D3Net.