Work in progress: this project still needs more time and compute resources.
This is an incomplete PyTorch implementation of *Wavesplit: End-to-End Speech Separation by Speaker Clustering*.
The full model has ~65M parameters, and the original architecture could not fit even a single training sample on a P40 GPU. To work around this, I added an encoder (a Conv1d layer) and a decoder to shorten the feature sequence, and replaced the mapping head with masking. I trained a model on wsj0-2mix, but training was very slow; after 13 epochs the model's SDR was only ~10 dB, so I eventually gave up. I may resume development in the future.
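To make the encoder/masking change concrete, here is a minimal sketch of the idea: a strided `Conv1d` shortens the time axis before the separation network, the network predicts per-source masks over the encoded features (instead of mapping waveforms directly), and a `ConvTranspose1d` decodes each masked feature map back to a waveform. The class name, the stand-in separator, and all layer sizes are illustrative assumptions, not the exact values used in this repo.

```python
import torch
import torch.nn as nn

class EncMaskDec(nn.Module):
    """Sketch of an encoder + masking + decoder wrapper (sizes are assumptions)."""

    def __init__(self, n_src=2, feat=128, kernel=16, stride=8):
        super().__init__()
        self.n_src = n_src
        # Strided encoder: reduces the feature length by ~`stride`x.
        self.encoder = nn.Conv1d(1, feat, kernel, stride=stride)
        # Placeholder for the WaveSplit separation stack; here just a 1x1
        # conv that emits one mask per source.
        self.separator = nn.Conv1d(feat, n_src * feat, 1)
        # Transposed conv decodes masked features back to waveforms.
        self.decoder = nn.ConvTranspose1d(feat, 1, kernel, stride=stride)

    def forward(self, mix):                       # mix: (batch, samples)
        x = self.encoder(mix.unsqueeze(1))        # (B, F, T')
        masks = torch.sigmoid(self.separator(x))  # (B, n_src*F, T')
        masks = masks.view(mix.size(0), self.n_src, -1, x.size(-1))
        est = masks * x.unsqueeze(1)              # masked features per source
        b, s, f, t = est.shape
        wav = self.decoder(est.reshape(b * s, f, t)).reshape(b, s, -1)
        return wav                                # (B, n_src, samples)
```

With `kernel=16, stride=8` the separator operates on a sequence roughly 8x shorter than the raw waveform, which is what makes a single sample fit in GPU memory.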
If you have any questions or suggestions, please open an issue or email [email protected].
Implemented components:
- Main model
- Speaker loss
- k-means clustering
- Gaussian layer (currently unused)
- Dropout layer (currently unused)
- Mixup layer
- Dynamic mixing
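Dynamic mixing creates fresh training mixtures on the fly instead of using a fixed pre-mixed set. The sketch below shows the basic recipe: pick two different speakers' utterances, apply random gains, and sum them. The function name, the gain range, and the trim-to-shortest rule are assumptions for illustration, not the exact recipe used in this repo.

```python
import numpy as np

def dynamic_mix(sources, rng, gain_db_range=(-2.5, 2.5)):
    """Create one 2-speaker mixture on the fly (illustrative sketch).

    sources: list of 1-D float arrays, one utterance per speaker.
    Returns (mixture, stacked clean sources).
    """
    # Pick two distinct utterances at random.
    i, j = rng.choice(len(sources), size=2, replace=False)
    s1, s2 = sources[i], sources[j]
    # Trim both to a common length so they can be summed.
    n = min(len(s1), len(s2))
    s1, s2 = s1[:n].copy(), s2[:n].copy()
    # Random gain per source so relative levels vary between epochs.
    for s in (s1, s2):
        s *= 10.0 ** (rng.uniform(*gain_db_range) / 20.0)
    return s1 + s2, np.stack([s1, s2])
```

Because each epoch sees different pairings and gains, dynamic mixing acts as data augmentation and usually reduces overfitting compared to a fixed mixture list.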