Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis"
https://arxiv.org/abs/2110.03156
Ubuntu 18.04.5 LTS
- GPU: Quadro RTX 6000
- Driver version: 450.80.02
- CUDA version: 11.0
Python 3.5
- tensorflow-gpu 2.0.0b1 (cudnn=7.6.0)
- scipy
- pandas
- matplotlib
- librosa
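The `pip install -r requirements.txt` step below assumes a requirements file; based on the dependency list above it might look like the following (only the tensorflow-gpu pin comes from this README, the rest are left unpinned):

```text
tensorflow-gpu==2.0.0b1
scipy
pandas
matplotlib
librosa
```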
For example:

    conda create -n strengthnet python=3.5
    conda activate strengthnet
    pip install -r requirements.txt
    conda install cudnn=7.6.0
- Run `python utils.py` to extract features from the .wav files into .h5 files;
- Run `python train.py` to train the CNN-BLSTM based StrengthNet;
- Put the waveforms you wish to evaluate in a folder, e.g. `<path>/<to>/<samples>`;
- Run `python test.py --rootdir <path>/<to>/<samples>`.
This script will evaluate all the .wav files in `<path>/<to>/<samples>` and write the results to `<path>/<to>/<samples>/StrengthNet_result_raw.txt`. By default, the pretrained model `output/strengthnet.h5` is used.
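The feature extraction step (`python utils.py`) turns each .wav into a spectrogram stored in an .h5 file. The repository lists librosa as a dependency for this; the numpy-only sketch below illustrates the core framing-and-FFT computation (the `n_fft`/`hop` values are assumed MOSNet-style settings, not necessarily the repo's exact configuration):

```python
import numpy as np

def magnitude_spectrogram(wav, n_fft=512, hop=256):
    """Frame the waveform, apply a Hann window, and take |FFT|.
    n_fft=512 / hop=256 are assumed, MOSNet-style settings."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wav) - n_fft) // hop
    frames = np.stack([wav[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    # rfft keeps n_fft // 2 + 1 = 257 non-negative frequency bins
    return np.abs(np.fft.rfft(frames, axis=1))

# toy 1-second, 16 kHz sine wave in place of a real .wav
wav = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
spec = magnitude_spectrogram(wav)
print(spec.shape)  # (61, 257)
```

In the real pipeline each resulting array would then be written into an .h5 file keyed by utterance.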
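`train.py` fits the CNN-BLSTM model. As a rough sketch of that architecture, frame-level strength scores are predicted and then averaged into an utterance-level score. This is not the paper's exact configuration (the paper uses 2-D convolutions over the spectrogram and a multi-task auxiliary emotion branch; this minimal version uses 1-D convolutions and only the strength head, with illustrative layer sizes):

```python
import numpy as np
import tensorflow as tf

def build_strengthnet_sketch(feat_dim=257):
    """Minimal CNN-BLSTM strength regressor (illustrative sizes)."""
    inp = tf.keras.Input(shape=(None, feat_dim))  # (time, frequency)
    # CNN encoder over the feature sequence
    x = tf.keras.layers.Conv1D(64, 3, padding="same", activation="relu")(inp)
    x = tf.keras.layers.Conv1D(64, 3, padding="same", activation="relu")(x)
    # BLSTM captures context in both time directions
    x = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64, return_sequences=True))(x)
    # one strength score per frame, averaged into an utterance score
    frame_scores = tf.keras.layers.TimeDistributed(
        tf.keras.layers.Dense(1))(x)
    utt_score = tf.keras.layers.GlobalAveragePooling1D()(frame_scores)
    return tf.keras.Model(inp, utt_score)

model = build_strengthnet_sketch()
out = model(np.zeros((2, 61, 257), dtype=np.float32))
print(out.shape)  # (2, 1)
```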
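After `test.py` finishes, the scores in StrengthNet_result_raw.txt can be post-processed. The exact file layout is not documented in this README; the sketch below assumes one `<wav name> <score>` pair per line, so adjust the split if the real file differs:

```python
from pathlib import Path
from statistics import mean

def load_strength_scores(result_txt):
    """Parse a result file with one '<wav name> <score>' pair per line.
    (Assumed layout, not confirmed by the README.)"""
    scores = {}
    for line in Path(result_txt).read_text().splitlines():
        parts = line.split()
        if len(parts) >= 2:
            scores[parts[0]] = float(parts[-1])
    return scores

# toy example with a synthetic result file
Path("demo_result.txt").write_text("a.wav 0.82\nb.wav 0.35\n")
scores = load_strength_scores("demo_result.txt")
print(round(mean(scores.values()), 3))  # 0.585
```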
If you find this work useful in your research, please consider citing:
@misc{liu2021strengthnet,
  title={StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis},
  author={Rui Liu and Berrak Sisman and Haizhou Li},
  year={2021},
  eprint={2110.03156},
  archivePrefix={arXiv},
  primaryClass={cs.SD}
}
The ESD corpus is released by the HLT lab, NUS, Singapore.
The strength scores for the English samples of the ESD corpus are available here.
MOSNet: https://github.com/lochenchou/MOSNet
Relative Attributes: Parikh and Grauman, "Relative Attributes", ICCV 2011
This work is released under MIT License (see LICENSE file for details).