VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer

zhangyongmao/VISinger2

VISinger2

This repository is the official PyTorch implementation of VISinger2.

Updates

  • Apr 10, 2023: Added egs/visinger2_flow, which adds a flow to VISinger2 for a more flexible prior distribution.
  • Jan 31, 2023: Modified the gt-dur extraction method in dataset.py, and replaced the dsp-wav with a sinusoidal signal as input to the HiFi-GAN decoder.
  • Jan 10, 2023: Initial commit.
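The Jan 31 change feeds the HiFi-GAN decoder a sinusoidal excitation derived from F0 rather than the full DSP waveform. A minimal sketch of how such an excitation can be built from a frame-level F0 track (the function name, hop size, and amplitude here are illustrative assumptions, not the repository's actual implementation):

```python
import numpy as np

def sinusoidal_excitation(f0, hop_length=256, sample_rate=44100, amplitude=0.1):
    """Build a sample-rate sinusoidal excitation from a frame-level F0 track.

    f0: per-frame fundamental frequency in Hz (0 for unvoiced frames).
    Returns a waveform of length len(f0) * hop_length.
    """
    # Upsample frame-level F0 to sample level by linear interpolation.
    frames = np.arange(len(f0))
    samples = np.arange(len(f0) * hop_length) / hop_length
    f0_up = np.interp(samples, frames, f0)

    # Integrate instantaneous frequency to get phase, then take the sine.
    phase = 2.0 * np.pi * np.cumsum(f0_up) / sample_rate
    excitation = amplitude * np.sin(phase)

    # Silence unvoiced regions (F0 == 0).
    excitation[f0_up <= 0] = 0.0
    return excitation
```

The resulting waveform is pitch-synchronous with the score, which gives the decoder an explicit periodicity cue without requiring the full harmonic-plus-noise DSP output.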

Pre-requisites

  1. Install the Python requirements: pip install -r requirements.txt
  2. Download the Opencpop dataset.
  3. Prepare the data as in data/opencpop (wavs, trainset.txt, testset.txt, train.list, test.list).
  4. Modify egs/visinger2/config.json (set data/data_dir and train/save_dir).
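For reference, the two fields to update in config.json take roughly this shape (the paths are placeholders; check the shipped config.json for the exact surrounding keys):

```json
{
  "data": {
    "data_dir": "data/opencpop"
  },
  "train": {
    "save_dir": "exp/visinger2"
  }
}
```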

Extract pitch and mel

cd egs/visinger2
bash bash/preprocess.sh config.json

Training

cd egs/visinger2
bash bash/train.sh 0

We trained the model for 500k steps with a batch size of 16.

Inference

Modify model_dir, input_dir, and output_dir in inference.sh.

cd egs/visinger2
bash bash/inference.sh

Some audio samples can be found on the demo website and bilibili.

The pre-trained model trained using opencpop is here, the config.json is here, and the result of the test set synthesized by this pre-trained model is here.

Acknowledgements

We referred to VITS, HiFiGAN, gst-tacotron, and ddsp_pytorch in our implementation. Thanks to swagger-coder for help building visinger2_flow.
