
Unsupervised Any-to-many Audiovisual Synthesis via Exemplar Autoencoders


aayushbansal/Audiovisual-Synthesis

This branch is 7 commits behind dunbar12138/Audiovisual-Synthesis:master.



Audiovisual-Synthesis

Unsupervised Any-to-many Audiovisual Synthesis via Exemplar Autoencoders

Kangle Deng, Aayush Bansal, Deva Ramanan

project page / demo / arXiv

This repo provides a PyTorch implementation of our work.

Acknowledgements: This code borrows heavily from Auto-VC and Tacotron.

Summary Video

Dependencies

First, make sure ffmpeg is installed on your machine.

Then, run: pip install -r requirements.txt
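A minimal preflight sketch, assuming Python 3 and that commands are run from the repository root. The helper names (`ffmpeg_available`, `pip_install_command`, `setup`) are hypothetical and not part of this repo. The block only checks that an ffmpeg executable is on PATH and then runs the same pip install shown above, using the current interpreter so packages land in the active environment.

```python
import shutil
import subprocess
import sys
from typing import List


def ffmpeg_available() -> bool:
    # shutil.which searches PATH the same way the shell would.
    return shutil.which("ffmpeg") is not None


def pip_install_command(requirements: str = "requirements.txt") -> List[str]:
    # Equivalent to "pip install -r requirements.txt", but bound to this interpreter.
    return [sys.executable, "-m", "pip", "install", "-r", requirements]


def setup(requirements: str = "requirements.txt") -> None:
    # Fail early with a clear message instead of failing later inside training.
    if not ffmpeg_available():
        raise RuntimeError("ffmpeg was not found on PATH; install it first.")
    subprocess.run(pip_install_command(requirements), check=True)
```

Calling setup() raises RuntimeError if ffmpeg is missing; otherwise it installs requirements.txt and raises CalledProcessError if pip fails.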

Data

We provide our CelebAudio Dataset at link.

Train

Voice Conversion

Check 'scripts/train_audio.sh' for an example of training a Voice-Conversion model. Make sure the directory 'logs' exists before you start training.

Generally, run:

python train_audio.py --data_path PATH_TO_TRAINING_DATA --experiment_name EXPERIMENT_NAME --save_freq SAVE_FREQ --test_path_A PATH_TO_TEST_AUDIO --test_path_B PATH_TO_TEST_AUDIO --batch_size BATCH_SIZE --save_dir PATH_TO_SAVE_MODEL
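If you launch runs from Python rather than from the shell, the command above can be assembled as an argument list. This is a hypothetical sketch (`train_audio_command` and `launch` are not part of the repo); it uses only the flags shown above, and any values you pass in are your own placeholders, not recommended settings.

```python
import os
import subprocess
import sys
from typing import List


def train_audio_command(data_path: str, experiment_name: str, save_freq: int,
                        test_path_a: str, test_path_b: str, batch_size: int,
                        save_dir: str) -> List[str]:
    # One-to-one mapping onto the train_audio.py flags listed in this README.
    return [
        sys.executable, "train_audio.py",
        "--data_path", data_path,
        "--experiment_name", experiment_name,
        "--save_freq", str(save_freq),
        "--test_path_A", test_path_a,
        "--test_path_B", test_path_b,
        "--batch_size", str(batch_size),
        "--save_dir", save_dir,
    ]


def launch(cmd: List[str]) -> None:
    # The README asks for a 'logs' directory to exist before training.
    os.makedirs("logs", exist_ok=True)
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems when paths contain spaces.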

Audiovisual Synthesis

Check 'scripts/train_audiovisual.sh' for an example of training an Audiovisual-Synthesis model. We usually train the audiovisual model starting from a pretrained audio model, which is passed with --load_model.

1-stage generation -- video resolution: 256 * 256

Generally, run:

python train_audiovisual.py --video_path PATH_TO_TRAINING_DATA --experiment_name EXPERIMENT_NAME --save_freq SAVE_FREQ --test_path PATH_TO_TEST_AUDIO --batch_size BATCH_SIZE --save_dir PATH_TO_SAVE_MODEL --use_256 --load_model LOAD_MODEL_PATH

2-stage generation -- video resolution: 512 * 512

If you want the video resolution to be 512 * 512, use the StackGAN-style 2-stage generation.

Generally, run:

python train_audiovisual.py --video_path PATH_TO_TRAINING_DATA --experiment_name EXPERIMENT_NAME --save_freq SAVE_FREQ --test_path PATH_TO_TEST_AUDIO --batch_size BATCH_SIZE --save_dir PATH_TO_SAVE_MODEL --residual --load_model LOAD_MODEL_PATH
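The two training commands above differ only in one flag: --use_256 for 1-stage 256 * 256 output and --residual for 2-stage 512 * 512 output. A hypothetical helper (not part of this repo) that picks the flag from the target resolution and forwards the pretrained audio model through --load_model might look like this:

```python
import sys
from typing import List

# Target resolution -> flag, as described in this README.
RESOLUTION_FLAGS = {256: "--use_256", 512: "--residual"}


def train_audiovisual_command(video_path: str, experiment_name: str,
                              save_freq: int, test_path: str, batch_size: int,
                              save_dir: str, load_model: str,
                              resolution: int = 256) -> List[str]:
    # Only the two documented resolutions are supported.
    if resolution not in RESOLUTION_FLAGS:
        raise ValueError(f"unsupported resolution {resolution}; use 256 or 512")
    return [
        sys.executable, "train_audiovisual.py",
        "--video_path", video_path,
        "--experiment_name", experiment_name,
        "--save_freq", str(save_freq),
        "--test_path", test_path,
        "--batch_size", str(batch_size),
        "--save_dir", save_dir,
        RESOLUTION_FLAGS[resolution],
        # Checkpoint to start from, typically the pretrained audio model.
        "--load_model", load_model,
    ]
```

Rejecting other resolutions up front keeps a typo such as 1024 from silently falling back to one of the two modes.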

Test

Voice Conversion

Check 'scripts/test_audio.sh' for an example of testing a Voice-Conversion model.

To convert a WAV file using a trained model, run:

python test_audio.py --model PATH_TO_MODEL --wav_path PATH_TO_INPUT --output_file PATH_TO_OUTPUT
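To convert many files with the same model, one option is to call test_audio.py once per file. This is a sketch under assumptions: the helper names are hypothetical, the inputs are taken to be `.wav` files in a single directory, and each output simply reuses the input's file name in a separate output directory.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def conversion_commands(model: str, input_dir: str, output_dir: str) -> List[List[str]]:
    # One test_audio.py invocation per .wav file, in sorted order.
    commands = []
    for wav in sorted(Path(input_dir).glob("*.wav")):
        out = Path(output_dir) / wav.name  # keep the original file name
        commands.append([
            sys.executable, "test_audio.py",
            "--model", model,
            "--wav_path", str(wav),
            "--output_file", str(out),
        ])
    return commands


def convert_directory(model: str, input_dir: str, output_dir: str) -> None:
    # Create the output directory, then run each conversion in turn.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for cmd in conversion_commands(model, input_dir, output_dir):
        subprocess.run(cmd, check=True)
```

Building the command list separately from running it makes it easy to inspect what will be executed before starting a long batch.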

Audiovisual Synthesis

Check 'scripts/test_audiovisual.sh' for an example of testing an Audiovisual-Synthesis model.

1-stage generation -- video resolution: 256 * 256

python test_audiovisual.py --load_model PATH_TO_MODEL --wav_path PATH_TO_INPUT --output_file PATH_TO_OUTPUT --use_256

2-stage generation -- video resolution: 512 * 512

python test_audiovisual.py --load_model PATH_TO_MODEL --wav_path PATH_TO_INPUT --output_file PATH_TO_OUTPUT --residual
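test_audiovisual.py takes its input audio through --wav_path. If your source is a video clip, one way to prepare it is to extract the audio track with ffmpeg first. In this sketch the helper names are hypothetical, and the mono 16 kHz setting is an assumption (check the repo's audio preprocessing for the rate the model actually expects); the resolution flags follow the two commands above.

```python
import subprocess
import sys
from typing import List

# Target resolution -> flag, as described in this README.
RESOLUTION_FLAGS = {256: "--use_256", 512: "--residual"}


def extract_audio_command(video: str, wav: str, sample_rate: int = 16000) -> List[str]:
    # -y overwrites, -vn drops video, -ac 1 downmixes to mono, -ar resamples.
    # The 16 kHz default is an assumption, not taken from this repo.
    return ["ffmpeg", "-y", "-i", video, "-vn", "-ac", "1",
            "-ar", str(sample_rate), wav]


def audiovisual_inference_command(model: str, wav: str, output: str,
                                  resolution: int = 256) -> List[str]:
    if resolution not in RESOLUTION_FLAGS:
        raise ValueError(f"unsupported resolution {resolution}; use 256 or 512")
    return [sys.executable, "test_audiovisual.py",
            "--load_model", model,
            "--wav_path", wav,
            "--output_file", output,
            RESOLUTION_FLAGS[resolution]]


def synthesize_from_video(video: str, model: str, output: str,
                          resolution: int = 256, wav: str = "input.wav") -> None:
    # Step 1: pull the audio track out of the clip; step 2: run synthesis on it.
    subprocess.run(extract_audio_command(video, wav), check=True)
    subprocess.run(audiovisual_inference_command(model, wav, output, resolution),
                   check=True)
```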
