CodeTalker

Official PyTorch implementation for the paper:

CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior, CVPR 2023.

Jinbo Xing, Menghan Xia, Yuechen Zhang, Xiaodong Cun, Jue Wang, Tien-Tsin Wong

We propose CodeTalker by casting speech-driven facial animation as a code query task in a finite proxy space of the learned codebook. Given the raw audio and a 3D neutral face template, our CodeTalker can produce vivid and realistic 3D facial motions with subtle expressions and accurate lip movements.

Changelog

  • 2023.06.16 Provide a Colab online demo.
  • 2023.04.03 Release code and model weights!

Environment

  • Linux
  • Python 3.6+
  • PyTorch 1.9.1
  • CUDA 11.1 (GPU with at least 11GB VRAM)

Other necessary packages:

pip install -r requirements.txt

IMPORTANT: Please modify the site-packages/torch/nn/modules/conv.py file of your PyTorch installation by commenting out the self.padding_mode != 'zeros' check, so that ConvTranspose1d accepts replicated padding.
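The check to comment out looks roughly like the following (a sketch only; the exact wording and line number depend on your PyTorch version, and in 1.9.x the guard sits in the transposed-convolution code of conv.py):

# site-packages/torch/nn/modules/conv.py (PyTorch 1.9.x; location varies by version)
# Comment out this guard so ConvTranspose1d accepts padding_mode='replicate':
#
# if self.padding_mode != 'zeros':
#     raise ValueError('Only `zeros` padding mode is supported for ConvTranspose1d')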

Dataset Preparation

VOCASET

Request the VOCASET data from https://voca.is.tue.mpg.de/. Place the downloaded files data_verts.npy, raw_audio_fixed.pkl, templates.pkl, and subj_seq_to_idx.pkl in the folder vocaset/. Download "FLAME_sample.ply" from VOCA and put it in vocaset/. Read the vertices/audio data and convert them to .npy/.wav files stored in vocaset/vertices_npy and vocaset/wav:

cd vocaset
python process_voca_data.py
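As a quick sanity check after processing, the converted data can be inspected as sketched below (the sequence file name is illustrative; the shapes assume the FLAME topology of 5023 vertices, i.e., 15069 values per frame, and that templates.pkl maps each subject name to its neutral-face vertices):

import pickle
import numpy as np

# One processed motion sequence: (num_frames, 15069) flattened FLAME vertices.
verts = np.load("vocaset/vertices_npy/FaceTalk_170728_03272_TA_sentence01.npy")
print(verts.shape)

# templates.pkl: dict mapping subject name -> (5023, 3) neutral-face vertex array.
with open("vocaset/templates.pkl", "rb") as f:
    templates = pickle.load(f, encoding="latin1")  # latin1 handles Python-2-era pickles
subject = next(iter(templates))
print(subject, templates[subject].shape)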

BIWI

Follow the BIWI/README.md to preprocess the BIWI dataset, put the resulting .npy/.wav files into BIWI/vertices_npy and BIWI/wav, and put templates.pkl into BIWI/.

Demo

Download the pretrained models biwi_stage1.pth.tar & biwi_stage2.pth.tar and vocaset_stage1.pth.tar & vocaset_stage2.pth.tar, and put them under the BIWI and vocaset folders, respectively. Given an audio signal,

  • to animate a mesh in FLAME topology, run:
     sh scripts/demo.sh vocaset
    
  • to animate a mesh in BIWI topology, run:
     sh scripts/demo.sh BIWI
    
    This script will automatically generate the rendered videos in the demo/output folder. You can also put your own test audio file (.wav format) under the demo/wav folder and set the arguments in the DEMO section of config/<dataset>/demo.yaml accordingly (e.g., demo_wav_path, condition, subject); an example is sketched below.
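For reference, the DEMO section might be edited roughly as follows (a sketch only; the file name and the two VOCASET subject names are illustrative examples, and the shipped config remains the authoritative layout):

# config/vocaset/demo.yaml (excerpt; example values only)
DEMO:
  demo_wav_path: demo/wav/man.wav          # your own .wav file under demo/wav
  condition: FaceTalk_170728_03272_TA      # training subject whose speaking style to imitate
  subject: FaceTalk_170731_00024_TA        # neutral-face template to animate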

Training / Testing

Training and testing share a similar command:

sh scripts/<train.sh|test.sh> <exp_name> config/<vocaset|BIWI>/<stage1|stage2>.yaml <vocaset|BIWI> <s1|s2>

Please replace <exp_name> with your own experiment name and <vocaset|BIWI> with the name of your target dataset, i.e., vocaset or BIWI. Change exp_dir in both scripts/train.sh and scripts/test.sh if needed. The commands below use the default settings as examples.

Training for Discrete Motion Prior

sh scripts/train.sh CodeTalker_s1 config/vocaset/stage1.yaml vocaset s1

Training for Speech-Driven Motion Synthesis

Make sure the paths of the pre-trained models are correct, i.e., vqvae_pretrained_path and wav2vec2model_path in config/<vocaset|BIWI>/stage2.yaml.
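These entries look roughly like this (a sketch; the values are examples and should point at your own stage-1 checkpoint and wav2vec2 weights):

# config/vocaset/stage2.yaml (excerpt; example values only)
vqvae_pretrained_path: vocaset/vocaset_stage1.pth.tar   # stage-1 discrete motion prior checkpoint
wav2vec2model_path: facebook/wav2vec2-base-960h         # HuggingFace model id or a local path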

sh scripts/train.sh CodeTalker_s2 config/vocaset/stage2.yaml vocaset s2

Testing

sh scripts/test.sh CodeTalker_s2 config/vocaset/stage2.yaml vocaset s2

Visualization with Audio

Modify the paths in scripts/render.sh and run:

sh scripts/render.sh

Evaluation on BIWI

We provide reference code for computing the Lip Vertex Error and Upper-face Dynamics Deviation metrics. Remember to change the paths in scripts/cal_metric.sh, and run:

sh scripts/cal_metric.sh
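Conceptually, the two metrics can be sketched as below (a minimal illustration, not the repository's exact implementation; lip_idx and upper_idx are assumed index lists of lip and upper-face vertices in the BIWI topology, pred/gt are (frames, vertices, 3) arrays, and template is the subject's neutral face):

import numpy as np

def lip_vertex_error(pred, gt, lip_idx):
    # Maximal L2 error over the lip vertices in each frame, averaged over frames.
    err = np.linalg.norm(pred[:, lip_idx] - gt[:, lip_idx], axis=-1)  # (T, |lip|)
    return err.max(axis=1).mean()

def upper_face_dynamics_deviation(pred, gt, template, upper_idx):
    # Per-vertex "dynamics": std over time of the L2 distance to the neutral template;
    # FDD averages the GT-minus-prediction difference over the upper-face vertices.
    def dyn(seq):
        disp = np.linalg.norm(seq[:, upper_idx] - template[upper_idx], axis=-1)  # (T, |upper|)
        return disp.std(axis=0)
    return (dyn(gt) - dyn(pred)).mean()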

Play with Your Own Data

Data Preparation

  • Create the dataset directory <dataset_dir> in CodeTalker directory.

  • Place your vertices data (.npy files) and audio data (.wav files) in <dataset_dir>/vertices_npy and <dataset_dir>/wav folders, respectively.

  • Save the templates of all subjects to a templates.pkl file and put it in <dataset_dir>, as done for the BIWI and vocaset datasets. Export an arbitrary template to .ply format and put it in <dataset_dir>/; a sketch of this step is given after this list.
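A minimal sketch of this step, assuming templates.pkl uses the same format as the BIWI/vocaset files (a dict mapping subject name to an (N, 3) array of neutral-face vertices); the paths, subject names, and the use of trimesh are illustrative:

import pickle
import numpy as np
import trimesh  # used here only for mesh I/O

dataset_dir = "my_dataset"  # your <dataset_dir> inside the CodeTalker directory
subject_meshes = {          # one neutral-face mesh per subject (illustrative paths)
    "subject01": "raw/subject01_neutral.ply",
    "subject02": "raw/subject02_neutral.ply",
}

templates = {
    name: np.asarray(trimesh.load(path, process=False).vertices, dtype=np.float64)
    for name, path in subject_meshes.items()
}
with open(f"{dataset_dir}/templates.pkl", "wb") as f:
    pickle.dump(templates, f)

# Export one arbitrary template as .ply for rendering.
trimesh.load(subject_meshes["subject01"], process=False).export(f"{dataset_dir}/subject01.ply")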

Training, Testing & Visualization

  • Create the corresponding config files in config/<dataset_dir> and modify the arguments in the config files.

  • Check all the code segments related to dataset information.

  • Follow the training/testing/visualization pipeline as done for the BIWI and vocaset datasets.

Citation

If you find the code useful for your work, please star this repo and consider citing:

@inproceedings{xing2023codetalker,
  title={Codetalker: Speech-driven 3d facial animation with discrete motion prior},
  author={Xing, Jinbo and Xia, Menghan and Zhang, Yuechen and Cun, Xiaodong and Wang, Jue and Wong, Tien-Tsin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={12780--12790},
  year={2023}
}

Notes

  1. Although our codebase allows for multi-GPU training, we did not test it, and the training batch size is hardcoded to one. Modify the data_loader if you need a different setup.

Acknowledgement

We heavily borrow code from FaceFormer, Learn2Listen, and VOCA; thanks for sharing it. We also thank huggingface-transformers for their wav2vec2 implementation, and gratefully acknowledge ETHZ-CVL for providing the B3D(AC)2 dataset and MPI-IS for releasing the VOCASET dataset. Any third-party packages are owned by their respective authors and must be used under their respective licenses.
