
One-Shot Free-View Neural Talking Head Synthesis

Unofficial PyTorch implementation of the paper "One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing".

I've only tested this with Python 3.6 and PyTorch 1.7.0.

Driving | FOMM | Ours: [comparison GIF]

Free-View: [demo GIF]

Train:

python run.py --config config/vox-256.yaml --device_ids 0,1,2,3,4,5,6,7
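
The command above uses eight GPUs; with fewer, pass a shorter --device_ids list, e.g. for a single GPU:

python run.py --config config/vox-256.yaml --device_ids 0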

Demo:

python demo.py --config config/vox-256.yaml --checkpoint path/to/checkpoint --source_image path/to/source --driving_video path/to/driving --relative --adapt_scale --find_best_frame

Free-view (e.g. yaw=20, pitch=roll=0):

python demo.py --config config/vox-256.yaml --checkpoint path/to/checkpoint --source_image path/to/source --driving_video path/to/driving --relative --adapt_scale --find_best_frame --free_view --yaw 20 --pitch 0 --roll 0
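
For intuition, free-view synthesis rotates the predicted 3D keypoints by a rotation matrix built from the user-supplied yaw/pitch/roll. Below is a minimal NumPy sketch of that mapping; the axis conventions and composition order used by demo.py may differ, so treat it as illustrative only:

```python
# Illustrative only: build a rotation matrix from yaw/pitch/roll in degrees,
# as used conceptually to re-pose the 3D keypoints for free-view synthesis.
import numpy as np

def euler_to_rotation(yaw, pitch, roll):
    """Compose per-axis rotations; angles are given in degrees."""
    yaw, pitch, roll = np.radians([yaw, pitch, roll])
    ry = np.array([[ np.cos(yaw), 0, np.sin(yaw)],
                   [ 0,           1, 0          ],
                   [-np.sin(yaw), 0, np.cos(yaw)]])      # yaw: vertical axis
    rx = np.array([[1, 0,              0             ],
                   [0, np.cos(pitch), -np.sin(pitch)],
                   [0, np.sin(pitch),  np.cos(pitch)]])  # pitch: side axis
    rz = np.array([[np.cos(roll), -np.sin(roll), 0],
                   [np.sin(roll),  np.cos(roll), 0],
                   [0,             0,            1]])    # roll: view axis
    return ry @ rx @ rz

# yaw=20, pitch=roll=0 turns the head 20 degrees about the vertical axis
R = euler_to_rotation(20, 0, 0)
```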

Note: run python crop-video.py --inp driving_video.mp4 to get cropping suggestions, then crop the driving video accordingly.
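
If crop-video.py behaves like the FOMM script it is borrowed from, it prints suggested ffmpeg commands rather than cropping anything itself. The suggestion has roughly this shape (the crop size and offsets below are hypothetical placeholders; use the values the script prints):

ffmpeg -i driving_video.mp4 -filter:v "crop=512:512:100:50, scale=256:256" crop.mp4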

Pretrained Model:

| Model | Train Set | Baidu Netdisk | Google Drive |
| --- | --- | --- | --- |
| Vox-256-Beta | VoxCeleb-v1 | Baidu (PW: c0tc) | soon |
| Vox-256-Stable | VoxCeleb-v1 | soon | soon |
| Vox-256 | VoxCeleb-v2 | soon | soon |
| Vox-512 | VoxCeleb-v2 | soon | soon |
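
Once downloaded, a checkpoint can be sanity-checked before running demo.py. A minimal sketch; which state dicts the file actually contains is an assumption here, so print the keys to find out:

```python
# Sketch: peek inside a downloaded checkpoint before passing it to demo.py.
import torch

ckpt = torch.load("path/to/checkpoint", map_location="cpu")
# Expected (assumed) entries: generator / keypoint detector / head-pose
# estimator state dicts plus optimizer states; the actual keys may differ.
print(list(ckpt.keys()))
```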

Note:

  1. The Beta version is currently under-trained: results lack sharpness, and mouth shapes and eye movements are not reproduced very accurately.
  2. For free-view synthesis, it is recommended to keep yaw within ±45° and pitch and roll within ±20°.

Acknowledgement:

Thanks to NV, AliaksandrSiarohin and DeepHeadPose.