Unofficial PyTorch implementation of the paper "One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing".
I've only tested it with Python 3.6 and PyTorch 1.7.0.
train (multi-GPU, here with 8 devices):
python run.py --config config/vox-256.yaml --device_ids 0,1,2,3,4,5,6,7
demo:
python demo.py --config config/vox-256.yaml --checkpoint path/to/checkpoint --source_image path/to/source --driving_video path/to/driving --relative --adapt_scale --find_best_frame
free-view (e.g. yaw=20, pitch=roll=0):
python demo.py --config config/vox-256.yaml --checkpoint path/to/checkpoint --source_image path/to/source --driving_video path/to/driving --relative --adapt_scale --find_best_frame --free_view --yaw 20 --pitch 0 --roll 0
Note: run `python crop-video.py --inp driving_video.mp4` to get cropping suggestions, then crop the driving video accordingly.
Model | Train Set | Baidu Netdisk | Google Drive |
---|---|---|---|
Vox-256-Beta | VoxCeleb-v1 | Baidu (PW: c0tc) | soon |
Vox-256-Stable | VoxCeleb-v1 | soon | soon |
Vox-256 | VoxCeleb-v2 | soon | soon |
Vox-512 | VoxCeleb-v2 | soon | soon |
Note:
- The Beta version is not yet sufficiently trained: results lack clarity, and the mouth shape and eyes are not very accurate.
- For free-view synthesis, it is recommended to keep yaw, pitch and roll within ±45°, ±20° and ±20° respectively.
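
The recommended angle ranges can be enforced before calling the demo script. The helper below is only a minimal sketch and is not part of this repo: the names `RECOMMENDED_LIMITS` and `clamp_free_view_angles` are hypothetical, and it assumes angles are given in degrees, as in the free-view example.

```python
# Hypothetical helper, not part of this repo.
# Recommended free-view limits in degrees, taken from the note above.
RECOMMENDED_LIMITS = {"yaw": 45.0, "pitch": 20.0, "roll": 20.0}


def clamp_free_view_angles(yaw=0.0, pitch=0.0, roll=0.0):
    """Clamp each angle (in degrees) to its recommended symmetric range."""
    requested = {"yaw": yaw, "pitch": pitch, "roll": roll}
    clamped = {}
    for name, value in requested.items():
        limit = RECOMMENDED_LIMITS[name]
        # Keep the value inside [-limit, +limit].
        clamped[name] = max(-limit, min(limit, float(value)))
    return clamped


if __name__ == "__main__":
    # Yaw and pitch exceed the recommended range and get clamped; roll is kept.
    print(clamp_free_view_angles(yaw=60, pitch=-30, roll=5))
```

The clamped values can then be passed to the free-view options of demo.py. Clamping silently changes the requested pose, so raising an error for out-of-range input may be preferable in some workflows.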
Thanks to NV, AliaksandrSiarohin and DeepHeadPose.