Flow-guided One-shot Talking Face Generation with a High-resolution Audio-visual Dataset (paper)
./HDTF_dataset contains the YouTube URL of each video, the time stamps of the talking-face clips, and the facial region in each video.

xx_video_url.txt:
format: video name | video youtube url
xx_annotion_time.txt:
format: video name | time stamps of clip1 | time stamps of clip2 | time stamps of clip3....
xx_crop_wh.txt:
format: video name+clip index | min_width | width | min_height | height
If you use the HDTF dataset, please:
- Download the videos listed in xx_video_url.txt with the you-get or youtube-dl tool (please download the highest-definition version available: 1080P or 720P). Convert each video to .mp4 format; it is also recommended to convert interlaced video to progressive video. A download/deinterlace sketch is given after this list.
- Split each long original video into talking-head clips using the time stamps in xx_annotion_time.txt. Name each resulting clip video name_clip index.mp4. For example, splitting Radio11.mp4 at 00:30-01:00 and 01:30-02:30 produces Radio11_0.mp4 and Radio11_1.mp4. A splitting sketch is given after this list.
- Crop the facial region using the fixed window given in xx_crop_wh.txt and resize the cropped video to 512 x 512 resolution. A crop/resize sketch is given after this list.
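The following is a minimal download/deinterlace sketch, not the official processing code. It assumes youtube-dl (or a compatible tool such as yt-dlp) and ffmpeg are on PATH, that each line of xx_video_url.txt holds a video name and its YouTube URL separated by whitespace (the "|" in the format description above marks field boundaries), and that the output directory name raw_videos is arbitrary.

```python
import subprocess
from pathlib import Path

RAW_DIR = Path("raw_videos")
RAW_DIR.mkdir(exist_ok=True)

for line in Path("HDTF_dataset/xx_video_url.txt").read_text().splitlines():
    if not line.strip():
        continue
    name, url = line.split()[:2]
    src = RAW_DIR / f"{name}_src.mp4"
    # Prefer 1080p; the format selector falls back to lower resolutions if needed.
    subprocess.run(["youtube-dl",
                    "-f", "bestvideo[height<=1080]+bestaudio/best",
                    "--merge-output-format", "mp4",
                    "-o", str(src), url], check=True)
    # Deinterlace with yadif to get a progressive .mp4; keep the audio stream as-is.
    subprocess.run(["ffmpeg", "-y", "-i", str(src),
                    "-vf", "yadif", "-c:a", "copy",
                    str(RAW_DIR / f"{name}.mp4")], check=True)
```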
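The splitting step could look like the sketch below. It assumes each line of xx_annotion_time.txt is a video name followed by whitespace-separated mm:ss-mm:ss spans (e.g. Radio11 00:30-01:00 01:30-02:30), and that the progressive .mp4 files from the previous step live in raw_videos/.

```python
import subprocess
from pathlib import Path

RAW_DIR = Path("raw_videos")
CLIP_DIR = Path("clips")
CLIP_DIR.mkdir(exist_ok=True)

for line in Path("HDTF_dataset/xx_annotion_time.txt").read_text().splitlines():
    fields = line.split()
    if len(fields) < 2:
        continue
    # The annotation may list the name with or without its .mp4 extension.
    name = fields[0].removesuffix(".mp4")
    src = RAW_DIR / f"{name}.mp4"              # e.g. Radio11.mp4
    for idx, span in enumerate(fields[1:]):
        start, end = span.split("-")
        clip = CLIP_DIR / f"{name}_{idx}.mp4"  # e.g. Radio11_0.mp4
        # Re-encode so each clip starts exactly at the annotated time stamp.
        subprocess.run(["ffmpeg", "-y", "-i", str(src),
                        "-ss", start, "-to", end,
                        str(clip)], check=True)
```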
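For cropping and resizing, a sketch under the same assumptions: each line of xx_crop_wh.txt gives the clip name followed by min_width, width, min_height, height as whitespace-separated integers, and the clips from the previous step live in clips/. The min_width/min_height values are taken as the crop window's top-left corner.

```python
import subprocess
from pathlib import Path

CLIP_DIR = Path("clips")
OUT_DIR = Path("cropped_512")
OUT_DIR.mkdir(exist_ok=True)

for line in Path("HDTF_dataset/xx_crop_wh.txt").read_text().splitlines():
    fields = line.split()
    if len(fields) < 5:
        continue
    clip_name = fields[0] if fields[0].endswith(".mp4") else fields[0] + ".mp4"
    x, w, y, h = map(int, fields[1:5])
    # ffmpeg's crop filter takes w:h:x:y; scale to the 512 x 512 target afterwards.
    vf = f"crop={w}:{h}:{x}:{y},scale=512:512"
    subprocess.run(["ffmpeg", "-y", "-i", str(CLIP_DIR / clip_name),
                    "-vf", vf, "-c:a", "copy",
                    str(OUT_DIR / clip_name)], check=True)
```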
If you use HDTF, please cite:
@inproceedings{zhang2021flow,
title={Flow-Guided One-Shot Talking Face Generation With a High-Resolution Audio-Visual Dataset},
author={Zhang, Zhimeng and Li, Lincheng and Ding, Yu and Fan, Changjie},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={3661--3670},
year={2021}
}