Instruct-Video2Avatar: Video-to-Avatar Generation with Instructions [arXiv Paper]
"Make him look like Vincent Van Gogh"
"He should be in "Zelda: Breath of the Wild"""He should look 100 years old"
Installation
git clone https://github.com/timothybrooks/instruct-pix2pix
cd instruct-pix2pix
conda env create -f environment.yaml
Usage
conda activate ip2p
bash scripts/download_checkpoints.sh
python edit_cli.py --steps 100 --resolution 512 --seed 1371 --cfg-text 4.5 --cfg-image 1.2 --input imgs/example.jpg --output imgs/output.jpg --edit "turn him into a cyborg"
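The instruction, seed, and guidance weights strongly affect the edit, so it is worth generating a few candidates for the keyframe before propagating. A minimal sketch of such a sweep using the same edit_cli.py flags as above (the input/output file names are placeholders):
for seed in 1371 42 7; do
  for cfg in 4.5 6.5; do
    python edit_cli.py --steps 100 --resolution 512 --seed $seed --cfg-text $cfg --cfg-image 1.2 \
      --input imgs/keyframe.jpg --output imgs/keyframe_s${seed}_t${cfg}.jpg \
      --edit "turn him into a cyborg"
  done
done
Pick the candidate that best follows the instruction while preserving the head pose.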
For example-based image synthesis, we recommend the EbSynth GUI (Ebsynth.exe) on Windows: select keyframes -> select video -> run all.
For our task, the keyframe is the edited portrait image, and the video frames are the original training or rendered images.
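If you prefer to script the propagation instead of using the Windows GUI, the command-line tool from the jamriska/ebsynth repository can be called once per frame. A minimal sketch, assuming the ebsynth CLI is built and on the PATH and that the file names below are placeholders (the edited keyframe is the style image, and each original/rendered frame is the guide target):
mkdir -p edited
for f in frames/*.png; do
  ebsynth -style keyframe_edited.png \
    -guide keyframe.png "$f" \
    -output edited/$(basename "$f")
done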
Installation
git clone --recursive https://github.com/Zielon/INSTA.git
cd INSTA
cmake . -B build
cmake --build build --config RelWithDebInfo -j
cd INSTA
After building the project you can either train an avatar from scratch or load a snapshot. For training, we recommend a graphics card at least as powerful as an RTX 3090 (24 GB) and 32 GB of RAM. Training on different hardware will probably require adjusting options in the config:
"parent": "main.json",
"max_steps": 30000,
"max_cached_bvh": 4000,
"max_images_gpu": 1700,
"use_dataset_cache": true,
"render_novel_trajectory": false,
"render_from_snapshot": true
Usage
cd INSTA
## Training
./build/rta --config insta.json --scene data/obama --height 512 --width 512 --no-gui
## Loading from a checkpoint
./build/rta --config insta.json --scene data/obama --height 512 --width 512 --no-gui --snapshot data/obama/experiments/insta/debug/snapshot.msgpack
For training, set "render_from_snapshot": false. For rendering from a checkpoint, set "render_from_snapshot": true. For rendering novel views, set "render_novel_trajectory": true.
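If you prefer not to edit the JSON by hand, these flags can be toggled from the shell. A minimal sketch, assuming jq is installed and that the flags live in the insta.json passed via --config above:
jq '.render_from_snapshot = true' insta.json > insta.tmp && mv insta.tmp insta.json
./build/rta --config insta.json --scene data/obama --height 512 --width 512 --no-gui \
  --snapshot data/obama/experiments/insta/debug/snapshot.msgpack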
1. Select one keyframe and edit it with InstructPix2Pix.
2. Update the dataset images with EbSynth, using the original/rendered images and the edited keyframe.
3. Train the avatar and render portrait images, then continue with step 2.
Tips: a keyframe with an open mouth works better. Three iterations of steps 2 and 3 are sufficient. A sketch of the loop is given below.
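A minimal sketch of one iteration, tying together the commands above (all paths, the directory layout, and the frame naming are placeholders; adapt them to how your INSTA scene is organized):
SCENE=data/obama
# Step 1 (run once): edit the chosen keyframe with InstructPix2Pix.
python edit_cli.py --steps 100 --resolution 512 --seed 1371 --cfg-text 4.5 --cfg-image 1.2 \
  --input $SCENE/keyframe.png --output $SCENE/keyframe_edited.png \
  --edit "Make him look like Vincent Van Gogh"
# Step 2: propagate the edited keyframe to every original/rendered frame with ebsynth.
mkdir -p $SCENE/images_edited
for f in $SCENE/images/*.png; do
  ebsynth -style $SCENE/keyframe_edited.png -guide $SCENE/keyframe.png "$f" \
    -output $SCENE/images_edited/$(basename "$f")
done
# Replace the training images with the propagated frames (layout is a placeholder).
cp $SCENE/images_edited/*.png $SCENE/images/
# Step 3: retrain the avatar and render, then repeat steps 2 and 3 on the new renderings.
./build/rta --config insta.json --scene $SCENE --height 512 --width 512 --no-gui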
We are releasing part of our training dataset and checkpoints. We use the avatars from INSTA.
Some checkpoints are provided under Edited Avatars.
For dataset generation from original videos, we refer the user to INSTA and the Metrical Photometric Tracker.
@misc{li2023instructvideo2avatar,
title={Instruct-Video2Avatar: Video-to-Avatar Generation with Instructions},
author={Shaoxu Li},
year={2023},
eprint={2306.02903},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
InstructPix2Pix: Learning to Follow Image Editing Instructions. (https://github.com/timothybrooks/instruct-pix2pix)
ebsynth: Fast Example-based Image Synthesis and Style Transfer. (https://github.com/jamriska/ebsynth)
INSTA - Instant Volumetric Head Avatars. (https://github.com/Zielon/INSTA/tree/master)