PyTorch implementation of [Apollo: Zero-Shot Multimodal Reasoning with Multiple Experts](TODO: add arXiv link)
Install the dependencies:

$ pip install -r requirements.txt
$ pip install ftfy regex tqdm
$ pip install git+https://github.com/openai/CLIP.git
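To sanity-check that CLIP installed correctly, you can load a model and print its input resolution. A minimal sketch; the `ViT-B/32` backbone here is just an example and may differ from the one `run.py` actually loads:

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"

# Downloads the weights on first use.
model, preprocess = clip.load("ViT-B/32", device=device)

print("CLIP loaded, input resolution:", model.visual.input_resolution)  # 224 for ViT-B/32
```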
The code was tested successfully on an Intel Xeon machine with an NVIDIA RTX 2080 Ti and CUDA 11.4.
To run Apollo-Cap-PD, you also need our CLIP fork:

git clone https://github.com/danielabd/CLIP
Run all of the following commands from the `zero_shot_style/` directory.
Apollo-Cap:

python run.py --style positive --ce_scale 1.96 --clip_scale 2.19 --text_style_scale 9.68 --sentiment_temperature 0.001 --use_img_path ./example_images/000000276434.jpg
python run.py --style negative --ce_scale 2.855 --clip_scale 5.036 --text_style_scale 11.9 --sentiment_temperature 0.001 --use_img_path ./example_images/000000155617.jpg
python run.py --style humor --ce_scale 0.6604141408776456 --clip_scale 1 --text_style_scale 2.9876837003652907 --sentiment_temperature 0.001 --use_img_path ./example_images/2211593099_4a4f1c85d2.jpg
python run.py --style romantic --ce_scale 0.7097647446401579 --clip_scale 1 --text_style_scale 4.332869432646197 --sentiment_temperature 0.001 --use_img_path ./example_images/1579287915_4257c54451.jpg
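To caption the same image in all four styles, a small driver script can save retyping. A minimal sketch, assuming `run.py` accepts the flags shown above; the hyperparameter values are copied verbatim from the Apollo-Cap commands:

```python
import subprocess

# Per-style hyperparameters, copied from the Apollo-Cap commands above.
STYLE_PARAMS = {
    "positive": {"ce_scale": 1.96, "clip_scale": 2.19, "text_style_scale": 9.68},
    "negative": {"ce_scale": 2.855, "clip_scale": 5.036, "text_style_scale": 11.9},
    "humor":    {"ce_scale": 0.6604141408776456, "clip_scale": 1, "text_style_scale": 2.9876837003652907},
    "romantic": {"ce_scale": 0.7097647446401579, "clip_scale": 1, "text_style_scale": 4.332869432646197},
}

def caption_all_styles(image_path: str) -> None:
    """Run Apollo-Cap once per style on a single image."""
    for style, p in STYLE_PARAMS.items():
        subprocess.run(
            ["python", "run.py",
             "--style", style,
             "--ce_scale", str(p["ce_scale"]),
             "--clip_scale", str(p["clip_scale"]),
             "--text_style_scale", str(p["text_style_scale"]),
             "--sentiment_temperature", "0.001",
             "--use_img_path", image_path],
            check=True,
        )

if __name__ == "__main__":
    caption_all_styles("./example_images/000000276434.jpg")
```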
Apollo-Cap-P (`--mul_clip_style`):

python run.py --style positive --mul_clip_style --ce_scale 4 --clip_scale 8 --text_style_scale 0 --sentiment_temperature 0.01 --use_img_path ./example_images/000000274455.jpg
python run.py --style negative --mul_clip_style --ce_scale 0.6209475551271303 --clip_scale 2 --text_style_scale 0 --sentiment_temperature 0.08555964306820746 --use_img_path ./example_images/000000217303.jpg
python run.py --style humor --mul_clip_style --ce_scale 0.4173438996507689 --clip_scale 1 --text_style_scale 0 --sentiment_temperature 0.05089738868653932 --use_img_path ./example_images/311267421_e204e643cf.jpg
python run.py --style romantic --mul_clip_style --ce_scale 0.5 --clip_scale 1 --text_style_scale 0 --sentiment_temperature 0.05364761206623257 --use_img_path ./example_images/3457315666_b943111dec.jpg
Apollo-Cap-PD (`--update_ViT`, requires the CLIP fork above):

python run.py --style positive --update_ViT --mul_clip_style --ce_scale 0.2214432225421577 --clip_scale 1 --text_style_scale 0 --sentiment_temperature 0.1430339855494212 --num_iterations_clip_style 1 --use_img_path ./example_images/000000276434.jpg
python run.py --style negative --update_ViT --mul_clip_style --ce_scale 0.6070550610590508 --clip_scale 2 --text_style_scale 0 --sentiment_temperature 0.17425402664880124 --num_iterations_clip_style 1 --use_img_path ./example_images/000000077954.jpg
python run.py --style humor --update_ViT --mul_clip_style --ce_scale 0.3426740175716766 --clip_scale 1 --text_style_scale 0 --sentiment_temperature 0.05655316717625009 --num_iterations_clip_style 1 --use_img_path ./example_images/940973925_a2e6d7951c.jpg
python run.py --style romantic --update_ViT --mul_clip_style --ce_scale 0.3735 --clip_scale 1 --text_style_scale 0 --sentiment_temperature 0.0672676323972359 --num_iterations_clip_style 1 --use_img_path ./example_images/1489286545_8df476fa26.jpg
Audio-guided captioning (`--use_audio_model`):

python run.py --mul_clip_style --ce_scale 4 --clip_scale 8 --text_style_scale 0 --sentiment_temperature 0.01 --use_audio_model --audio_path ./child_laughing.wav --audio_sampling_rate 24000 --use_img_path ./example_images/000000155617.jpg
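The audio example passes `--audio_sampling_rate 24000`, so your input should match that rate. A minimal sketch for resampling an arbitrary WAV to 24 kHz with torchaudio; torchaudio and the mono downmix are assumptions here, any resampler will do:

```python
import torchaudio

def prepare_audio(in_path: str, out_path: str, target_sr: int = 24000) -> None:
    """Resample a WAV file to the target sampling rate and save it."""
    waveform, sr = torchaudio.load(in_path)          # (channels, samples)
    waveform = waveform.mean(dim=0, keepdim=True)    # downmix to mono (assumption)
    if sr != target_sr:
        waveform = torchaudio.transforms.Resample(sr, target_sr)(waveform)
    torchaudio.save(out_path, waveform, target_sr)

prepare_audio("./child_laughing.wav", "./child_laughing_24k.wav")
```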
If you use our work in your research, please cite us. <!--- TODO: add BibTeX --->