TL;DR: LOVA3 is a new training paradigm that advances multimodal models by adding two new capabilities beyond answering: asking questions and assessing VQA triplets.
- [03/03/2025] We update the repo with four models from the paper for testing, have fun!
- [10/16/2024] We release the webpage.
- [09/26/2024] LOVA3 is accepted by NeurIPS 2024.
- [07/01/2024] Related work Genixer is accepted by ECCV 2024.
- [05/24/2024] We release the code of LOVA3, EvalQABench, the training dataset Mixed_VQA_GenQA_EvalQA_1.5M.jsonl, and the checkpoint LOVA3-llava-v1.5-7b.
- [05/23/2024] We release the LOVA3 paper.
```bash
conda create -n LOVA python=3.10
conda activate LOVA
pip install --upgrade pip
pip install -e .
```
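To confirm the editable install worked, a minimal sanity check is to import the package (the `llava` module name is inferred from the evaluation commands later in this README):

```bash
# Prints a confirmation message if the editable install succeeded.
python -c "import llava; print('llava imported successfully')"
```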
Model Name | Size | Checkpoint | EvalQA Data Filtered By |
---|---|---|---|
LOVA3-llava-v1.5-7b | 7B | checkpoint | Fuyu-8B |
LOVA3-llava-v1.5-7b-gemini | 7B | checkpoint | Gemini-1.5-Flash |
LOVA3-llava-v1.5-phi1.5-baseline | 1.5B | checkpoint | - |
LOVA3-llava-v1.5-phi1.5-fuyu | 1.5B | checkpoint | Fuyu-8B |
LOVA3-llava-v1.5-phi1.5-gemini | 1.5B | checkpoint | Gemini-1.5-Flash |
Download from Hugging Face:

```bash
git clone https://huggingface.co/hhenryz/LOVA3-llava-v1.5-7b
```
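If you prefer not to use git-lfs, a hedged alternative (assuming a recent `huggingface_hub` with its CLI installed) is to download the checkpoint directly into the `checkpoints` folder expected by the evaluation scripts:

```bash
# Assumes `pip install -U huggingface_hub`; downloads the 7B checkpoint into checkpoints/.
huggingface-cli download hhenryz/LOVA3-llava-v1.5-7b --local-dir checkpoints/LOVA3-llava-v1.5-7b
```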
- Training Data: Mixed_VQA_GenQA_EvalQA_1.5M.jsonl (a quick way to inspect a record is shown below).
- EvalQABench Data: EvalQABench
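As a quick sketch of how to inspect the training mixture, the first record of the JSONL can be pretty-printed; this assumes the file has been placed under `data/`, as described in the training section below:

```bash
# Each line of the .jsonl file is one JSON record; print the first one to see its fields.
head -n 1 data/Mixed_VQA_GenQA_EvalQA_1.5M.jsonl | python -m json.tool
```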
Please download the images from the constituent datasets:
- COCO: train2014 (see the example command after this list)
- GQA: images
- OCR-VQA: download script; we save all files as `.jpg`
- AOKVQA: download script
- TextVQA: train_val_images
- VisualGenome: part1, part2
- LLaVA-Instruct: huggingface
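For example, the COCO train2014 images can be fetched from the official COCO mirror as sketched below; the `/yourpath/coco` destination is only a placeholder, so point it at wherever you keep your image folders:

```bash
# Download and unpack COCO train2014; adjust /yourpath/coco to your image root.
mkdir -p /yourpath/coco
wget http://images.cocodataset.org/zips/train2014.zip -P /yourpath/coco
unzip /yourpath/coco/train2014.zip -d /yourpath/coco
```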
- Download LOVA3-llava-v1.5-7b under the folder `checkpoints`.
- Download the CLIP vision encoder clip-vit-large-patch14-336 under the folder `checkpoints`.
- Run the evaluation scripts under the folder `scripts/v1_5/eval`. There are 12 multimodal datasets and benchmarks awaiting evaluation.
Taking VizWiz as an example, the commands are as follows:
```bash
modelname=LOVA3-llava-v1.5-7b

# Step 1: generate answers on the VizWiz test split.
python -m llava.eval.model_vqa_loader \
    --model-path checkpoints/$modelname \
    --question-file ./playground/data/eval/vizwiz/llava_test.jsonl \
    --image-folder /yourpath/vizwiz/test/ \
    --answers-file ./playground/data/eval/vizwiz/answers/$modelname.jsonl \
    --temperature 0 \
    --conv-mode vicuna_v1

# Step 2: convert the answers into the VizWiz submission format.
python scripts/convert_vizwiz_for_submission.py \
    --annotation-file ./playground/data/eval/vizwiz/llava_test.jsonl \
    --result-file ./playground/data/eval/vizwiz/answers/$modelname.jsonl \
    --result-upload-file ./playground/data/eval/vizwiz/answers_upload/$modelname.json
```
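The same two-step pipeline works for the other checkpoints in the table above; as a hedged sketch (assuming the `vicuna_v1` conversation template also applies to the Gemini-filtered 7B model), only the model name changes:

```bash
# Generate VizWiz answers for both 7B checkpoints with the same settings as above.
for modelname in LOVA3-llava-v1.5-7b LOVA3-llava-v1.5-7b-gemini; do
    python -m llava.eval.model_vqa_loader \
        --model-path checkpoints/$modelname \
        --question-file ./playground/data/eval/vizwiz/llava_test.jsonl \
        --image-folder /yourpath/vizwiz/test/ \
        --answers-file ./playground/data/eval/vizwiz/answers/$modelname.jsonl \
        --temperature 0 \
        --conv-mode vicuna_v1
done
```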
- Download the pretrained MLP adapter weights llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5 and put them under the folder `checkpoints`.
- Download the model weight clip-vit-large-patch14-336 under the folder `checkpoints`.
- Download the model weight vicuna-7b-v1.5 under the folder `checkpoints`.
- Download the training data Mixed_VQA_GenQA_EvalQA_1.5M.jsonl under the folder `data`.
- Run the training script:

```bash
bash scripts/v1_5/finetune.sh
```
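Before launching, it is worth checking that the paths inside `scripts/v1_5/finetune.sh` point at the assets downloaded above. The flag names below follow the LLaVA-1.5 training script that LOVA3 builds on and are assumptions here, so verify them against the actual script:

```bash
# List the path-related arguments in the training script so they can be checked and edited.
grep -nE "model_name_or_path|data_path|image_folder|vision_tower|pretrain_mm_mlp_adapter" scripts/v1_5/finetune.sh
```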
If you find LOVA3 useful, please cite using this BibTeX:
```bibtex
@misc{zhao2024lova3learningvisualquestion,
      title={LOVA3: Learning to Visual Question Answering, Asking and Assessment},
      author={Henry Hengyuan Zhao and Pan Zhou and Difei Gao and Zechen Bai and Mike Zheng Shou},
      year={2024},
      eprint={2405.14974},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2405.14974},
}
```