This repo contains assets for our paper "What Makes for Good Visual Tokenizers for Large Language Models?".
We provide related details in gvt.
We provide the Object Counting (OC) and Multi-Class Identification (MCI) tasks on the MS-COCO and VCR datasets in GVTBench.
Our work is built on VLMo, LAVIS, EVA, and Vicuna.
Thanks for their great work!
If you find this work useful, please cite:
@misc{wang2023gvt,
title={What Makes for Good Visual Tokenizers for Large Language Models?},
author={Guangzhi Wang and Yixiao Ge and Xiaohan Ding and Mohan Kankanhalli and Ying Shan},
year={2023},
eprint={2305.12223},
archivePrefix={arXiv},
primaryClass={cs.CV}
}