A Python and Keras implementation of the VIS+LSTM visual question answering model.
Train a deeper LSTM and normalized CNN visual question answering model. The current code scores 58.16 on Open-Ended and 63.09 on Multiple-Choice on test-standard.
Strong baseline for visual question answering
Implementation for "Large-scale Pretraining for Visual Dialog"
Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".
Graduation project: visual question answering based on deep learning