Fusion layer attention for image-text matching

D. Wang, L. Wang, S. Song, G. Huang, Y. Guo, S. Cheng, N. Ao, A. Du. Neurocomputing, 2021 (Elsevier).
Abstract
Image-text matching aims to find the relationship between image and text data and to establish a connection between them. Its main challenge is that images and texts have different data distributions and feature representations. Current methods fall into two basic types: methods that map image and text data into a common space and compare them with a distance measure, and methods that treat image-text matching as a classification problem. In both cases, only the two original modalities, image and text, are used. In our method, we introduce a fusion layer that extracts intermediate modalities, thereby improving the image-text matching results. We also propose a concise update to the loss function that makes it easier for the network to handle difficult cases. The proposed method was verified on the Flickr30K and MS-COCO datasets and achieved superior matching results compared with existing methods.
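To make the first family of methods concrete, the following is a minimal sketch of the common-space approach the abstract describes: image and text features are projected into one shared space and compared with cosine similarity, and a hinge-based ranking loss penalizes mismatched pairs that score close to the matched pair. All names, dimensions, and the randomly initialized projection matrices here are illustrative assumptions, not the paper's actual fusion-layer architecture or its proposed loss update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes: e.g. CNN image features and word-embedding
# text features, projected into one shared space of dimension 128.
IMG_DIM, TXT_DIM, SHARED_DIM = 512, 300, 128

# Random matrices stand in for learned projection weights.
W_img = rng.normal(scale=0.02, size=(IMG_DIM, SHARED_DIM))
W_txt = rng.normal(scale=0.02, size=(TXT_DIM, SHARED_DIM))

def embed(x, W):
    """Project features into the shared space and L2-normalize them."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def sim_matrix(imgs, txts):
    """Cosine similarity between every image and every text in a batch."""
    return embed(imgs, W_img) @ embed(txts, W_txt).T  # shape (B, B)

def triplet_ranking_loss(S, margin=0.2):
    """Hinge ranking loss over a batch; the diagonal holds matched pairs.

    Each matched score S[i, i] should exceed every mismatched score in
    its row and column by at least `margin`.
    """
    pos = np.diag(S)
    cost_txt = np.clip(margin + S - pos[:, None], 0, None)  # wrong texts
    cost_img = np.clip(margin + S - pos[None, :], 0, None)  # wrong images
    np.fill_diagonal(cost_txt, 0)
    np.fill_diagonal(cost_img, 0)
    return float(cost_txt.sum() + cost_img.sum())

imgs = rng.normal(size=(4, IMG_DIM))
txts = rng.normal(size=(4, TXT_DIM))
S = sim_matrix(imgs, txts)       # entries lie in [-1, 1]
loss = triplet_ranking_loss(S)   # non-negative scalar
```

At test time, retrieval amounts to ranking all candidate texts (or images) by their row or column of the similarity matrix; the paper's contribution lies in how the embeddings are built and how this kind of loss is modified, which this sketch does not reproduce.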