Journal of Optoelectronics · Laser, Volume 35, Issue 9, 925 (2024)

Cross-modal image and text retrieval based on graph convolution and multi-head attention

HUA Chunjian1,2, ZHANG Hongtu1,2, JIANG Yi1,2, YU Jianfeng1,2, and CHEN Ying3
Author Affiliations
  • 1School of Mechanical Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • 2Jiangsu Key Laboratory of Advanced Food Manufacturing Equipment & Technology, Wuxi, Jiangsu 214122, China
  • 3School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China

    Aiming at the problems that existing cross-modal retrieval methods have difficulty measuring the weight of the data at each node and are limited in mining local consistency within modalities, a cross-modal image and text retrieval method based on graph convolution and a multi-head attention mechanism is proposed. First, each individual image or text sample serves as an independent node when constructing the modal graph, and graph convolution is used to extract the interaction information between samples, improving local consistency within each modality. Then, an attention mechanism is introduced into the graph convolution to adaptively learn the weight coefficient of each neighboring node, thereby distinguishing the influence of different neighbors on the central node. Finally, a multi-head attention layer with weight parameters is constructed to fully learn multiple groups of related features between nodes. Compared with eight existing methods, the proposed method improves the mAP values on the Wikipedia and Pascal Sentence datasets by 2.6% to 42.5% and 3.3% to 54.3%, respectively, in the experiments.
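
    The abstract describes attention-weighted graph convolution over per-sample nodes with multiple heads, but does not give the exact formulation. The following is only a minimal GAT-style sketch of that idea in PyTorch; the class name MultiHeadGraphAttention, the parameter shapes, and the fully connected modal graph in the usage example are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (assumed, not the paper's code): multi-head attention over the
# nodes of one modal graph, where each node is a single image or text sample.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadGraphAttention(nn.Module):
    def __init__(self, in_dim, out_dim, num_heads=4):
        super().__init__()
        self.out_dim = out_dim
        # One linear projection per head (the "weight parameters" of the layer)
        self.W = nn.Parameter(torch.empty(num_heads, in_dim, out_dim))
        # Attention vector used to score pairs of projected node features
        self.a = nn.Parameter(torch.empty(num_heads, 2 * out_dim))
        nn.init.xavier_uniform_(self.W)
        nn.init.xavier_uniform_(self.a)

    def forward(self, x, adj):
        # x:   (N, in_dim) node features, one row per image or text sample
        # adj: (N, N) 0/1 adjacency of the modal graph (self-loops assumed)
        N = x.size(0)
        h = torch.einsum('nd,hdo->hno', x, self.W)                # (H, N, out_dim)
        # GAT-style pairwise logits e_ij = LeakyReLU(a^T [h_i || h_j])
        e_src = torch.einsum('hno,ho->hn', h, self.a[:, :self.out_dim])
        e_dst = torch.einsum('hno,ho->hn', h, self.a[:, self.out_dim:])
        e = F.leaky_relu(e_src.unsqueeze(2) + e_dst.unsqueeze(1), 0.2)  # (H, N, N)
        # Restrict attention to neighbors, then normalize per central node,
        # so each neighbor receives an adaptively learned weight coefficient
        e = e.masked_fill(adj.unsqueeze(0) == 0, float('-inf'))
        alpha = torch.softmax(e, dim=-1)
        out = torch.einsum('hij,hjo->hio', alpha, h)              # aggregate neighbors
        return out.permute(1, 0, 2).reshape(N, -1)                # concat heads

# Usage sketch: 8 image nodes with 512-D features on a fully connected modal graph
x = torch.randn(8, 512)
adj = torch.ones(8, 8)
layer = MultiHeadGraphAttention(512, 128, num_heads=4)
print(layer(x, adj).shape)  # torch.Size([8, 512])
```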


    HUA Chunjian, ZHANG Hongtu, JIANG Yi, YU Jianfeng, CHEN Ying. Cross-modal image and text retrieval based on graph convolution and multi-head attention[J]. Journal of Optoelectronics · Laser, 2024, 35(9): 925

    Paper Information

    Received: Feb. 3, 2023

    Accepted: Dec. 20, 2024

    Published Online: Dec. 20, 2024

    DOI: 10.16136/j.joel.2024.09.0025
