Journal of Optoelectronics · Laser, Volume 35, Issue 9, 925 (2024)

Cross-modal image and text retrieval based on graph convolution and multi-head attention

HUA Chunjian (1,2), ZHANG Hongtu (1,2), JIANG Yi (1,2), YU Jianfeng (1,2), and CHEN Ying (3)
Author Affiliations
  • (1) School of Mechanical Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • (2) Jiangsu Key Laboratory of Advanced Food Manufacturing Equipment & Technology, Wuxi, Jiangsu 214122, China
  • (3) School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China

    Citation: HUA Chunjian, ZHANG Hongtu, JIANG Yi, YU Jianfeng, CHEN Ying. Cross-modal image and text retrieval based on graph convolution and multi-head attention[J]. Journal of Optoelectronics · Laser, 2024, 35(9): 925

    Paper Information

    Received: Feb. 3, 2023

    Accepted: Dec. 20, 2024

    Published Online: Dec. 20, 2024

    DOI: 10.16136/j.joel.2024.09.0025
