Journal of Optoelectronics · Laser, Volume 35, Issue 9, 925 (2024)
Cross-modal image and text retrieval based on graph convolution and multi-head attention
HUA Chunjian, ZHANG Hongtu, JIANG Yi, YU Jianfeng, CHEN Ying. Cross-modal image and text retrieval based on graph convolution and multi-head attention[J]. Journal of Optoelectronics · Laser, 2024, 35(9): 925
Received: Feb. 3, 2023
Accepted: Dec. 20, 2024
Published Online: Dec. 20, 2024