Cross-modal image and text retrieval based on graph convolution and multi-head attention

HUA Chunjian; ZHANG Hongtu; JIANG Yi; YU Jianfeng; CHEN Ying

doi:10.16136/j.joel.2024.09.0025

Journal of Optoelectronics · Laser, Volume. 35, Issue 9, 925(2024)

HUA Chunjian¹,², ZHANG Hongtu¹,², JIANG Yi¹,², YU Jianfeng¹,², and CHEN Ying³

Author Affiliations

¹School of Mechanical Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China

²Jiangsu Key Laboratory of Advanced Food Manufacturing Equipment & Technology, Wuxi, Jiangsu 214122, China

³School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China

References (11)

[3] WANG H, SAHOO D, LIU C, et al. Cross-modal food retrieval: learning a joint embedding of food images and recipes with semantic consistency and attention mechanism[J]. IEEE Transactions on Multimedia, 2021, 24(5): 2515-2525.

[6] LIU Y, GUO Y Y, FANG J, et al. A survey of research on deep learning cross-modal image text retrieval[J]. Computer Science and Exploration, 2022, 16(3): 489-511.

[7] KANG J, LIU W. A cross-modal retrieval method for intelligent matching of decoration cases[J]. CAAI Transactions on Intelligent Systems, 2022, 17(4): 714-720.

[9] WEI Y C, ZHAO Y, LU C, et al. Cross-modal retrieval with CNN visual features: a new baseline[J]. IEEE Transactions on Cybernetics, 2016, 47(2): 449-460.

[10] ZHANG C, SONG J, ZHU X, et al. HCMSI: Hybrid cross-modal similarity learning for cross-modal retrieval[J]. ACM Transactions on Multimedia Computing Communications and Applications (TOMM), 2021, 17(1s): 1-22.

[11] YU H, MA R, SU M, et al. A novel deep translated attention hashing for cross-modal retrieval[J]. Multimedia Tools and Applications, 2022, 81(18): 26443-26461.

[12] XU B B, CEN K Y, HUANG J J, et al. A survey on graph convolutional neural network[J]. Chinese Journal of Computers, 2020, 43(5): 755-780.

[13] WU Z H, PAN S R, CHEN F W, et al. A comprehensive survey on graph neural networks[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(1): 4-24.

[14] LI K, ZHANG Y, LI K, et al. Visual semantic reasoning for image-text matching[C]//2019 IEEE/CVF International Conference on Computer Vision, October 27-November 2, 2019, Seoul, South Korea. Los Alamitos: IEEE Computer Society Press, 2019: 4653-4661.

[15] BIANCHI F M, GRATTAROLA D, LIVI L, et al. Graph neural networks with convolutional ARMA filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(7): 3496-3507.

[16] ZHEN L L, HU P, WANG X, et al. Deep supervised cross-modal retrieval[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 16-20, 2019, Long Beach, CA, USA. Los Alamitos: IEEE Computer Society Press, 2019: 10386-10395.

HUA Chunjian, ZHANG Hongtu, JIANG Yi, YU Jianfeng, CHEN Ying. Cross-modal image and text retrieval based on graph convolution and multi-head attention[J]. Journal of Optoelectronics · Laser, 2024, 35(9): 925

Paper Information

Received: Feb. 3, 2023

Accepted: Dec. 20, 2024

Published Online: Dec. 20, 2024
