Journal of Optoelectronics · Laser, Vol. 36, Issue 1, 27 (2025)
Research on image captioning generation method of double information flow based on ECA-Net
[1] BAI S, AN S. A survey on automatic image caption generation[J]. Neurocomputing, 2018, 311:291-304.
[2] FARHADI A, HEJRATI M, SADEGHI M A, et al. Every picture tells a story: Generating sentences from images[C]//European Conference on Computer Vision (ECCV), September 5-11, 2010, Heraklion, Crete, Greece. Berlin: Springer, 2010:15-29.
[3] GUPTA A, VERMA Y, JAWAHAR C. Choosing linguistics over vision to describe images[C]//2012 AAAI Conference on Artificial Intelligence, July 22-26, 2012, Toronto, Ontario, Canada. Menlo Park: AAAI, 2012, 26(1):606-612.
[4] LI S M, KULKARNI G, BERG T, et al. Composing simple image descriptions using web-scale N-grams[C]//15th Conference on Computational Natural Language Learning, June 23-24, 2011, Portland, Oregon, USA. Stroudsburg: ACL, 2011:220-228.
[5] KIROS R, SALAKHUTDINOV R, ZEMEL R. Multimodal neural language models[C]//31st International Conference on Machine Learning, June 21-26, 2014, Beijing, China. New York: ACM, 2014:595-603.
[6] XU K, BA J, KIROS R, et al. Show, attend and tell: Neural image caption generation with visual attention[C]//32nd International Conference on Machine Learning, July 6-11, 2015, Lille, France. New York: ACM, 2015:2048-2057.
[7] PAN X R, GE C J, LU R, et al. On the integration of self-attention and convolution[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 18-24, 2022, New Orleans, LA, USA. New York: IEEE, 2022:815-825.
[8] MA Y W, JI J Y, SUN X S, et al. Towards local visual modeling for image captioning[J]. Pattern Recognition, 2023, 138:109420.
[9] GRAVES A. Generating sequences with recurrent neural networks[EB/OL]. (2013-08-04) [2023-06-18]. https://arxiv.org/abs/1308.0850.
[10] WU M R, ZHANG X Y, SUN X S, et al. DIFNet: Boosting visual information flow for image captioning[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 18-24, 2022, New Orleans, LA, USA. New York: IEEE, 2022:18020-18029.
[11] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems, December 4-9, 2017, Long Beach, California, USA. Red Hook: Curran Associates Inc., 2017, 30:6000-6010.
[12] WANG Q L, WU B G, ZHU P F, et al. ECA-Net: Efficient channel attention for deep convolutional neural networks[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 13-19, 2020, Seattle, WA, USA. New York: IEEE, 2020:11534-11542.
[13] NAGRANI A, YANG S, ARNAB A, et al. Attention bottlenecks for multimodal fusion[C]//Advances in Neural Information Processing Systems, December 6-14, 2021, virtual. Red Hook: Curran Associates Inc., 2021, 34:14200-14213.
[14] LUO R. A better variant of self-critical sequence training[EB/OL]. (2020-03-22) [2023-06-18]. https://arxiv.org/abs/2003.09971.
[15] CHEN X, FANG H, LIN T Y, et al. Microsoft COCO captions: Data collection and evaluation server[EB/OL]. (2015-04-01) [2023-06-18]. https://arxiv.org/abs/1504.00325.
[16] KARPATHY A, JOHNSON J, FEI-FEI L, et al. Visualizing and understanding recurrent networks[EB/OL]. (2015-06-05) [2023-06-18]. https://arxiv.org/abs/1506.02078.
[17] WANG E K, ZHANG X, WANG F, et al. Multilayer dense attention model for image caption[J]. IEEE Access, 2019, 7:66358-66368.
[18] XIONG Y W, LIAO R J, ZHAO H S, et al. UPSNet: A unified panoptic segmentation network[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 15-20, 2019, Long Beach, CA, USA. New York: IEEE, 2019:8818-8826.
[19] KINGMA D P, BA J. Adam: A method for stochastic optimization[EB/OL]. (2014-12-22) [2023-06-18]. https://arxiv.org/abs/1412.6980.
[20] FU H X, SONG G Q, WANG Y C. Improved YOLOv4 marine target detection combined with CBAM[J]. Symmetry, 2021, 13(4):623.
[21] MA X, GUO J D, SANSOM A, et al. Spatial pyramid attention for deep convolutional neural networks[J]. IEEE Transactions on Multimedia, 2021, 23:3048-3058.
[22] HU J, SHEN L, SUN G, et al. Squeeze-and-excitation networks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 18-23, 2018, Salt Lake City, UT, USA. New York: IEEE, 2018:7132-7141.
[23] RENNIE S J, MARCHERET E, MROUEH Y, et al. Self-critical sequence training for image captioning[C]//2017 IEEE/CVF Conference on Computer Vision and Pattern Recognition, July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 2017:7008-7024.
[24] PAN Y W, LI Y H, YAO T, et al. Bottom-up and top-down object inference networks for image captioning[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2023, 19(5):1-18.
[25] CORNIA M, STEFANINI M, BARALDI L, et al. Meshed-memory transformer for image captioning[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 13-19, 2020, Seattle, WA, USA. New York: IEEE, 2020:10578-10587.
[26] PAN Y W, YAO T, LI Y H, et al. X-linear attention networks for image captioning[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 13-19, 2020, Seattle, WA, USA. New York: IEEE, 2020:10971-10980.
[27] ZHANG X Y, SUN X S, LUO Y P, et al. RSTNet: Captioning with adaptive attention on visual and non-visual words[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 20-25, 2021, Nashville, TN, USA. New York: IEEE, 2021:15465-15474.
LIU Zhongmin, SU Rong, HU Wenjin. Research on image captioning generation method of double information flow based on ECA-Net[J]. Journal of Optoelectronics · Laser, 2025, 36(1): 27
Received: Jun. 18, 2023
Accepted: Jan. 23, 2025
Published Online: Jan. 23, 2025
Author Email: LIU Zhongmin (liuzhmx@163.com)