Computer Applications and Software, Volume 42, Issue 4, 208 (2025)
A CHINESE IMAGE CAPTIONING METHOD BASED ON FUSION ENCODER AND VISUAL KEYWORD SEARCH
Meng Fancong, Xu Wei, Li Haibo, Wu Min, Zheng Junjie, Chen Xing. A CHINESE IMAGE CAPTIONING METHOD BASED ON FUSION ENCODER AND VISUAL KEYWORD SEARCH[J]. Computer Applications and Software, 2025, 42(4): 208
Received: Nov. 15, 2021
Accepted: Aug. 25, 2025
Published Online: Aug. 25, 2025