Computer Applications and Software, Vol. 42, Issue 4, 208 (2025)

A CHINESE IMAGE CAPTIONING METHOD BASED ON FUSION ENCODER AND VISUAL KEYWORD SEARCH

Meng Fancong1, Xu Wei1, Li Haibo1, Wu Min1, Zheng Junjie1, and Chen Xing2
Author Affiliations
  • 1East China Yixing Pumped Storage Power Co., Ltd., Wuxi 214200, Jiangsu, China
  • 2College of Computer and Information, Hohai University, Nanjing 211100, Jiangsu, China

    To address the tendency of existing image captioning models to overlook local image details and produce generic descriptions, a Chinese image captioning method that combines a fusion encoder with visual keyword search is proposed. A fusion encoder is constructed in which a convolutional neural network (CNN) extracts the local and global features of an image simultaneously, enriching the semantic information of the image features available in the long short-term memory (LSTM) decoding stage. To counter generic wording, a CNN-based image retrieval method is used to find potential visual keywords, which are integrated into the word vector generation process in the decoding stage. A reinforcement learning mechanism is introduced to optimize the CIDEr evaluation metric at the sentence level and thereby improve the lexical diversity of the image descriptions. Experimental results verify the effectiveness of the proposed method.
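
    The visual keyword search step can be illustrated with a minimal sketch. The abstract does not give the retrieval details, so everything here is an assumption: images are taken to be already encoded as CNN feature vectors, neighbours are ranked by cosine similarity, and the candidate keywords are the words that occur most often in the captions of the nearest images. The function names, the gallery layout, and the toy data are hypothetical and are not the authors' implementation.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_keywords(query_feat, gallery, k=2, n_words=3):
    """Return the n_words most frequent caption words among the k nearest images.

    gallery is a list of (feature_vector, tokenized_caption) pairs.
    This layout is an assumption made for illustration only.
    """
    ranked = sorted(gallery, key=lambda item: cosine(query_feat, item[0]), reverse=True)
    counts = Counter()
    for _, tokens in ranked[:k]:
        counts.update(set(tokens))  # count each word once per neighbour image
    return [word for word, _ in counts.most_common(n_words)]


# Made-up toy data: 3-d vectors stand in for CNN features,
# and the Chinese captions are already split into words.
gallery = [
    ([1.0, 0.0, 0.0], ["狗", "草地", "奔跑"]),
    ([0.9, 0.1, 0.0], ["狗", "球"]),
    ([0.0, 0.0, 1.0], ["山", "湖"]),
]
keywords = retrieve_keywords([1.0, 0.05, 0.0], gallery, k=2, n_words=1)
print(keywords)
```

    In this toy run the two nearest gallery images both mention 狗, so it is the single keyword returned. In a captioning model, such keywords would then be supplied to the decoder alongside the image features; how the paper performs that integration is not specified in the abstract.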

    Meng Fancong, Xu Wei, Li Haibo, Wu Min, Zheng Junjie, Chen Xing. A CHINESE IMAGE CAPTIONING METHOD BASED ON FUSION ENCODER AND VISUAL KEYWORD SEARCH[J]. Computer Applications and Software, 2025, 42(4): 208

    Paper Information

    Received: Nov. 15, 2021

    Accepted: Aug. 25, 2025

    Published Online: Aug. 25, 2025

    DOI:10.3969/j.issn.1000-386x.2025.04.030
