Optical Technique, Vol. 47, Issue 1, 93 (2021)

Image description method based on residual learning and dual-mode CAE

QIU Yicheng1,* and YANG Lishen2
Author Affiliations
  • 1[in Chinese]
  • 2[in Chinese]

    To address the low accuracy of key-information extraction and the resulting imprecise descriptions in traditional image description methods, an image description method combining residual learning and a dual-mode convolutional auto-encoder (CAE) is proposed. First, a new dual-mode structure is introduced; it takes two inputs, image and text, and completes the text description of the input image through encoding, hidden-layer interaction, decoding, and other processing stages. Then, residual learning is added to the classical CAE, so that the convolutional layers of the CAE form a deep residual network (DRN), which increases the learning depth and improves the accuracy of the method. Finally, the hidden layers of the text and image branches are cross-reconstructed to minimize the loss function, and the relationship between image and text is learned to produce the image description. Qualitative and quantitative simulation experiments on the COCO and Flickr30k datasets demonstrate the effectiveness of the proposed method. Compared with the other methods tested, it achieves the lowest Med r and the highest R@K (K = 1, 5, 10), with an operation time of only 0.183 s, indicating that it describes images more accurately.
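
The abstract reports Med r and R@K without defining them. As commonly used in image-text retrieval, R@K is the percentage of queries whose correct match appears among the top K ranked candidates, and Med r is the median rank of the correct match, so higher R@K and lower Med r are better. The sketch below illustrates those standard definitions only; it is not the authors' evaluation code. The function names, the assumption that query i matches candidate i, and the toy similarity values are all hypothetical.

```python
from statistics import median


def ground_truth_ranks(sims):
    """Return the 1-based rank of the correct candidate for each query.

    Assumption: sims[i][j] scores query i against candidate j, and the
    correct candidate for query i is candidate i.
    """
    ranks = []
    for i, row in enumerate(sims):
        target = row[i]
        # Rank = 1 + number of candidates that score strictly higher.
        ranks.append(1 + sum(1 for score in row if score > target))
    return ranks


def recall_at_k(ranks, k):
    """Percentage of queries whose correct candidate is ranked within the top k."""
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)


def median_rank(ranks):
    """Median rank of the correct candidate over all queries (Med r)."""
    return median(ranks)


# Toy similarity matrix (made-up numbers, not results from the paper).
sims = [
    [0.9, 0.1, 0.3, 0.2],
    [0.4, 0.8, 0.5, 0.1],
    [0.7, 0.6, 0.5, 0.2],
    [0.1, 0.2, 0.3, 0.4],
]
ranks = ground_truth_ranks(sims)
for k in (1, 5, 10):
    print(f"R@{k} = {recall_at_k(ranks, k):.1f}")
print(f"Med r = {median_rank(ranks)}")
```

In this toy case three of the four queries rank their correct match first, so R@1 is 75% while the median rank stays at 1.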

    QIU Yicheng, YANG Lishen. Image description method based on residual learning and dual-mode CAE[J]. Optical Technique, 2021, 47(1): 93

    Paper Information

    Received: Aug. 4, 2020

    Published Online: Apr. 12, 2021

    The Author Email: Yicheng QIU (qiuyicheng@163.com)
