Image description method based on residual learning and dual-mode CAE

QIU Yicheng; YANG Lishen

Optical Technique, Volume. 47, Issue 1, 93(2021)

QIU Yicheng¹,* and YANG Lishen²

Author Affiliations

¹[in Chinese]

²[in Chinese]

Abstract

Traditional image description methods extract key information with low accuracy and consequently produce inaccurate descriptions. To address these problems, an image description method combining residual learning with a dual-mode convolutional auto-encoder (CAE) is proposed. First, a new dual-mode structure is introduced. It takes two inputs, an image and a text, and passes them through encoding, hidden-layer interaction, decoding, and other processing stages to produce a text description of the input image. Second, residual learning is added to the classical CAE: the convolutional layers of the CAE form a deep residual network (DRN), which increases the learning depth and improves the accuracy of the method. Finally, the hidden layers of the text and image branches are cross-reconstructed to minimize the loss function, so that the relationship between image and text is learned and the image can be described. Qualitative and quantitative simulation experiments on the COCO and Flickr30k datasets demonstrate the effectiveness of the proposed method. Compared with other methods, it attains the lowest median rank (Med r) and the highest R@K (K = 1, 5, 10), with an operation time of only 0.183 s, indicating that it describes images more accurately than the compared methods.
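The abstract reports Med r and R@K without defining them. The sketch below shows how these two metrics are commonly computed in image-text retrieval evaluation, assuming that convention applies here: R@K is the fraction of queries whose ground-truth match is ranked within the top K candidates, and Med r is the median rank of the ground-truth match. The function names and the toy similarity scores are hypothetical and are not taken from the paper.

```python
from statistics import median


def rank_of_match(similarities, true_index):
    """Return the 1-based rank of the ground-truth candidate.

    Candidates are ordered by descending similarity, so the rank is one
    plus the number of candidates that score strictly higher.
    """
    true_score = similarities[true_index]
    return 1 + sum(1 for s in similarities if s > true_score)


def recall_at_k(ranks, k):
    """Fraction of queries whose ground-truth match is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def median_rank(ranks):
    """Median rank of the ground-truth match (lower is better)."""
    return median(ranks)


# Toy similarity matrix (illustrative values only): row i holds the scores
# of query i against four candidates, and candidate i is the true match.
similarity = [
    [0.9, 0.1, 0.3, 0.2],
    [0.4, 0.5, 0.8, 0.1],
    [0.7, 0.6, 0.2, 0.9],
    [0.1, 0.2, 0.3, 0.8],
]

ranks = [rank_of_match(row, i) for i, row in enumerate(similarity)]

for k in (1, 5, 10):
    print(f"R@{k} = {recall_at_k(ranks, k):.2f}")
print(f"Med r = {median_rank(ranks)}")
```

Under this convention a better method has higher R@K and lower Med r, which matches the direction of the comparison stated in the abstract.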

Keywords

bimodal CAE; cross reconstruction; deep residual network; image description; minimizing loss function; residual learning

QIU Yicheng, YANG Lishen. Image description method based on residual learning and dual-mode CAE[J]. Optical Technique, 2021, 47(1): 93
