Optical Technique, Vol. 47, Issue 1, 93 (2021)
Image description method based on residual learning and dual-mode CAE
Traditional image description methods extract key information with low accuracy and produce imprecise descriptions. To address these problems, an image description method combining residual learning and a dual-mode convolutional auto-encoder (CAE) is proposed. First, a new dual-mode structure is introduced that takes two inputs, image and text, and produces the text description of the input image through encoding, hidden-layer interaction, decoding, and other processing stages. Then, residual learning is added to the classical CAE: the convolution layers of the CAE form a deep residual network (DRN), which increases the learning depth and improves the accuracy of the method. Finally, the hidden layers of text and image are cross-reconstructed to minimize the loss function, and the relationship between image and text is learned in order to generate the image description. Qualitative and quantitative simulation experiments on the COCO and Flickr30k datasets demonstrate the effectiveness of the proposed method. Compared with other methods, it achieves the lowest Med r and the highest R@K (K = 1, 5, 10), with an operation time of only 0.183 s, and it describes images more accurately.
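The abstract reports Med r and R@K without defining them. The sketch below assumes the definitions conventionally used in image-text retrieval: R@K is the percentage of queries whose correct match is ranked in the top K, and Med r is the median rank of the correct match, so a lower Med r and a higher R@K are better. The function names, the toy similarity matrix, and the one-caption-per-image pairing are illustrative assumptions, not taken from the paper.

```python
from statistics import median

def ground_truth_ranks(sim):
    """Return the 1-based rank of the correct candidate for each query.

    sim[i][j] is the similarity score between query i and candidate j.
    Assumption for this sketch: candidate i is the single correct match
    for query i.
    """
    ranks = []
    for i, row in enumerate(sim):
        target = row[i]
        # Rank = 1 + number of candidates scored strictly higher.
        ranks.append(1 + sum(1 for score in row if score > target))
    return ranks

def recall_at_k(ranks, k):
    """Percentage of queries whose correct match is in the top k."""
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)

def median_rank(ranks):
    """Med r: median rank of the correct match over all queries."""
    return median(ranks)

# Toy 4x4 similarity matrix (made-up numbers, for illustration only).
sim = [
    [0.9, 0.1, 0.3, 0.2],
    [0.8, 0.7, 0.1, 0.2],
    [0.2, 0.6, 0.5, 0.9],
    [0.1, 0.2, 0.3, 0.4],
]

ranks = ground_truth_ranks(sim)
for k in (1, 5, 10):
    print(f"R@{k} = {recall_at_k(ranks, k):.1f}")
print(f"Med r = {median_rank(ranks)}")
```

On COCO and Flickr30k each image is paired with several reference captions, so a real evaluation would typically rank against the best-scoring correct caption rather than a single diagonal entry.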
QIU Yicheng, YANG Lishen. Image description method based on residual learning and dual-mode CAE[J]. Optical Technique, 2021, 47(1): 93