Opto-Electronic Engineering, Volume. 46, Issue 9, 180468(2019)
Multi-label classification based on attention mechanism and semantic dependencies
[1] [1] Sivic J, Zisserman A. Video Google: a text retrieval approach to object matching in videos[C]//Proceedings 9th IEEE International Conference on Computer Vision, 2003: 1470–1477.
[2] [2] Wang R G, Ding K, Yang J, et al. Image classification based on bag of visual words model with triangle constraint[J]. Journal of Software, 2017, 28(7): 1847-1861.
[3] [3] Huang Q H, Liu Z. Multiple-hyperplane SVMs algorithm in image semantic classification[J]. Opto-Electronic Engineering, 2007, 34(8): 99–104.
[4] [4] Chang C C, Lin C J. LIBSVM: a library for support vector machines[J]. ACM Transactions on Intelligent Systems and Technology, 2011, 2(3): 27.
[5] [5] Breiman L. Random forests[J]. Machine Learning, 2001, 45(1): 5–32.
[6] [6] Harzallah H, Jurie F, Schmid C. Combining efficient object localization and image classification[C]//Proceedings of the 12th International Conference on Computer Vision, 2009: 237–244.
[7] [7] Lowe D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60(2): 91–110.
[8] [8] Dalal N, Triggs B. Histograms of oriented gradients for human detection[C]//Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005: 886–893.
[9] [9] Ojala T, Pietik inen M, Harwood D. A comparative study of texture measures with classification based on featured distributions[J]. Pattern Recognition, 1996, 29(1): 51–59.
[10] [10] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J]. arXiv:1409.1556[cs.CV], 2015.
[11] [11] Huang G, Liu Z, van der Maaten L, et al. Densely connected convolutional networks[C]//Proceedings of 2017 IEEE Computer Vision and Pattern Recognition, 2017: 2261–2269.
[12] [12] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770–778.
[13] [13] Razavian A S, Azizpour H, Sullivan J, et al. CNN features off-the-shelf: an astounding baseline for recognition[C]// Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014: 512–519.
[14] [14] Deng J, Dong W, Socher R, et al. ImageNet: a large-scale hierarchical image database[C]//Proceedings of 2009 IEEE Computer Vision and Pattern Recognition, 2009: 248–255.
[15] [15] Wei Y C, Xia W, Lin M, et al. HCP: a flexible CNN framework for multi-label image classification[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(9): 1901–1907.
[16] [16] Cheng M M, Zhang Z M, Lin W Y, et al. BING: binarized normed gradients for objectness estimation at 300fps[C]//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014: 3286–3293.
[17] [17] Wang J, Yang Y, Mao J H, et al. CNN-RNN: a unified framework for multi-label image classification[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016: 2285–2294.
[18] [18] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735–1780.
[19] [19] Zhang J J, Wu Q, Shen C H, et al. Multilabel image classification with regional latent semantic dependencies[J]. IEEE Transactions on Multimedia, 2018, 20(10): 2801–2813.
[20] [20] Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift[C]//Proceedings of the 32nd International Conference on Machine Learning, 2015: 448–456.
[21] [21] Glorot X, Bordes A, Bengio Y. Deep sparse rectifier neural networks[C]//Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, 2011: 315–323.
[22] [22] Ba J, Mnih V, Kavukcuoglu K. Multiple object recognition with visual attention[J]. arXiv:1412.7755[cs.LG], 2015.
[23] [23] Xu K, Ba J, Kiros R, et al. Show, attend and tell: neural image caption generation with visual attention[J]. arXiv:1502.03044 [cs.LG], 2015.
[24] [24] Wang Z X, Chen T S, Li G B, et al. Multi-label image recognition by recurrently discovering attentional regions[C]//Proceedings of 2017 IEEE International Conference on Computer Vision, 2017: 464–472.
[25] [25] Everingham M, van Gool L, Williams C K I, et al. The Pascal visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303–338.
[26] [26] Srivastava N, Salakhutdinov R. Learning representations for multimodal data with deep belief nets[C]//Proceedings of 2012 ICML Representation Learning Workshop, 2012: 79.
[27] [27] Wang R G, Xie Y F, Yang J, et al. Large scale automatic image annotation based on convolutional neural network[J]. Journal of Visual Communication and Image Representation, 2017, 49: 213–224.
[28] [28] Li Y N, Yeh M C. Learning image conditioned label space for multilabel classification[J]. arXiv:1802.07460[cs.CV], 2018.
Get Citation
Copy Citation Text
Xue Lixia, Jiang Di, Wang Ronggui, Yang Juan. Multi-label classification based on attention mechanism and semantic dependencies[J]. Opto-Electronic Engineering, 2019, 46(9): 180468
Category: Article
Received: Sep. 10, 2018
Accepted: --
Published Online: Oct. 14, 2019
The Author Email: Juan Yang (yangjuan6985@163.com)