Fine-Grained Image Classification Based on Feature Fusion and Ensemble Learning

Fine-grained image classification aims to recognize subcategories within a given superclass accurately; however, it is faced with challenges of large intra-class differences, small inter-class differences, and limited training samples. Most current methods are improved based on Vision Transformer with the goal of enhancing classification performance. However, the following issues occur: ignoring the complementary information of classification tokens from different layers leads to incomplete global feature extraction, inconsistent performance of different heads in multi-head self-attention mechanism leads to inaccurate part localization, and limited training samples are prone to overfitting. In this study, a fine-grained image classification network based on feature fusion and ensemble learning is proposed to address the above issues. The network consists of three modules: the multi-level feature fusion module integrates complementary information to obtain more complete global features, the multi-expert part voting module votes for part tokens through ensemble learning to enhance the representation ability of part features, the attention-guided mixup augmentation module alleviates the overfitting issue and improves the classification accuracy. The classification accuracy on CUB-200-2011, Stanford Dogs, NABirds, and IP102 datasets is 91.92%, 93.10%, 90.98%, and 76.21%, respectively, with improvements of 1.42, 1.50, 1.08, and 2.81 percentage points, respectively, compared to the original Vision Transformer model, performing better than other compared fine-grained image classification methods.

AI Video Guide

AI Picture Guide

AI One Sentence

AI Short Abstract

Note: This section is automatically generated by AI . The website and platform operators shall not be liable for any commercial or legal consequences arising from your use of AI generated content on this website. Please be aware of this.

Keywords

ensemble learning feature fusion fine-grained image classification mixup augmentation Vision Transformer

Tools

Get Citation

Copy Citation Text

Wenli Zhang, Wei Song. Fine-Grained Image Classification Based on Feature Fusion and Ensemble Learning[J]. Laser & Optoelectronics Progress, 2024, 61(22): 2237010

Download Citation

EndNote(RIS)BibTex Plain Text

Set citation alerts for article

Save article for my favorites