Journal of Infrared and Millimeter Waves, Volume 44, Issue 1, 96 (2025)
Infrared aircraft few-shot classification method based on cross-correlation network
To address the scarcity of infrared aircraft samples and the tendency of traditional deep learning to overfit, a few-shot infrared aircraft classification method based on a cross-correlation network is proposed. The method combines two core modules: a simple parameter-free self-attention module and a cross-attention module. By analyzing the self-correlation and cross-correlation between support images and query images, it achieves effective classification of infrared aircraft under few-shot conditions. The proposed cross-correlation network integrates these two modules and is trained end to end. The parameter-free self-attention extracts the internal structure of each image, while the cross-attention computes the cross-correlation between images, further extracting and fusing features across images. Compared with existing few-shot infrared target classification models, this model focuses on the geometric structure and thermal texture information of infrared images by modeling the semantic relevance between the features of the support set and the query set, and thus attends better to the target objects. Experimental results show that the method outperforms existing infrared aircraft classification methods in various classification tasks, with the largest accuracy improvement exceeding 3%. Ablation and comparative experiments further confirm the effectiveness of the method.
Introduction
In recent years, the rapid advancement of infrared detection and imaging technology has led to an expanding application scope for infrared images[
In contrast, the human visual system has the remarkable ability to rapidly form cognitive frameworks for new entities based on a few examples[
Infrared images possess unique characteristics, such as low contrast and low signal-to-noise ratio. Moreover, apart from the target objects, infrared images may also contain various background interference, such as buildings and clouds. Therefore, designing a network model that can focus more on the target objects in infrared images under the constraint of extremely limited samples is crucial for our research. Recent advancements in few-shot learning have seen widespread application of meta-learning and transfer learning. Chen[
In the task of few-shot classification, test images in the query set come from novel classes, making it challenging for the extracted features to focus on the target objects[
We propose a few-shot infrared aircraft classification method based on a cross-correlation network, which integrates two attention modules and is trained end to end. First, using the parameter-free self-attention module (SAM), we extract the intra-correlation within each image to acquire feature representations in both the spatial and channel dimensions. Subsequently, the cross-attention module (CA) generates cross-attention between support and query images, thereby enhancing the model's generalization capability. By fusing features within and between images with minimal parameters, the model reduces computational complexity. In contrast to current models for few-shot infrared aircraft classification, our approach strengthens the focus on the geometric and textural details of infrared imagery. It does so by establishing a semantic connection between the feature sets of the support and query samples, thereby improving the model's ability to identify target objects accurately. Classification experiments and ablation studies validate the proposed model, and these results are achieved without introducing excessive parameters.
1 Method
In this section, we provide a detailed introduction to the Cross-Correlation Network (CCNet) proposed in this paper for few-shot infrared aircraft classification. The overall architecture of CCNet is illustrated in
Figure 1. The overall architecture of the CCNet model
1.1 Parameter-free self-attention
Attention mechanisms allocate different weights to the importance of key information contained within channels, thereby enhancing the network's focus on important information. Common attention mechanisms are typically composed of convolutional layers, pooling layers, activation functions, etc., introducing additional parameters to the network. To improve network performance without increasing computational complexity, we introduce a simple, parameter-free attention mechanism module called SAM[
Figure 2. Parameter-free self-attention model
Building upon this, SAM defines an energy function to measure the difference between each feature and other features, thereby evaluating the importance of each feature. The definition of the energy function is as
where the mean and variance are computed over all neurons in the channel, and λ is a hyperparameter used for balancing.
where E represents the grouping of all
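A minimal sketch of a parameter-free attention of this kind is given here. It assumes that SAM follows the closed-form minimal energy used in SimAM-style attention (squared deviation from the channel mean, normalized by the channel variance plus λ, then passed through a sigmoid). That form, the function name `parameter_free_attention`, and the default `lam=1e-4` are assumptions for illustration and are not confirmed by the text above.

```python
import numpy as np

def parameter_free_attention(x, lam=1e-4):
    """Reweight a (C, H, W) feature map with a SimAM-style energy (assumed form)."""
    n = x.shape[1] * x.shape[2] - 1                    # neurons per channel, minus the target neuron
    mu = x.mean(axis=(1, 2), keepdims=True)            # per-channel mean
    sq_dev = (x - mu) ** 2                             # squared deviation of every neuron
    var = sq_dev.sum(axis=(1, 2), keepdims=True) / n   # per-channel variance
    inv_energy = sq_dev / (4.0 * (var + lam)) + 0.5    # larger value = more distinctive neuron
    weights = 1.0 / (1.0 + np.exp(-inv_energy))        # sigmoid maps the importance to (0, 1)
    return x * weights                                 # no learnable parameters are involved

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 5, 5))                  # toy feature map: 8 channels, 5x5 spatial
out = parameter_free_attention(feat)
```

The only tunable quantity is the balancing term `lam`; everything else is computed from the statistics of the feature map itself, which is what keeps the module parameter-free.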
1.2 Cross Attention
In contrast to previous methods that independently extract features from support sets and query samples, we introduce a cross attention module to compute the cross-correlation between support and query images. The CA module enhances the model's focus on the target object by modeling the semantic relevance between class features and query features, thereby improving the efficiency and accuracy of the subsequent matching process. The cross attention module first takes the self-correlation representations of the support set and query samples(
Figure 3. The architecture of cross attention
In order to reduce computational complexity and obtain a more effective feature representation, we first employ a convolutional layer to decrease the channel dimensions of the support and query representations. Subsequently, the cross-correlation representation of the two reduced outputs is computed using
where
In the process of fine-grained classification of infrared aircraft, due to the similarity of some target shapes, the cross-correlation tensor may contain unreliable correlations. Therefore, we adopt a convolutional matching process to obtain a more reliable cross-correlation representation. Specifically, we use four-dimensional convolution, which analyzes the consistency of adjacent matches in the four-dimensional space and performs geometric matching on the tensor, thereby enhancing the expressiveness of target features and improving classification accuracy[
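To make the shape of the cross-correlation tensor concrete, the sketch below correlates every spatial position of a query feature map with every position of a support feature map. It assumes cosine similarity as the correlation measure, which is a common choice but is not confirmed by the text here. The name `cross_correlation` and the toy sizes are hypothetical, and the four-dimensional convolution that refines the tensor is not implemented.

```python
import numpy as np

def cross_correlation(fq, fs, eps=1e-8):
    """fq, fs: (C, H, W) query and support features. Returns an (H, W, H, W) tensor."""
    # L2-normalise each spatial position along the channel axis (cosine similarity, assumed)
    fq = fq / (np.linalg.norm(fq, axis=0, keepdims=True) + eps)
    fs = fs / (np.linalg.norm(fs, axis=0, keepdims=True) + eps)
    # entry [h, w, u, v] compares query position (h, w) with support position (u, v)
    return np.einsum("chw,cuv->hwuv", fq, fs)

rng = np.random.default_rng(0)
fq = rng.standard_normal((16, 4, 4))   # toy query features after channel reduction
corr = cross_correlation(fq, fq)       # correlating a map with itself as a sanity check
```

Because the tensor has four spatial axes, a matching step that looks at neighbouring entries has to operate in four dimensions, which is why the refinement described above is a four-dimensional convolution.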
After obtaining the reliable cross-correlation tensor, it is necessary to generate the common attention maps
where
1.3 Loss function
Unlike many recent few-shot learning methods that adopt a two-stage 'pre-training + fine-tuning' training scheme, we propose an end-to-end training strategy for CCNet. This strategy jointly trains the designed modules and the backbone network by combining the metric loss
where
The global classification loss
where
where λ is the weight that balances the effects of different losses. By optimizing the overall loss L, the network can be trained end-to-end using the gradient descent algorithm.
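One way the two losses could be combined is sketched below. The exact formulas are not recoverable from the text here, so the sketch makes two labeled assumptions: the metric loss is a cross-entropy over temperature-scaled cosine similarities between a query embedding and the class prototypes of the episode, and λ scales the metric term in the sum. All function names are hypothetical; τ = 0.2 and λ = 0.25 are the values reported in the experiments.

```python
import numpy as np

def cross_entropy(logits, label):
    """Cross-entropy of a single sample computed from unnormalised logits."""
    shifted = logits - logits.max()                      # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

def metric_loss(query, prototypes, label, tau=0.2):
    """Assumed form: cosine similarity to each class prototype, divided by tau."""
    q = query / np.linalg.norm(query)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return cross_entropy(p @ q / tau, label)

def total_loss(l_metric, l_global, lam=0.25):
    """Weighted sum of the two losses; which term lam scales is an assumption."""
    return l_global + lam * l_metric

prototypes = np.eye(3)                                   # three toy class prototypes
loss_correct = metric_loss(prototypes[1], prototypes, label=1)
loss_wrong = metric_loss(prototypes[1], prototypes, label=0)
```

A smaller temperature sharpens the softmax over similarities, so a query that lies close to the correct prototype is penalised very little while a mismatched label is penalised heavily.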
2 Experiments
2.1 Experimental environment and data source
All experiments were conducted on hardware consisting of an Intel i7-13700 processor, an NVIDIA RTX 4080 graphics card, and 64 GB of DDR4 memory, with a software environment of Windows 10 and the PyTorch deep learning framework. During the training phase, we adopted a training strategy based on N-way K-shot meta-tasks. Specifically, in each training cycle, N categories are randomly selected from the training data, and then K labeled samples are selected from each category to construct the support set. Subsequently, a certain number of samples are randomly selected from the remaining samples of these N categories to constitute the query set. Finally, the model predicts the category labels of the query samples. In the validation and testing phases, we use the same meta-task form for evaluation. It should be noted that the data in the validation set, test set, and training set all come from different categories, which means that the three class sets are mutually disjoint.
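The construction of a single N-way K-shot meta-task can be sketched as follows. The function name `sample_episode`, the `n_query` parameter, and the toy data are hypothetical and serve only to illustrate the sampling procedure described above.

```python
import random

def sample_episode(data_by_class, n_way, k_shot, n_query, rng=random):
    """data_by_class maps a class name to its samples. Returns (support, query) lists of (sample, label)."""
    classes = rng.sample(sorted(data_by_class), n_way)        # pick N categories at random
    support, query = [], []
    for label, cls in enumerate(classes):
        # draw K support samples plus the query samples without replacement,
        # so the two sets never share an image
        picked = rng.sample(data_by_class[cls], k_shot + n_query)
        support += [(s, label) for s in picked[:k_shot]]
        query += [(s, label) for s in picked[k_shot:]]
    return support, query

# toy data: 8 classes with 20 image identifiers each
toy = {f"class_{c}": [f"img_{c}_{i}" for i in range(20)] for c in range(8)}
support, query = sample_episode(toy, n_way=5, k_shot=1, n_query=3, rng=random.Random(0))
```

Labels are re-indexed from 0 to N-1 inside each episode, so the model never sees the global class identity and must rely on the support set to classify the query samples.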
In order to validate the effectiveness of the model proposed in this study, we conducted experiments using two datasets: the miniImageNet[
Figure 4. (a) Samples of the miniImageNet dataset; (b) samples of the miniInfra dataset
In this study, we employ ResNet12[
2.2 Few-shot classification based on the miniImageNet dataset
The miniImageNet dataset is composed of 100 categories, each containing 600 images, totaling 60,000 visible-light images. Following the partitioning standards of previous literature[
During the experiments on the miniImageNet dataset, we use the Stochastic Gradient Descent (SGD) optimizer for 80 epochs of training, each epoch consisting of 300 meta-tasks. The initial learning rate is set to 0.1, and a learning rate decay strategy is adopted: at the 60th and 70th epochs, the learning rate is multiplied by a decay factor of 0.05. In the experiments, the temperature factor τ of the metric loss function is set to 0.2, and the hyperparameter λ for balancing the loss weight is set to 0.25.
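The step decay described above can be written out as a small function. The sketch assumes that epochs are counted from 0 and that the decay takes effect from the milestone epoch onward; the exact indexing convention is not stated in the text, and the function name `learning_rate` is hypothetical.

```python
def learning_rate(epoch, base_lr=0.1, milestones=(60, 70), gamma=0.05):
    """Step decay: multiply base_lr by gamma once for every milestone already reached."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed

# learning rate for each of the 80 training epochs
schedule = [learning_rate(e) for e in range(80)]
```

Under these assumptions the rate is 0.1 for epochs 0 to 59, 0.005 for epochs 60 to 69, and 0.00025 for the last ten epochs. In PyTorch the same pattern could be expressed with `torch.optim.lr_scheduler.MultiStepLR` using `milestones=[60, 70]` and `gamma=0.05`.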
2.3 Aircraft classification based on the miniInfra dataset
The miniInfra dataset comprises 33 classes of terrestrial targets and 8 classes of aircraft targets. Terrestrial targets encompass various categories such as buildings, bicycles, pedestrians, cars, animals, and boats, with each class containing 100 to 200 infrared images. The 8 classes of aircraft targets include trainer aircraft, civil aviation aircraft, three types of helicopters (Z-8, Z-9, Z-15), and three types of jet aircraft (J-7, J-8, J-11), with each class containing 40 to 80 images. The granularity of aircraft target classification is finer than that of terrestrial targets.
Given the severe shortage of infrared aircraft data, and to validate the model's ability to recognize fine-grained targets, we selected 25 types of ground targets as the training set, 8 types of ground targets as the validation set, and the 8 types of aircraft targets as the test set. The experiments include two standard few-shot classification tasks: 5-way 1-shot and 5-way 5-shot. Considering that there are 8 types of aircraft, we added two specific classification tasks, 8-way 1-shot and 8-way 5-shot, to test the model's generalization ability for few-shot infrared aircraft classification in a real environment. Consistent with the experimental setup on the miniImageNet dataset, the experiment still uses the SGD optimizer and adopts a learning rate decay strategy. Since the miniInfra dataset is much smaller than the miniImageNet dataset, we reduced the number of training epochs to 20 and set the initial learning rate to 0.01 in the infrared aircraft classification task to prevent overfitting.
We compared the experimental results with the existing infrared aircraft classification methods[
2.4 Ablation experiments
To delve deeper into the impact of the core modules in CCNet, we conducted a series of ablation experiments on the miniImageNet and miniInfra datasets. These experiments included scenarios where both core modules were removed, as well as cases where only one of the modules was used. We constructed a baseline model that contains only the backbone network, without any additional modules, to evaluate the effectiveness of the core modules in CCNet. We carried out 5-way 1-shot ablation experiments on the miniImageNet and miniInfra datasets. As can be seen from
Figure 5. (a) Training and validation accuracy curves of the baseline model and the CCNet model on the miniImageNet dataset; (b) training and validation accuracy curves of the baseline model and the CCNet model on the miniInfra dataset
In this study, further ablation experiments are conducted on the 5-way 1-shot tasks of the two datasets to individually validate the effectiveness of the SAM module and the CA module. When only the CA module is used, the basic representation Zq is taken as input; when only the SAM module is used, its output is directly utilized for classification. The results of the ablation experiments are presented in
Figure 6. Ablation experiment results on the miniImageNet and miniInfra datasets
We also present the results of class activation mapping (CAM) feature visualization using our CCNet, encompassing both visible and infrared images, as illustrated in
Figure 7. The class activation mapping (CAM) feature visualization of CCNet
2.5 Performance and parameter comparison of different attention modules
In this study, we substituted different attention modules into the proposed CCNet model to compare the accuracy and parameter scale of the proposed modules with those of existing attention modules. First, we evaluated the self-attention and cross-attention methods based on feature similarity, which focus on the correlation of the image spatial structure features.
As shown in
3 Conclusion
In this study, we have proposed a few-shot infrared aircraft classification method based on the cross-correlation network, which can effectively solve the classification problem of infrared aircraft when the number of samples is severely insufficient. To reduce model parameters and specifically target the structural features of infrared aircraft images, we introduce a parameter-free self-attention mechanism to analyze the self-correlation within images. Meanwhile, we design a cross-attention mechanism to investigate the cross-correlation between images, which effectively enhances the model's capability to extract features from infrared images. The experimental results show that our method outperforms existing methods in aerial target classification accuracy on the infrared dataset, with an improvement of up to 3% in classification accuracy for specific tasks. Furthermore, the tests on the public miniImageNet dataset and the ablation experiments further verify the effectiveness and contributions of the proposed modules. The proposed method not only has broad application potential in aircraft detection, but also has great application value in civilian fields where data is scarce, such as medicine. However, this paper addresses only the single task of aircraft classification, whereas practical infrared detection systems involve a series of complex tasks such as target detection, target recognition, and target tracking. Therefore, how to deploy the few-shot model in these practical scenarios and maintain good performance across multiple tasks will be the focus of the next stage of work.
[2] R M Chen, S J Liu, Z Miao et al. Infrared aircraft few-shot classification method based on meta learning. Journal of Infrared and Millimeter Waves, 40, 554-560(2021).
[5] X Luo, H Wu, J Zhang et al. A closer look at few-shot classification again, 202, 23103-23123(2023).
[8] R Hou, H Chang, B Ma et al. Cross Attention Network for Few-shot Classification, 32(2019).
[19] O Vinyals, C Blundell, T Lillicrap et al. Matching Networks for One Shot Learning, 29(2016).
[22] S Ravi, H Larochelle. Optimization as a model for few-shot learning(2017).
[24] R Hou, H Chang, B Ma et al. Cross Attention Network for Few-shot Classification, 32(2019).
[27] S Laenen, L Bertinetto. On Episodes, Prototypical Networks, and Few-Shot Learning, 34, 24581-24592(2021).
[33] P Ramachandran, N Parmar, A Vaswani et al. Stand-Alone Self-Attention in Vision Models, 32(2019).
Zhen HUANG, Yong ZHANG, Jin-Fu GONG. Infrared aircraft few-shot classification method based on cross-correlation network[J]. Journal of Infrared and Millimeter Waves, 2025, 44(1): 96
Category: Infrared Optoelectronic System and Application Technology
Received: Mar. 29, 2024
Accepted: --
Published Online: Mar. 5, 2025
The Author Email: ZHANG Yong (zybxy@sina.com)