Acta Optica Sinica, Volume 44, Issue 24, 2430005 (2024)
An Early Classification Algorithm for Small Sample Transient Source Based on Machine Learning
Transient sources play a crucial role in studying the origins of the universe and physical phenomena in extreme environments. One of the primary objectives of the SVOM mission is to detect target-of-opportunity (ToO) events, including electromagnetic counterparts of gravitational waves and other types of transients. Transients decay rapidly, and sensors detect millions of transient events every night. Hence, a rapid and accurate classification algorithm is essential for confirming their nature early on. Early classification aids not only subsequent follow-up observations but also the study of the physical properties and progenitor systems of transients. Currently, early photometric data of transients often consist of incomplete light curves, which poses a challenge for traditional classification algorithms that typically require complete datasets. Existing early classification algorithms rely heavily on large datasets and may therefore overlook transients with low occurrence rates or those undetected by current methods. Therefore, developing early classification algorithms tailored for small sample transients is necessary to improve detection efficiency.
We propose an early classification algorithm for small sample transient sources based on machine learning: the TXW algorithm, which combines a temporal convolutional network (TCN) and eXtreme gradient boosting (XGBoost) with a weight module. The algorithm uses a small sample metric learning method. First, input data are converted into feature vectors, after which the classifier calculates similarity scores for all classes. The transient object is assigned to the class with the highest score. The TCN module in the TXW algorithm extracts features from the photometric data of transients, while the XGBoost module calculates probability scores for each candidate class of transient objects. We propose a novel weighting algorithm in the weight module to reduce the noise in time-series photometric data from transient sources. This addresses cases where signal sources disappear prematurely and noise is mistaken for features. The experimental data consist of four types of open-source multi-band transient simulation data provided by the Photometric LSST Astronomical Time-Series Classification Challenge (PLAsTiCC): tidal disruption events (TDE), kilonovae (KN), type Ia supernovae (SNIa), and type I superluminous supernovae (SLSN-I). We use simulated photometric transient data from the g, r, and i bands in the PLAsTiCC dataset, as these bands align with the ground-based telescope observation bands used in the SVOM mission. After preprocessing steps such as time correction, de-reddening, light curve fitting, and data augmentation, a suitable dataset is established for the models. We evaluate the performance of the TXW algorithm by comparing it with other classifiers (LSTM, Transformer, Rapid, and TXW without the weight module) on the same testing set.
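The decision rule described above (score every candidate class, then assign the class with the highest score) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the `classify` function, the ordering of `CLASSES`, and the example scores are hypothetical, and in TXW the scores would come from the XGBoost module applied to TCN features rather than being typed in by hand.

```python
# Candidate classes named in the abstract; the ordering here is an assumption.
CLASSES = ["TDE", "KN", "SNIa", "SLSN-I"]


def classify(scores):
    """Return (label, score) for the class with the highest score.

    `scores` holds one score per entry of CLASSES, in the same order.
    """
    if len(scores) != len(CLASSES):
        raise ValueError("expected one score per candidate class")
    best = max(range(len(CLASSES)), key=lambda i: scores[i])
    return CLASSES[best], scores[best]


# Hypothetical per-class scores for a single transient at one epoch.
example_scores = [0.10, 0.15, 0.55, 0.20]
label, score = classify(example_scores)
print(label, score)
```

For early classification, the same rule would be reapplied each time new photometric points extend the partial light curve, so the assigned label can change as evidence accumulates.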
We compare the real-time classification accuracy of the different algorithms. As shown in Table 1, the classification accuracy of TXW is 21.98 percentage points higher than that of LSTM, 18.23 percentage points higher than that of Transformer, 4.33 percentage points higher than that of Rapid, and 0.81 percentage points higher than that of the TXW algorithm without the weight module. These results demonstrate that the TXW algorithm offers high accuracy and strong noise resistance. We take the results at 2 d post-trigger as the early epoch transient classification results, and those at 24 d post-trigger as the late epoch results. This paper uses confusion matrices, precision-recall (PR) curves, and receiver operating characteristic (ROC) curves as performance indicators for the algorithms. Figure 5 displays the confusion matrices, showing that the TXW results at 2 d and 24 d post-trigger are superior to those of Rapid. Additionally, the accuracy of the TXW algorithm at 2 d post-trigger exceeds 0.5. Precision-recall curves and average precision (AP) values are presented in Fig. 6. At 2 d post-trigger, the average AP of the TXW algorithm is 0.25 higher than that of Rapid, with TDE higher by 0.03, KN by 0.1, SNIa by 0.21, and SLSN-I by 0.16. At 24 d post-trigger, the average AP of the TXW algorithm is 0.17 higher than that of Rapid, with TDE higher by 0.02, KN by 0.03, SNIa by 0.09, and SLSN-I by 0.13. ROC curves and area under the curve (AUC) values are shown in Fig. 7. At 2 d post-trigger, the micro-average and macro-average AUC of the TXW algorithm are higher than those of Rapid by 0.1 and 0.08, respectively, with TDE higher by 0.02, KN by 0.09, SNIa by 0.19, and SLSN-I by 0.09. At 24 d post-trigger, the TXW values exceed those of Rapid by 0.04 for the micro-average, 0.05 for the macro-average, 0.04 for TDE, 0.02 for KN, 0.1 for SNIa, and 0.05 for SLSN-I. Figure 8 shows the AUC over time for the TXW and Rapid algorithms. Both algorithms improve over time. However, after t > 40, the AUC of the Rapid algorithm decreases due to the influence of noise, whereas the TXW algorithm mitigates noise effects. The maximum AUC of the Rapid algorithm is greater than 0.8, while that of the TXW algorithm exceeds 0.9. Overall, the TXW algorithm consistently outperforms the Rapid algorithm in both early and late epoch results, showing higher accuracy and better noise resistance, which is particularly beneficial for early classification of small sample transients.
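To make the distinction between the per-class, macro-average, and micro-average AUC values reported above concrete, the sketch below computes all three for a multi-class problem using a one-vs-rest scheme. It is an illustrative assumption about how such averages are commonly defined, not the evaluation code used in the paper; the function names and the toy scores are hypothetical.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half (pairwise rank definition)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def macro_micro_auc(y_true, y_score, n_classes):
    """One-vs-rest AUC per class, plus macro and micro averages.

    y_true  : list of integer class indices
    y_score : list of per-class score rows, one row per object
    """
    per_class = []
    flat_labels, flat_scores = [], []
    for c in range(n_classes):
        labels = [1 if y == c else 0 for y in y_true]
        scores = [row[c] for row in y_score]
        per_class.append(binary_auc(labels, scores))
        # Micro-averaging pools every (object, class) decision together.
        flat_labels += labels
        flat_scores += scores
    macro = sum(per_class) / n_classes  # unweighted mean over classes
    micro = binary_auc(flat_labels, flat_scores)
    return per_class, macro, micro


# Toy, hypothetical scores for three objects and three classes.
y_true = [0, 1, 2]
y_score = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7]]
per_class, macro, micro = macro_micro_auc(y_true, y_score, 3)
print(per_class, macro, micro)
```

The macro average weights every class equally, so a rare class counts as much as a common one, while the micro average pools all decisions and is dominated by the more frequent classes. This is why the two averages can differ for the same classifier.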
We propose an early classification algorithm, TXW, for small sample transients. In the design of the TXW algorithm, the TCN has stronger feature extraction abilities than the GRU. The TXW algorithm retains the advantages of the XGBoost algorithm, including high accuracy and strong robustness. Through the TCN module, it also addresses a shortcoming of RF and XGBoost, which ignore correlations between attributes in datasets. Additionally, the residual block in the algorithm resolves the issue of CNN overfitting. Because transients have short time scales, we propose a new weighting formula to address the issue where noise from prematurely disappearing signal sources is misclassified as features. We compare the classification results of TXW with those of LSTM, Transformer, Rapid, and TXW without the weight module. We also analyze the results using performance indicators such as accuracy, confusion matrix, PR curve, AP value, ROC curve, and AUC value. The results show that the TXW algorithm has high accuracy, strong robustness, and strong noise resistance. The comprehensive performance of the TXW algorithm is better than that of the Rapid algorithm. The TXW algorithm thus supports research on small sample transients.
Mengci Li, Chengzhi Liu, Chao Wu, Zhe Kang, Shiyu Deng, Zhenwei Li. An Early Classification Algorithm for Small Sample Transient Source Based on Machine Learning[J]. Acta Optica Sinica, 2024, 44(24): 2430005
Category: Spectroscopy
Received: May 21, 2024
Accepted: Jun. 24, 2024
Published Online: Dec. 19, 2024
The Author Email: Liu Chengzhi (lcz@cho.ac.cn), Wu Chao (cwu@nao.cas.cn)