Acta Optica Sinica, Volume 44, Issue 21, 2106007 (2024)
Interpretable Feature Selection Method for Optical-Fiber Disturbance Signal Recognition
The distributed optical fiber sensing system based on a phase-sensitive optical time-domain reflectometer (φ-OTDR) has been widely used for disturbance signal recognition in perimeter security, pipeline monitoring, railway transportation monitoring, and other fields, owing to its high sensitivity, multi-point monitoring capability, and wide coverage. Currently, machine learning-based methods are the primary means of improving the accuracy of disturbance signal recognition. Classical machine learning algorithms require the raw input signals to be preprocessed through manual feature extraction. Typically, as the number of disturbance event types increases, more features are extracted in pursuit of higher recognition accuracy. However, irrelevant features can degrade both recognition accuracy and efficiency. Feature selection, which eliminates irrelevant features to strengthen recognition performance, therefore plays a crucial role in the preprocessing stage. Feature selection methods fall into three categories: filter, wrapper, and embedded methods. Notably, most feature selection methods used for optical-fiber disturbance signal recognition are filter methods, which often overlook the relationship between the features and the classification model. In this study, we aim to develop a more efficient and interpretable feature selection method that identifies key features and further boosts recognition performance.
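A filter method scores each feature from data statistics alone, without consulting the classifier. As an illustration, the Fisher score (one of the filter baselines the study later compares against) can be sketched as below: for each feature it divides the between-class scatter of the class means by the pooled within-class variance. This is a minimal, stdlib-only sketch of the common textbook definition; the function names and the toy data are hypothetical, and the authors' exact implementation is not given in the abstract.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def fisher_scores(X, y):
    """Fisher score per feature: between-class scatter over within-class scatter.

    X is a list of samples (each a list of feature values) and y the matching
    class labels. A higher score means the classes are better separated along
    that feature, judged from the data alone, without consulting any classifier.
    """
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    scores = []
    for j in range(len(X[0])):
        overall_mean = fmean(row[j] for row in X)
        between = within = 0.0
        for rows in by_class.values():
            col = [row[j] for row in rows]
            # Class size times squared distance of the class mean from the overall mean.
            between += len(col) * (fmean(col) - overall_mean) ** 2
            # Class size times the (population) variance inside the class.
            within += len(col) * pvariance(col)
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: perfect separation if the class means differ.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def rank_by_score(scores):
    """Feature indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

For a toy two-class set such as `X = [[1.0, 5.0], [1.2, 3.0], [3.0, 4.0], [3.2, 6.0]]` with `y = [0, 0, 1, 1]`, feature 0 scores about 100 and feature 1 scores 0.25, so feature 0 ranks first. Because the score never looks at the model, the same ranking is used for SVM, KNN, DT, and RF alike, which is the limitation the SHAP-based method targets.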
We propose a novel feature selection method based on Shapley additive explanations (SHAP), an explainable artificial intelligence (XAI) method. Drawing on cooperative game theory, SHAP computes Shapley values, which quantify the contribution of each feature to the model's prediction (Equation 1). We use SHAP to obtain the mean SHAP value of each feature for a trained classification model; the higher the mean, the more important the feature. We rank the features by importance and select the most significant ones to form a feature subset that preserves a high recognition rate. This subset is then used to retrain the model, thereby improving recognition efficiency.
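Equation 1 is not reproduced in this abstract, so the sketch below follows the standard Shapley definition: a feature's value is its marginal contribution v(S ∪ {i}) − v(S), averaged over every coalition S of the other features with weight |S|!(n − |S| − 1)!/n!. Two simplifying assumptions are made here: "absent" features are filled in from a single baseline vector, and global importance is aggregated as the mean absolute SHAP value (the usual SHAP convention; whether the authors take absolute values is not stated). Brute-force enumeration needs on the order of n·2^(n−1) model calls per sample, which is why the study relies on KernelSHAP and TreeSHAP rather than anything like this for 22 features. The function names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at sample x (brute force).

    Features outside a coalition take their value from `baseline`
    (a single-reference simplification of SHAP's background data).
    """
    n = len(x)

    def v(coalition):
        # Model output when only the coalition's features are set to x.
        z = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [k for k in range(n) if k != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (v(coalition | {i}) - v(coalition))
    return phi


def rank_features(f, samples, baseline):
    """Rank features by mean |SHAP| over samples, most important first."""
    n = len(baseline)
    totals = [0.0] * n
    for x in samples:
        for j, p in enumerate(shapley_values(f, x, baseline)):
            totals[j] += abs(p)
    mean_abs = [t / len(samples) for t in totals]
    order = sorted(range(n), key=lambda j: mean_abs[j], reverse=True)
    return order, mean_abs
```

For the toy model `f(z) = 3*z[0] + 0.5*z[1] + z[0]*z[2]` at `x = [1, 2, 3]` with a zero baseline, the values are 4.5, 1.0, and 1.5: the interaction term z0·z2 is split equally between features 0 and 2, and the values sum to f(x) − f(baseline) = 7 (the additivity property that gives SHAP its name). Keeping the first k indices of `order` gives the top-k feature subset used for retraining.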
Experimental validation is conducted on an open dataset of optical-fiber disturbance events from Beijing Jiaotong University, split into training and test sets at an 8∶2 ratio (Table 1). The dataset includes six typical disturbance events: background noise, digging, knocking, watering, shaking, and walking. We extract sixteen time-domain features from the disturbance signals after differentiation and segmentation. In addition, wavelet packet decomposition (WPD) is employed to extract six frequency-domain features (Tables 2 and 3). The resulting set of twenty-two features is normalized and fed into four common machine learning models used as baselines: support vector machine (SVM), K-nearest neighbor (KNN), decision tree (DT), and random forest (RF). KernelSHAP is applied to SVM and KNN, while TreeSHAP is used for DT and RF. The importance ranking of the twenty-two features is determined for each of the four models (Fig. 6). Notably, the contribution of each feature to the classification of the six disturbance events differs from model to model. To maintain recognition accuracy, we therefore retain a different number of key features for each model. Comparing the accuracy, precision, recall, and F1-score derived from the test confusion matrices (Tables 4 and 5), we observe that feature selection improves recognition performance to varying degrees. Among the four models, the RF model achieves the highest recognition accuracy of 96.5%. Furthermore, the average recognition time per sample for the RF model decreases from 81.82 ms without feature selection to 66.01 ms, a 19.3% reduction (Table 6). Common feature selection methods such as Fisher score and mutual information are also compared with the SHAP-based feature selection method, which achieves higher recognition accuracy than these alternatives (Table 7).
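The four reported metrics can all be read off a multi-class confusion matrix. The sketch below assumes rows are true classes and columns are predicted classes, and that precision, recall, and F1-score are macro-averaged (an unweighted mean over classes); the abstract does not state the averaging scheme, so that part is an assumption. The function name and the three-class matrix in the usage note are made up for illustration and are not data from the paper, whose matrices have six classes.

```python
def classification_metrics(cm):
    """Accuracy plus macro-averaged precision, recall, and F1-score.

    cm[i][j] counts test samples of true class i predicted as class j.
    Macro averaging (an unweighted mean over classes) is assumed here.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[c][c] for c in range(k))  # diagonal = correct predictions

    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = cm[c][c]
        predicted = sum(cm[row][c] for row in range(k))  # column sum: predicted as c
        actual = sum(cm[c])  # row sum: truly class c
        prec = tp / predicted if predicted else 0.0
        rec = tp / actual if actual else 0.0
        precisions.append(prec)
        recalls.append(rec)
        # F1 is the harmonic mean of precision and recall for this class.
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)

    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }
```

For the hypothetical matrix `[[8, 2, 0], [1, 9, 0], [0, 0, 10]]`, 27 of 30 samples lie on the diagonal, so accuracy is 0.9, and the per-class recalls 0.8, 0.9, and 1.0 also average to 0.9. With imbalanced classes the macro and accuracy figures can diverge, which is one reason to report all four.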
We propose a feature selection method characterized by interpretability and reliability. The method leverages XAI techniques to quantify the importance of each feature to the model and selects features according to their importance rankings. By retaining the features most effective for classification and discarding redundant or detrimental ones, our approach improves recognition accuracy while reducing computational cost and recognition time. Twenty-two features are extracted from six types of disturbance events in an open dataset from Beijing Jiaotong University, and four common machine learning models are employed for signal recognition. Because feature importance rankings vary across models, we construct a different feature subset for each model. This yields notable decreases in single-sample testing time for all four models and varying degrees of improvement in average recognition accuracy. Compared with filter methods based on statistical metrics, our method selects more valuable features and thereby achieves higher recognition rates. Note that these conclusions are drawn solely from the dataset used; further validation is needed to assess the method's applicability to more complex or real-world datasets. Future work could compare feature importance rankings across more models and integrate other feature selection methods to develop a versatile approach for optical-fiber disturbance signal recognition.
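The abstract says a different number of key features is kept for each model so that accuracy is not compromised, but it does not spell out the stopping rule. One plausible reading, sketched here purely as an assumption, is to walk down a model's importance ranking and keep the shortest top-k prefix whose accuracy is no lower than that of the full feature set (optionally within a small tolerance). `evaluate` is a hypothetical callback standing in for "retrain this model on these features and score it"; the function name is likewise hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def smallest_sufficient_subset(
    ranking: Sequence[int],
    evaluate: Callable[[List[int]], float],
    tolerance: float = 0.0,
) -> Tuple[List[int], float]:
    """Shortest top-k prefix of `ranking` that matches the full-feature score.

    ranking:   feature indices ordered from most to least important
               (for example by mean |SHAP| for one particular model).
    evaluate:  hypothetical callback that retrains the model on the given
               feature indices and returns its accuracy.
    tolerance: allowed accuracy drop relative to using every feature.
    """
    full = list(ranking)
    full_score = evaluate(full)
    for k in range(1, len(full)):
        subset = full[:k]
        score = evaluate(subset)
        if score >= full_score - tolerance:
            return subset, score
    # No shorter prefix is good enough, so keep every feature.
    return full, full_score
```

Because each model gets its own ranking, running this once per model naturally produces subsets of different sizes, consistent with the per-model subsets described above. Choosing k on a separate validation split rather than on the test set would avoid tuning the subset size on test data.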
Min Sun, Nian Fang. Interpretable Feature Selection Method for Optical-Fiber Disturbance Signal Recognition[J]. Acta Optica Sinica, 2024, 44(21): 2106007
Category: Fiber Optics and Optical Communications
Received: May 30, 2024
Accepted: Jul. 15, 2024
Published Online: Nov. 20, 2024
Author email: Fang Nian (nfang@shu.edu.cn)