Interpretable Feature Selection Method for Optical-Fiber Disturbance Signal Recognition

Min Sun; Nian Fang

doi:10.3788/AOS241101

Acta Optica Sinica, Vol. 44, Issue 21, 2106007 (2024)

School of Communication and Information Engineering, Key Laboratory of Specialty Fiber Optics and Optical Access Networks, Joint International Research Laboratory of Specialty Fiber Optics and Advanced Communication, Shanghai University, Shanghai 200444, China

Abstract

Objective

Distributed optical fiber sensing systems based on the phase-sensitive optical time-domain reflectometer (φ-OTDR) are widely used for disturbance signal recognition in perimeter security, pipeline monitoring, railway transportation monitoring, and other fields, owing to their high sensitivity, multi-point monitoring, and wide coverage. Machine learning is currently the primary approach to improving the accuracy of disturbance signal recognition. Classical machine learning algorithms require the raw input signals to be preprocessed through manual feature extraction. As the number of disturbance events to be recognized grows, more features are typically extracted in pursuit of higher recognition accuracy. However, irrelevant features can degrade both recognition accuracy and efficiency. Feature selection, which eliminates irrelevant features to improve recognition performance, therefore plays a crucial role in the preprocessing stage. Feature selection methods fall into three categories: filter, wrapper, and embedded methods. Most feature selection methods used for optical-fiber disturbance signal recognition are filter methods, which often overlook the relationship between features and models. In this study, we aim to develop a more efficient and interpretable feature selection method that identifies key features to further improve recognition performance.

Methods

We propose a novel feature selection method based on Shapley additive explanations (SHAP), an explainable artificial intelligence (XAI) method. Inspired by game theory, SHAP calculates Shapley values, which quantify the contribution of each feature to the model's prediction (Equation 1). We use SHAP to obtain the mean SHAP value of each feature for a classification model; the higher the mean, the more important the feature. We rank the features by importance and select the most significant ones to form a feature subset that still ensures a high recognition rate. This subset is then used to retrain the model, thereby improving recognition efficiency.
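The ranking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it computes exact Shapley values by enumerating feature coalitions, whereas the paper uses Kernel SHAP and Tree SHAP, and it substitutes a hypothetical four-feature linear scoring function for a trained classifier. The names `toy_model`, `WEIGHTS`, `SAMPLES`, and `BASELINE` are assumptions made for illustration, and ranking by the mean absolute Shapley value is also an assumption, since the abstract only says "mean SHAP value".

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight |S|! (M - |S| - 1)! / M! for coalitions of this size
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when it joins coalition S
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def explain_sample(model, x, baseline):
    """Shapley values of one sample; absent features take the baseline value."""
    def value_fn(subset):
        mixed = [x[j] if j in subset else baseline[j] for j in range(len(x))]
        return model(mixed)
    return shapley_values(value_fn, len(x))


def rank_features(model, samples, baseline):
    """Rank features by mean absolute Shapley value over the samples."""
    n = len(baseline)
    totals = [0.0] * n
    for x in samples:
        for j, phi in enumerate(explain_sample(model, x, baseline)):
            totals[j] += abs(phi)
    mean_abs = [t / len(samples) for t in totals]
    order = sorted(range(n), key=lambda j: mean_abs[j], reverse=True)
    return order, mean_abs


# Hypothetical stand-in for a trained model: a linear score over four features.
WEIGHTS = [2.0, 0.0, -1.0, 0.5]


def toy_model(x):
    return sum(w * v for w, v in zip(WEIGHTS, x))


# Hypothetical samples and an all-zero baseline (illustrative values only).
SAMPLES = [[1.0, 3.0, 2.0, 1.0], [2.0, -1.0, 0.0, 4.0], [0.0, 5.0, 1.0, 2.0]]
BASELINE = [0.0, 0.0, 0.0, 0.0]

order, importance = rank_features(toy_model, SAMPLES, BASELINE)
top_k = order[:2]  # indices of the two highest-ranked features
print("ranking:", order)
print("selected subset:", top_k)
```

In this toy setting the feature with zero weight receives zero importance and falls to the bottom of the ranking, which is the behaviour the selection step relies on. Exact enumeration scales exponentially with the number of features, so for the twenty-two features in the study an approximation such as Kernel SHAP or a model-specific algorithm such as Tree SHAP is the practical choice.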

Results and Discussions

Experimental validation is conducted on an open dataset of optical-fiber disturbance events from Beijing Jiaotong University, divided into training and test sets at an 8:2 ratio (Table 1). The dataset includes six typical disturbance events: background noise, digging, knocking, watering, shaking, and walking. We extract sixteen time-domain features from the disturbance signals after differentiation and segmentation. In addition, wavelet packet decomposition (WPD) is employed to extract six frequency-domain features (Tables 2 and 3). The resulting set of twenty-two features is normalized and input into four common machine learning models that serve as baselines: support vector machine (SVM), K-nearest neighbor (KNN), decision tree (DT), and random forest (RF). Kernel SHAP is applied to SVM and KNN, while Tree SHAP is used for DT and RF. The ranking of the twenty-two features is determined for each of the four models (Fig. 6). Importantly, the contribution of each feature to the classification of the six disturbance events differs from model to model. To avoid compromising recognition accuracy, we retain a different number of key features for each model. Comparing the accuracy, precision, recall, and F1-score derived from the test confusion matrices (Tables 4 and 5), we observe that feature selection improves recognition performance to varying degrees. Among the four models, the RF model achieves the highest recognition accuracy of 96.5%. Furthermore, the average recognition time per sample for the RF model decreases from 81.82 ms without feature selection to 66.01 ms with it, a 19.3% reduction (Table 6). Common feature selection methods such as Fisher score and mutual information are also compared with the SHAP-based feature selection method. The SHAP-based method achieves higher recognition accuracy than these alternatives (Table 7).
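The evaluation quantities mentioned above can be reproduced in outline with a short sketch. The confusion matrix below is a hypothetical three-class example, not data from the paper, and macro averaging over classes is an assumption because the abstract does not state how the per-class scores are averaged. The function names `classification_metrics` and `relative_reduction` are illustrative. The only figures taken from the text are the two RF timings, 81.82 ms and 66.01 ms.

```python
def classification_metrics(confusion):
    """Accuracy and macro-averaged precision, recall, and F1-score.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[k][k] for k in range(n)) / total

    precisions, recalls, f1_scores = [], [], []
    for k in range(n):
        true_positive = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        p = true_positive / predicted_k if predicted_k else 0.0
        r = true_positive / actual_k if actual_k else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1_scores.append(f1)

    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }


def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


# Hypothetical 3-class confusion matrix (illustrative values only).
EXAMPLE_CONFUSION = [
    [8, 1, 1],
    [0, 9, 1],
    [2, 0, 8],
]

metrics = classification_metrics(EXAMPLE_CONFUSION)
print(metrics)

# RF recognition time per sample reported in the text: 81.82 ms -> 66.01 ms.
rf_reduction = relative_reduction(81.82, 66.01)
print(f"RF time reduction: {rf_reduction:.1%}")
```

The timing check confirms the arithmetic in the text: (81.82 - 66.01) / 81.82 is approximately 0.193, which matches the reported 19.3% reduction.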

Conclusions

We propose an interpretable and reliable feature selection method. The method uses explainable AI (XAI) techniques to quantify the importance of each feature to the model and selects features according to their importance rankings. By retaining the features most effective for classification and discarding redundant or detrimental ones, the approach improves recognition accuracy while reducing computational cost and identification time. Twenty-two features are extracted from six types of disturbance events in an open dataset from Beijing Jiaotong University, and four common machine learning models are used for signal recognition. Because the feature importance rankings differ across models, we construct a different feature subset for each model. This significantly reduces the single-sample testing time of all four models and improves their average recognition accuracy to varying degrees. Compared with filter methods based on statistical metrics, the proposed method selects more valuable features and thereby achieves higher recognition rates. These conclusions are drawn solely from the dataset used, and further validation is needed to assess the method's applicability to more complex or real-world datasets. Future work could compare feature importance rankings across more models and integrate other feature selection methods to develop a versatile approach for optical-fiber disturbance signal recognition.

Note: This section is automatically generated by AI.

Keywords

explainable machine learning; feature selection; phase-sensitive optical time-domain reflectometer; sensor signal recognition

Min Sun, Nian Fang. Interpretable Feature Selection Method for Optical-Fiber Disturbance Signal Recognition[J]. Acta Optica Sinica, 2024, 44(21): 2106007

Paper Information

Category: Fiber Optics and Optical Communications

Received: May 30, 2024

Accepted: Jul. 15, 2024

Published Online: Nov. 20, 2024

Author email: Nian Fang (nfang@shu.edu.cn)
