Spectroscopy and Spectral Analysis, Vol. 45, Issue 3, 685 (2025)
Diagnosis of Lung Cancer by Human Serum Raman Spectroscopy Combined With Six Machine Learning Algorithms
Lung cancer is a serious threat to human health, and its incidence in China has been increasing in recent years. Imaging examination and histopathological examination are the main screening methods for lung cancer. Imaging is widely used for preliminary screening, but its results carry some uncertainty. Histopathological examination is accurate and is therefore the “gold standard” for lung cancer diagnosis; however, acquiring tissue samples can cause traumatic lung injury. A reliable and minimally invasive method for lung cancer diagnosis is therefore needed. Serum samples are more convenient and less invasive to acquire than pathological tissue samples, and Raman spectroscopy is simple to operate, rapid, sensitive, and able to provide biochemical information on serum samples. In this study, serum Raman spectra were obtained from 155 healthy subjects and 92 lung cancer patients. Curve fitting was applied to the spectra, and characteristic differences between healthy subjects and lung cancer patients were found in the range of 1 800~800 cm⁻¹. Compared with healthy subjects, the area percentages of the sub-peaks around 1 005 and 1 091 cm⁻¹ in lung cancer patients increased by 3.36% and 5.32%, respectively. In contrast, the area percentages of the sub-peaks around 964, 1 522, and 1 586 cm⁻¹ decreased by 2.3%, 2.82%, and 5.6%, respectively. These preliminary curve-fitting results indicate that the biochemical substances carotenoids, amino acids, ribose, and nucleic acids in the serum of lung cancer patients were altered. To further investigate the Raman spectral characteristics of serum from healthy subjects and lung cancer patients, machine learning methods were used to extract hidden information from the Raman spectral data.
First, principal component analysis (PCA) was used to extract characteristic variables from the spectra. These variables were then supplied to support vector machine (SVM), random forest (RF), k-nearest neighbors (kNN), logistic regression classification (LRC), decision tree (DT), and Bayesian algorithms to build classification models. The models' predictive performance was evaluated by leave-one-out cross-validation. The SVM model classified the serum Raman spectra best, with accuracy, sensitivity, specificity, and F1 score of 98%, 94.44%, 100%, and 97.14%, respectively. The mean area under the ROC curve over 9-fold cross-validation for the SVM model was 0.94, indicating good predictive performance. These results show that serum Raman spectroscopy combined with machine learning methods can effectively diagnose lung cancer. The technique is minimally invasive and highly accurate, making it a potential diagnostic technology for lung cancer.
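For readers less familiar with the four metrics reported for the SVM model, the short Python sketch below shows how accuracy, sensitivity, specificity, and F1 follow from the counts of a binary confusion matrix. The function name `binary_metrics` and the counts passed to it are illustrative assumptions: the abstract does not state the size or composition of the test set, and the counts used here are simply one hypothetical split that happens to reproduce the reported percentages.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, specificity and F1 as percentages.

    tp: patients correctly classified as lung cancer
    fn: patients wrongly classified as healthy
    tn: healthy subjects correctly classified as healthy
    fp: healthy subjects wrongly classified as lung cancer
    """
    total = tp + fn + tn + fp
    values = {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),      # true positive rate (recall)
        "specificity": tn / (tn + fp),      # true negative rate
        "f1": 2 * tp / (2 * tp + fp + fn),  # harmonic mean of precision and recall
    }
    return {name: round(100 * value, 2) for name, value in values.items()}


# Hypothetical counts (NOT taken from the paper): 18 patients and
# 32 healthy subjects in a test set, with a single missed patient.
metrics = binary_metrics(tp=17, fn=1, tn=32, fp=0)
for name, value in metrics.items():
    print(f"{name}: {value}%")
```

With these assumed counts the four values come out at 98%, 94.44%, 100%, and 97.14%, so the reported figures are at least mutually consistent with a single held-out test set of this kind. Other splits may fit equally well.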
NI Qin-ru, OU Quan-hong, SHI You-ming, LIU Chao, ZUO Ye-hao, ZHI Zhao-xing, REN Xian-pei, LIU Gang. Diagnosis of Lung Cancer by Human Serum Raman Spectroscopy Combined With Six Machine Learning Algorithms[J]. Spectroscopy and Spectral Analysis, 2025, 45(3): 685
Received: Jan. 11, 2024
Accepted: Mar. 24, 2025
Published Online: Mar. 24, 2025
The Author Email: Quan-hong OU (ouquanhong@163.com)