Spectroscopy and Spectral Analysis, Vol. 42, Issue 8, 2353 (2022)
A Comparative Study of the COD Hyperspectral Inversion Models in Water Based on Machine Learning
Chemical oxygen demand (COD) is an important indicator of organic pollution in water, so measuring the COD content of water quickly and accurately is particularly important. Machine learning is increasingly applied to water quality inversion, and a growing body of results has been reported. Hyperspectral remote sensing offers high spectral and spatial resolution and many imaging channels, so it has great potential for retrieving COD in water. This study applied different hyperspectral pre-processing methods to the original hyperspectral data and used the data before and after processing to compare how different machine learning models and different pre-processing methods perform in the inversion of COD content in water. First, 1,548 samples of COD content and corresponding hyperspectral data (400-1,000 nm) were collected with a ZK-UVIR-I in-situ spectral water quality on-line monitor in the Baodai River. To reduce the interference of spectral noise and eliminate the influence of spectral scattering, the original spectra were pre-processed with Savitzky-Golay (SG) smoothing, multiplicative scatter correction (MSC), and SG smoothing combined with MSC. Second, the sample set was randomly divided into a training set (80%) and a test set (20%). COD hyperspectral inversion models based on four machine learning methods (linear regression, random forest, AdaBoost, and XGBoost) were established on the full-band spectra of the pre-processed training set. Three indexes, the coefficient of determination (R²), root mean square error (RMSE), and relative analysis error (RPD), were selected to evaluate the accuracy of the inversion models. The results show that random forest, AdaBoost, and XGBoost all outperformed linear regression.
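The pre-processing and evaluation steps described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the SG window length and polynomial order are placeholder values (the abstract does not state them), the MSC reference is assumed to be the mean spectrum, and RPD is assumed to be the standard deviation of the measured values divided by the RMSE. The function names are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter


def sg_smooth(spectra, window_length=11, polyorder=2):
    """Savitzky-Golay smoothing along the wavelength axis (one spectrum per row).

    window_length and polyorder are illustrative defaults, not values from the paper.
    """
    return savgol_filter(spectra, window_length=window_length,
                         polyorder=polyorder, axis=1)


def msc(spectra, reference=None):
    """Multiplicative scatter correction.

    Each spectrum x is regressed on a reference r (x ~ intercept + slope * r)
    and corrected as (x - intercept) / slope. The reference defaults to the
    mean spectrum, which is an assumption made for this sketch.
    """
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, x in enumerate(spectra):
        slope, intercept = np.polyfit(ref, x, 1)  # highest degree first
        corrected[i] = (x - intercept) / slope
    return corrected


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE and RPD for measured vs. predicted COD values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    rmse = float(np.sqrt(np.mean(residual ** 2)))
    r2 = float(1.0 - np.sum(residual ** 2) / np.sum((y_true - y_true.mean()) ** 2))
    # Assumed definition: sample standard deviation of measurements / RMSE.
    rpd = float(np.std(y_true, ddof=1) / rmse)
    return {"R2": r2, "RMSE": rmse, "RPD": rpd}
```

Under these assumptions, the combined pre-processing would be `msc(sg_smooth(spectra))`, and `regression_metrics` would be applied to the held-out 20% test set.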
The inversion model established with XGBoost had the best prediction ability whether or not the spectral data were pre-processed, with an R² of 0.92, an RMSE of 7.1 mg·L⁻¹, and an RPD of 3.4. Because the original spectrum may contain redundant information, principal component analysis (PCA) was used to reduce the dimensionality of the spectra after SG smoothing and MSC processing, and the top ten principal components, with a cumulative contribution rate of 95%, were selected as the input variables of the model. An inversion model was then established with XGBoost. The results show that after PCA the accuracy of the inversion model improved, with an RPD of 3.8, and the training time of the model was shortened from 72 seconds to 2.9 seconds. This research can provide new methods and ideas for establishing hyperspectral inversion models for this and similar water areas.
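The dimensionality-reduction step can be illustrated with a small NumPy sketch that keeps the fewest principal components whose cumulative contribution rate reaches a threshold. The function name `pca_by_cumulative_variance` is hypothetical and the spectra below are synthetic, not the Baodai River data. The regressor is omitted: in the workflow described above, the resulting scores would be passed to the XGBoost model as input variables.

```python
import numpy as np


def pca_by_cumulative_variance(X, threshold=0.95):
    """Project X onto the fewest principal components whose cumulative
    explained-variance ratio reaches `threshold`.

    Returns (scores, components, mean, cumulative_ratio_of_kept_components).
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data: rows of vt are the principal axes.
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    cumulative = np.cumsum(explained)
    # First index where the cumulative ratio reaches the threshold, plus one.
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(s))
    components = vt[:k]
    scores = Xc @ components.T
    return scores, components, mean, cumulative[:k]


# Synthetic demonstration: 200 spectra with 601 bands driven by 3 latent factors.
rng = np.random.default_rng(0)
demo_spectra = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 601))
demo_spectra += 0.01 * rng.normal(size=demo_spectra.shape)
demo_scores, demo_components, _, demo_cumulative = pca_by_cumulative_variance(demo_spectra)
print("components kept:", demo_components.shape[0])
```

New spectra would be projected with the same `mean` and `components` learned from the training set, so that the test set does not influence the decomposition.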
Chun-ling WANG, Kai-yuan SHI, Xing MING, Mao-qin CONG, Xin-yue LIU, Wen-ji GUO. A Comparative Study of the COD Hyperspectral Inversion Models in Water Based on Machine Learning[J]. Spectroscopy and Spectral Analysis, 2022, 42(8): 2353
Category: Original Article
Received: Jun. 15, 2021
Accepted: --
Published Online: Mar. 17, 2025
The Author Email: WANG Chun-ling (wangchl@bjfu.edu.cn)