Acta Optica Sinica, Volume. 45, Issue 11, 1117001(2025)
UPSU‑Net: an Unsupervised Deep Learning Framework for Photoacoustic Spectral Unmixing
Photoacoustic spectral (PAS) imaging is a hybrid imaging modality that combines the photoacoustic effect with ultrasound detection to visualize biological tissues, providing valuable information about their functional and molecular characteristics. Irradiating the target with multi-wavelength pulsed lasers generates transient photon fields within the tissue, which produce broadband ultrasound waves detected by an array of transducers. These signals undergo spectral unmixing, enabling the reconstruction of images that visualize and quantify various tissue components. However, a significant challenge in deep-tissue PAS imaging arises from the nonlinear dependence of the received signals on local molecular distributions, which complicates accurate spectral decomposition. Traditional spectral unmixing methods, such as vertex component analysis (VCA) and independent component analysis (ICA), are limited by their sensitivity to noise and their assumption of linear mixing.
We introduce the unsupervised photoacoustic spectral unmixing network (UPSU-Net), a novel unsupervised deep learning framework that accurately separates mixed spectra into individual component spectra without a priori knowledge of absorption spectra, thereby enhancing the precision and reliability of photoacoustic spectral unmixing. UPSU-Net employs a 3D convolutional autoencoder architecture designed to capture both spatial and spectral information from multi-wavelength PAS images. The encoder uses 3D convolutional layers to compress the input data while preserving spatial features, followed by an attention module, composed of global pooling and 3D convolution, that highlights critical features and suppresses noise interference. Compared with traditional 2D convolution, 3D convolution not only processes the two spatial dimensions (width and height) of image data but also simultaneously captures information along an additional wavelength or temporal dimension, making it well suited to multi-dimensional PAS data. The decoder consists of three fully connected layers that receive the low-dimensional abundance features output by the encoder; through their nonlinear transformations, it learns the complex relationships between endmembers and ultimately reconstructs a high-resolution PAS image sequence, enabling precise estimation of endmember spectra and their corresponding abundances.
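The architecture described above can be sketched as follows. This is a minimal illustrative implementation in PyTorch (the paper does not specify a framework); the layer widths, kernel sizes, and the use of a sigmoid-gated channel attention and a softmax abundance constraint are assumptions for demonstration, not the authors' exact design.

```python
import torch
import torch.nn as nn

class SpectralAttention3D(nn.Module):
    """Hypothetical attention module: global pooling followed by a 3D
    convolution, producing channel weights that emphasize informative
    features and suppress noise."""
    def __init__(self, channels):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)            # global pooling
        self.conv = nn.Conv3d(channels, channels, kernel_size=1)
        self.act = nn.Sigmoid()

    def forward(self, x):
        w = self.act(self.conv(self.pool(x)))          # per-channel weights
        return x * w                                   # reweight features

class UnmixingAutoencoder(nn.Module):
    """Sketch of a 3D convolutional unmixing autoencoder.
    Input: (batch, 1, wavelengths, H, W). The encoder compresses the
    spectral cube into per-pixel abundance maps; the three-layer fully
    connected decoder reconstructs the multi-wavelength sequence."""
    def __init__(self, n_wavelengths=16, n_endmembers=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            SpectralAttention3D(8),
            # collapse the wavelength axis into endmember channels
            nn.Conv3d(8, n_endmembers, kernel_size=(n_wavelengths, 1, 1)),
        )
        self.softmax = nn.Softmax(dim=1)               # abundances sum to 1
        # Decoder: three fully connected layers applied per pixel,
        # mapping abundances back to the spectral dimension.
        self.decoder = nn.Sequential(
            nn.Linear(n_endmembers, 32), nn.ReLU(),
            nn.Linear(32, 32), nn.ReLU(),
            nn.Linear(32, n_wavelengths),
        )

    def forward(self, x):
        a = self.softmax(self.encoder(x).squeeze(2))   # (B, E, H, W)
        recon = self.decoder(a.permute(0, 2, 3, 1))    # (B, H, W, L)
        return a, recon.permute(0, 3, 1, 2).unsqueeze(1)
```

Here the encoder's final 3D convolution spans the full wavelength axis, so each spatial location is reduced to one abundance value per endmember; the per-pixel decoder then plays the role of a learned (possibly nonlinear) mixing model.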
The network is trained end to end on a simulated dataset and tested on simulated, phantom, and in vivo datasets. The training procedure comprises two phases: encoding and decoding. During encoding, the input samples are compressed into a low-dimensional feature representation; during decoding, the output of the encoding layer is expanded to reconstruct the input samples. The network parameters are adjusted to minimize the reconstruction error, yielding an optimal abstract representation of the input features. The loss function incorporates data fidelity, regularization, and smoothness terms, ensuring effective feature extraction and preventing overfitting. Training uses Adam optimization with a learning rate of 0.001, a maximum of 225 epochs, and a batch size of 64.
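The composite loss can be sketched as below. The exact form of the regularization and smoothness terms is not given in the abstract, so this sketch assumes an L1 sparsity prior on the abundances and a total-variation-style spatial smoothness penalty; the weights `lam_reg` and `lam_smooth` are illustrative placeholders, not reported values.

```python
import torch
import torch.nn.functional as F

def unmixing_loss(recon, target, abundances, lam_reg=1e-4, lam_smooth=1e-3):
    """Illustrative composite loss: data fidelity (reconstruction MSE),
    an assumed L1 sparsity regularizer on abundances, and an assumed
    total-variation-style spatial smoothness penalty."""
    fidelity = F.mse_loss(recon, target)
    reg = abundances.abs().mean()                              # sparsity prior
    # finite differences along the two spatial axes
    dh = (abundances[..., 1:, :] - abundances[..., :-1, :]).abs().mean()
    dw = (abundances[..., :, 1:] - abundances[..., :, :-1]).abs().mean()
    return fidelity + lam_reg * reg + lam_smooth * (dh + dw)

# Reported training configuration:
# optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# trained for up to 225 epochs with a batch size of 64
```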
Experimental results demonstrate that UPSU-Net significantly outperforms traditional methods such as VCA and ICA, as well as other deep learning models such as U-Net and cascaded autoencoders (CAE). Specifically, UPSU-Net reduces the root mean square error (RMSE) of abundance estimation by approximately 15.00% and 27.66% compared with VCA and ICA, respectively, while the structural similarity index (SSIM) increases by 15.39% and 17.86%, respectively. Compared with U-Net and CAE, UPSU-Net improves RMSE by 34.62% and 24.44%, and SSIM by 13.79% and 3.77%, respectively. The robustness of the network model is further validated under various conditions, including different noise levels and varying numbers of wavelengths. Ablation studies confirm that both the 3D convolutions and the attention mechanism significantly enhance the accuracy of spectral unmixing.
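For reference, the two metrics used above can be computed as follows. This is a minimal NumPy sketch: RMSE is standard, and `global_ssim` implements the single-window SSIM formula as a simplified stand-in for the windowed SSIM typically used to score abundance maps (the paper does not state its SSIM implementation).

```python
import numpy as np

def rmse(est, ref):
    """Root mean square error between estimated and reference abundance maps."""
    return float(np.sqrt(np.mean((est - ref) ** 2)))

def global_ssim(est, ref, data_range=1.0, k1=0.01, k2=0.03):
    """Global (single-window) SSIM over a whole abundance map;
    a simplified variant of the usual sliding-window SSIM."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mu_x, mu_y = est.mean(), ref.mean()
    var_x, var_y = est.var(), ref.var()
    cov = ((est - mu_x) * (ref - mu_y)).mean()
    return float(((2 * mu_x * mu_y + c1) * (2 * cov + c2)) /
                 ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))
```

Identical maps give an RMSE of 0 and an SSIM of 1; lower RMSE and higher SSIM indicate more accurate abundance estimates.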
For simulated data, UPSU-Net successfully distinguishes optical absorbers with different components, even under nonlinear mixing scenarios where VCA struggles with noise and complex mixtures. In phantom experiments, UPSU-Net reliably identifies the positions and abundances of inclusions, whereas VCA, ICA, U-Net, and CAE exhibit reduced contrast between inclusions and background and suffer from significant artifacts. In vivo experiments further demonstrate that UPSU-Net provides clearer and more accurate depictions of target abundances, particularly for low-absorption endmembers, in comparison to baseline methods.
UPSU-Net represents a significant advance in unsupervised spectral unmixing for multi-spectral photoacoustic imaging. By using 3D convolutions and attention mechanisms, UPSU-Net captures complex spatial-spectral relationships without relying on linear or independence assumptions, thereby improving sensitivity to low-absorption endmembers. Future work will explore the integration of PAS imaging models with advanced deep learning frameworks to implement spectral unmixing in a model-data co-driven manner. The deep learning model will not only learn the mapping relationship between PAS imaging data and endmember distributions and abundance from a large amount of data, but also make full use of the structured information provided by the physical model. This will result in better generalization capabilities, ensuring the accuracy of unmixing while reducing model complexity and dependence on training data. Furthermore, expanding the methodology to include four-dimensional (3D+spectral dimension) photoacoustic imaging could offer comprehensive structural and functional information, significantly advancing medical research and clinical applications. The robust performance of UPSU-Net across diverse datasets underscores its potential to make a substantial impact on the field of photoacoustic spectroscopy.
Jingsai Ai, Zheng Sun, Yingsa Hou, Meichen Sun. UPSU‑Net: an Unsupervised Deep Learning Framework for Photoacoustic Spectral Unmixing[J]. Acta Optica Sinica, 2025, 45(11): 1117001
Category: Medical optics and biotechnology
Received: Dec. 2, 2024
Accepted: Mar. 3, 2025
Published Online: Jun. 24, 2025
The Author Email: Zheng Sun (sunzheng@ncepu.edu.cn)
CSTR:32393.14.AOS241824