Acta Optica Sinica, Volume 45, Issue 10, 1011005 (2025)
Single-Photon Imaging Algorithms Based on Multi-Scale Fusion Networks
LiDAR has been widely applied in fields such as aerospace, autonomous driving, 3D modeling, and environmental monitoring. However, traditional LiDAR systems struggle to detect weak signals under conditions such as haze, sandstorms, underwater environments, and long-range scenarios. To meet the demand for weak-signal detection, single-photon LiDAR based on single-photon avalanche diodes (SPADs) and time-correlated single-photon counting (TCSPC) raises detection sensitivity to the photon level. This dramatic improvement in optical signal utilization forms the foundation for high-precision 3D reconstruction in weak-signal environments. In practice, however, single-photon LiDAR struggles to accumulate enough signal photons owing to limits on laser power and imaging time, while ambient light and the dark counts of SPAD detectors inject substantial noise into the measurements. The resulting photon-counting histograms, with few signal photons and low signal-to-background ratios, pose serious challenges to single-photon imaging algorithms. Moreover, existing single-photon imaging algorithms often fail to fully exploit the spatiotemporal and feature correlations in the data, neglecting the interrelationships among time, space, and channels. To address these issues, we propose an attention-guided multi-scale fusion neural network (AMSF-Net), built on the spatiotemporal and feature correlations of photon-counting histograms and designed for single-photon imaging from highly noisy measurement data.
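The measurement model described above, a pulsed laser return buried in ambient light and dark counts and recorded bin by bin via TCSPC, is commonly modeled as Poisson sampling of a Gaussian pulse on a uniform background. The sketch below follows that standard model; the function name and all parameter values (bin count, bin width, pulse width, photon budgets) are illustrative assumptions rather than settings from the paper:

```python
import numpy as np

def simulate_histogram(depth_m, n_bins=1024, bin_size_ps=80.0,
                       signal_photons=2.0, background_photons=50.0,
                       pulse_sigma_ps=400.0, rng=None):
    """Simulate a TCSPC photon-counting histogram for one pixel.

    The round-trip time of flight places a Gaussian-shaped laser return
    in the histogram; ambient light and detector dark counts contribute a
    uniform background. Detection is Poisson sampling of the summed rate.
    """
    rng = np.random.default_rng(rng)
    c = 3e8                                   # speed of light, m/s
    tof_ps = 2.0 * depth_m / c * 1e12         # round-trip time of flight
    t = (np.arange(n_bins) + 0.5) * bin_size_ps
    pulse = np.exp(-0.5 * ((t - tof_ps) / pulse_sigma_ps) ** 2)
    pulse /= pulse.sum()                      # unit-area pulse shape
    rate = signal_photons * pulse + background_photons / n_bins
    return rng.poisson(rate)

# A 5 m target with 2 signal photons against 50 background photons,
# i.e. the kind of low-SBR histogram the paper targets.
hist = simulate_histogram(depth_m=5.0, rng=0)
```

At this photon budget the return is nearly indistinguishable from background counts in a single histogram, which is precisely why a learned spatiotemporal prior is needed.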
AMSF-Net consists of three main components: a feature extraction module, a feature integration module, and a reconstruction module. In the feature extraction module, AMSF-Net alternates standard 3D convolutions and dilated 3D convolutions along two parallel branches to extract complementary low-frequency features. Given the temporal sparsity of photon-counting histogram data, a downsampling operation along the temporal dimension of the feature space accelerates model training and enlarges the receptive field of the backbone network. In the feature integration module, we propose an improved multi-scale architecture that trades temporal resolution for a higher feature channel count, strengthening the model's feature extraction capability. Because the larger channel count can introduce redundancy, an attention-guided dilated dense fusion (ADDF) module is designed: its channel attention enhancement suppresses this redundancy, and combined with the multi-scale network it effectively exploits temporal, spatial, and inter-channel correlations, markedly improving reconstruction performance. In the reconstruction module, transposed convolutions restore the spatial and temporal dimensions, and a Softargmax layer estimates the time-of-flight of the laser pulses to generate the final depth map. In addition, we employ a hybrid loss function combining Kullback-Leibler (KL) divergence and total variation (TV) regularization to further enhance reconstruction quality.
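As a concrete illustration of the final estimation step and the training objective, here is a minimal PyTorch sketch of a Softargmax depth estimate and a hybrid KL-plus-TV loss. The function names, the (batch, time, height, width) tensor layout, and the TV weight are assumptions for illustration; the paper's exact formulation may differ:

```python
import torch
import torch.nn.functional as F

def soft_argmax_depth(logits, bin_size_m):
    """Differentiable time-of-flight estimate from per-bin scores.

    logits: (B, T, H, W). A softmax over the temporal dimension yields a
    per-pixel distribution; its expected bin index, scaled by the depth
    spanned by one bin, gives a continuous depth estimate.
    """
    prob = F.softmax(logits, dim=1)
    bins = torch.arange(logits.shape[1], dtype=logits.dtype,
                        device=logits.device).view(1, -1, 1, 1)
    return (prob * bins).sum(dim=1) * bin_size_m  # (B, H, W)

def hybrid_loss(logits, target_prob, bin_size_m, tv_weight=1e-4):
    """KL divergence to the target temporal distribution, plus TV
    smoothing of the resulting depth map."""
    log_prob = F.log_softmax(logits, dim=1)
    kl = F.kl_div(log_prob, target_prob, reduction="batchmean")
    depth = soft_argmax_depth(logits, bin_size_m)
    tv = (depth[:, 1:, :] - depth[:, :-1, :]).abs().mean() \
       + (depth[:, :, 1:] - depth[:, :, :-1]).abs().mean()
    return kl + tv_weight * tv
```

The softmax-weighted expectation keeps the depth estimate differentiable, which is what allows the TV term to regularize the depth map end to end rather than only the per-bin distribution.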
Based on the single-photon LiDAR imaging model, photon-counting histograms are generated from the NYU v2 dataset for neural network training, and the proposed algorithm is validated on simulated datasets. Qualitative results show that the model reconstructs depth maps with clear details across a range of signal-to-background ratios (SBRs), outperforming competing algorithms. In quantitative comparisons, AMSF-Net performs best among all methods, attaining the lowest root mean square error (RMSE) and the highest structural similarity index (SSIM). At SBRs of 2∶10 and 2∶50, AMSF-Net achieves an accuracy above 92%, where accuracy denotes the fraction of pixels whose predicted depth deviates from the ground truth by less than 1.5%. Even at an extremely low SBR of 0.02, AMSF-Net maintains high-precision reconstruction with an RMSE below 0.05 m and accuracy above 90%. More importantly, AMSF-Net is more robust to noise: as noise levels increase, all algorithms degrade, but AMSF-Net shows the smallest decline across all metrics. Under the extreme condition of an SBR of 1∶100, AMSF-Net reduces RMSE by 60% relative to the second-best method and improves accuracy by 11%, and is the only approach whose accuracy exceeds 80%. The method also delivers excellent imaging performance on real-world data, confirming its practical potential for single-photon imaging. Ablation studies verify the critical roles of the multi-scale architecture, the ADDF module, and the convolutional block attention module (CBAM), as well as the effectiveness of the hybrid loss function combining KL divergence and TV regularization, in improving reconstruction accuracy.
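The two headline metrics can be computed as follows. This small sketch assumes the accuracy criterion stated above (relative depth error below 1.5%); the helper name and sample values are illustrative:

```python
import numpy as np

def depth_metrics(pred, gt, rel_threshold=0.015):
    """RMSE in meters and the fraction of pixels within a relative error.

    A pixel counts toward accuracy when |pred - gt| / gt falls below the
    threshold (1.5% here, matching the criterion stated above).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    acc = np.mean(np.abs(pred - gt) / gt < rel_threshold)
    return rmse, acc

rmse, acc = depth_metrics([2.0, 4.05, 6.0], [2.0, 4.0, 6.0])
# acc == 1.0: the largest relative error is 0.05 / 4.0 = 1.25% < 1.5%
```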
To address the challenge of reconstructing high-quality depth images from low-signal, high-noise photon-counting histogram data, we propose AMSF-Net for single-photon depth reconstruction. AMSF-Net enhances network performance by reducing temporal dimensionality while appropriately increasing the feature channel count. Because the added channels in the multi-scale structure may introduce redundant information, we integrate the CBAM attention mechanism into a dilated dense fusion module, constructing the ADDF module. This module strengthens the network's ability to extract critical features while markedly improving the accuracy of edge information in the spatiotemporal domain and feature information in the channel domain. A hybrid loss function combining KL divergence and TV regularization further safeguards reconstruction precision. Experiments on simulated datasets show that even at an extremely low SBR of 0.02, the network achieves high-precision reconstruction with an RMSE below 0.05 m, confirming its robustness and effectiveness in high-noise environments. The method also demonstrates excellent imaging performance on real-world data, validating its generalization capability.
Pengfei Zhou, Yuyang Zhao, Chenghao Jiang, Tianpeng Xie, Yan Jiang, Zhonghe Liu, Jingguo Zhu. Single-Photon Imaging Algorithms Based on Multi-Scale Fusion Networks[J]. Acta Optica Sinica, 2025, 45(10): 1011005
Category: Imaging Systems
Received: Jan. 24, 2025
Accepted: Apr. 2, 2025
Published Online: May 19, 2025
The Author Email: Jingguo Zhu (zhujingguo@ime.ac.cn)
CSTR:32393.14.AOS250542