Laser & Optoelectronics Progress, Volume 60, Issue 8, 0811033 (2023)


Ruikai Xue1,2, Yan Kang1,*, Tongyi Zhang1,2,**, Fanxing Meng1,2, Xiaofang Wang1,2, Weiwei Li1,2, Lifei Li1, and Wei Zhao1,2
Author Affiliations
  • 1State Key Laboratory of Transient Optics and Photonics, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, Shaanxi, China
  • 2University of Chinese Academy of Sciences, Beijing 100049, China

    In low-light conditions, the single-photon light detection and ranging (Lidar) technique based on time-correlated single-photon counting (TCSPC) is well suited for collecting a three-dimensional (3D) profile of the target. In this paper, we present a rapid 3D reconstruction approach for single-photon Lidar with low signal-to-background ratio (SBR) and few photons, based on a combination of short-duration range gate selection, photon accumulation over surrounding pixels, and a photon-efficient algorithm. We achieve effective noise filtering and 3D image reconstruction by choosing the optimal combined order of these simple methods. Experiments were carried out to validate the various depth estimation algorithms using simulated data and single-photon avalanche diode (SPAD) array data under varying SBR. The experimental results demonstrate that our proposed method can achieve high-quality 3D reconstruction with a faster processing speed than the existing algorithms. The proposed technology will encourage the use of single-photon Lidar for practical needs such as quick and noise-tolerant 3D imaging.

    1 Introduction

    Light detection and ranging (Lidar) technology is widely utilized in disciplines such as robot vision[1], self-driving cars[2], target tracking[3-4], and remote sensing[5-6] to acquire three-dimensional (3D) profiles of a scene. The single-photon Lidar system, which enables the detection of optical signals as weak as the single-photon level, has developed rapidly over the last two decades[7-13]. The time of flight (TOF), or arrival time, of each detected photon can be recorded by the single-photon Lidar system using a combination of time-correlated single-photon counting (TCSPC) and a single-photon detector such as a single-photon avalanche diode (SPAD)[14]. This Lidar system is particularly suitable for the detection of optical signals in low-light environments, such as distant targets with low reflectivity[15-16] or power-constrained systems[17-18]. However, signal photons are easily drowned out by background noise, making it difficult to extract the timing information carried by the signal photons and to compute the distance to the target accurately.

    Initially, researchers employed a cumulative technique for noise suppression, accumulating hundreds to thousands of photons to create an accurate photon-counting histogram, which requires a lengthy measurement time[19]. Owing to its random nature, the background noise is uniformly distributed over the detection period. In contrast, the signal photons reflected off the target have an aggregation effect and are more likely to cluster into a peak in the histogram. The depth of the target can then be estimated from the histogram using cross-correlation or maximum likelihood (ML) estimation[20-21]. When the accumulation time is too short to collect enough photons, this method becomes ineffective. Emerging photon-efficient imaging systems have been demonstrated to obtain 3D images of targets using only a few photons[21-27].

    First-photon imaging was proposed to acquire a depth image when only the first arriving photon is detected at each pixel[21]. The computational imaging framework produced by this technology is based on the physical model of single-photon detection and the spatial correlation of natural scenes. Later, building on first-photon imaging, Shin et al.[22] created an array-type photon-counting 3D imaging system together with a related computational imaging method, and accomplished 3D imaging with an average of ~1 photon per pixel. Halimi et al.[26] introduced a hierarchical Bayesian model-based algorithm for restoring depth and reflectivity acquired in the limit of very low photon counts and for imaging in an underwater environment. However, the image accuracy of the preceding approaches degrades when the ambient noise is severe, such as in situations with a very low signal-to-background ratio (SBR < 0.1). As a result, a photon-efficient technique emphasizing pixel-wise unmixing of signal and noise photons was presented, the key idea of which is to adaptively remove noise detections using adaptive clustering and a super-pixel filling mechanism[28]. With this method, accurate depth imaging was realized at an SBR as low as 0.04. However, because of the pixel-by-pixel noise-filter processing, that solution requires a significant amount of time.

    In this paper, we present an optimal combination method for fast 3D image reconstruction with few photons and low SBR. The first step is to select a global gate using the cumulative histogram of signal and noise photons from all pixels in the SPAD array. The reflected signal photons are clustered and display a significant peak in the relevant time-bin range compared with the noise range. We use a threshold method to determine the interval of the short-duration range gate and apply noise filtering to each pixel. In the second step, neighboring-pixel accumulation increases the number of photons within a single pixel, mitigating errors caused by the loss of signal photons within a pixel, and the depth of the target is estimated by an ML algorithm. Finally, we optimize the calculated depth images using median filtering and total variation (TV) regularization. Our technique enables fast 3D reconstruction both in a simulated extreme environment (SBR of 0.01 and an average of approximately 2 photons per pixel (PPP)) and in a laboratory setting (SBR of 0.05 and PPP of 2.35).

    2 Experiments and Methods

    2.1 Experimental system setup

    The experimental system, designed and built around a SPAD array detector (PF32, Photon Force, UK), is shown in Fig. 1. A diffractive spot matrix from a diffractive optical element (DOE; MS-694-Q-Y-A, Holo/Or), matched to each pixel of the 32×32 SPAD array, is used in the system. The spot matrix only illuminates the field-of-view area corresponding to the photosensitive areas of the individual SPAD array pixels, increasing the system's efficiency in utilizing the reflected laser energy. The experimental scheme and the actual optical system are shown in Fig. 1(a) and Fig. 1(b), respectively. The system consists of the following components: a lens system (L2) to compress the divergence angle of the DOE outgoing spot matrix to match the detector field of view, a half waveplate (HWP) to control the polarization of the outgoing laser, a polarization beam splitter (PBS) combined with a λ/4 waveplate (λ/4 WP) as a transmit-receive switch, and an objective lens (OBJ) to focus photons onto the SPAD array. We employ a pulsed laser (LDH-D-TA-530, PicoQuant, Germany) with a pulse repetition frequency f of 20 MHz, a repetition period $T_r$ of 50 ns, and a total final output power of 102 μW. The SPAD array runs with a time resolution of 55 ps, and all pixels operate in parallel with a frame period of 10 μs. In the experiment, the target distance is about 3 m. The specific data acquisition process and sub-pixel scanning method are described in detail in Ref. [29]. Based on the panoramic sub-pixel scanning, we performed an additional 5-fold micro-scanning in the vertical direction[29]. The final resolution of the image is 160×1039, where each small region of the scene corresponds to one pixel $(i, j)$ of the depth image $z \in \mathbb{R}_+^{N_i \times N_j}$ and the intensity image $\alpha \in \mathbb{R}_+^{N_i \times N_j}$.


    Figure 1. Experimental system. (a) Experimental system model; (b) diagram of experimental system optical path

    2.2 3D image reconstruction methods

    The photon detection model of a single-photon Lidar system, with pulses $s(t)$ illuminating a single pixel $(i, j)$[22], can be described as

    $r_{i,j}(t) = \alpha_{i,j}\, s\!\left(t - 2z_{i,j}/c\right) + b_{i,j}$, (1)

    where $\alpha_{i,j}$ denotes the reflectance corresponding to pixel $(i, j)$ and $b_{i,j}$ denotes the ambient-light flux at the corresponding pixel of the SPAD array. Because the SPAD array detector is non-uniform, the quantum efficiency of each pixel is denoted as $\eta_{i,j} \in [0, 1)$. The detector also has a dark count, expressed as a rate $d_{i,j}$, which ultimately gives a total detection intensity of

    $\lambda_{i,j}(t) = \eta_{i,j} r_{i,j}(t) + d_{i,j} = \eta_{i,j}\alpha_{i,j}\, s\!\left(t - 2z_{i,j}/c\right) + \left(\eta_{i,j} b_{i,j} + d_{i,j}\right)$, (2)

    which combines the noise terms and ignores the effect of dead time.

    The photon detection rate in each laser illumination cycle[28] can be denoted as

    $\Lambda(\alpha_{i,j}) = \int_0^{T_r} \lambda_{i,j}(t)\,\mathrm{d}t = \eta_{i,j}\alpha_{i,j} S + B_{i,j}$, (3)

    where $S = \int_0^{T_r} s(t)\,\mathrm{d}t$ and $B_{i,j} = (\eta_{i,j} b_{i,j} + d_{i,j}) T_r$ denote the total signal and the background count (including the dark count) within each repetition period, respectively. When the system runs in a low-light situation, $\eta_{i,j}\alpha_{i,j} S + B_{i,j} \ll 1$, and the likelihood of detecting a photon within a single cycle is low. In the experiments, we used a generated dataset of size $M \times N$, with each pixel illuminated by $R$ pulses. The total number of photons at each pixel is $k_{i,j}$, and the set of photon arrival times is $T_{i,j} = \{t_{i,j}^{(1)}, t_{i,j}^{(2)}, \ldots, t_{i,j}^{(k_{i,j})}\}$.
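    To make the low-flux condition concrete, the short sketch below evaluates Eq. (3) for a single pixel and the resulting per-cycle detection probability. The parameter values are purely illustrative assumptions, not values from the paper; they correspond to roughly 2 signal photons and 40 background photons over 2000 pulses, i.e., an SBR of 0.05.

```python
import numpy as np

# Illustrative per-pixel parameters (assumed values, not taken from the paper)
eta, alpha, S = 0.2, 0.5, 0.01   # quantum efficiency, reflectivity, integrated pulse signal
B = 0.02                          # background + dark counts per repetition period
R = 2000                          # number of illumination pulses

lam = eta * alpha * S + B                  # Eq. (3): mean detections per cycle
p_detect = 1.0 - np.exp(-lam)              # Poisson probability of >= 1 detection per cycle
print(f"Lambda = {lam:.4f}, P(detection per cycle) = {p_detect:.4f}")
print(f"expected signal photons over {R} pulses: {R * eta * alpha * S:.1f}")
print(f"expected background photons over {R} pulses: {R * B:.1f}")
```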

    In the presence of low target reflectivity, long detection distance, strong background noise, or weak illumination laser intensity, the few photons collected by a single-photon Lidar system are very sensitive to interference from background noise, i.e., $\eta_{i,j}\alpha_{i,j} S \ll B_{i,j}$. The photon-counting histogram can then easily be drowned out by background noise unless the data acquisition time is increased, and the depth information of the target becomes difficult to assess precisely. Furthermore, in a low-light environment some pixels miss signal photons entirely, which further reduces the SBR and makes it more difficult to separate the signal photons from the noise. Therefore, in a low signal-to-background-ratio environment, we need to reduce $B_{i,j}$ while increasing the proportion of $\eta_{i,j}\alpha_{i,j} S$.

    Fig. 2 depicts our data processing flow chart. To increase the SBR of the acquired data, the first step is to separate signal photons from noise via global gating. According to Eq. (3), one strategy to improve the SBR is to reduce the noise contributed by the ambient-light flux by lowering $B_{i,j}$. Reducing the count $B_{i,j}$ first requires locating the interval corresponding to the target position $2z_{i,j}/c$. It is easy to determine the target location $2z_{i,j}/c$ from the peak of the photon-counting histogram when the number of photons is high. Under a low-SBR condition, however, the number of signal photons is substantially smaller than the background noise, making it difficult to precisely locate the target $2z_{i,j}/c$ from the data within a single pixel, as shown in Fig. 3(a). $B_{i,j}$ has a uniform distribution across the entire period $T_r$, while $\eta_{i,j}\alpha_{i,j} S$ is clustered around $2z_{i,j}/c$. By accumulating the photons of all pixels within the acquired $M \times N$ array, we establish the time bins corresponding to the short-duration range gate. The two signal-photon accumulation peaks that eventually appear, as illustrated in Fig. 3(b), are substantially more apparent than the background noise. We take $\bar{E}$ as the mean value of the background-noise count $B_{i,j}$. To obtain an adequate short-duration range gate and to eliminate the background noise $B_{i,j}$, we allow a 10% fluctuation, i.e., the threshold is set to $\bar{H} = \bar{E}(1 + 10\%)$. The histogram of photon counts collected over the complete pixel area is then discriminated by $\bar{H}$: the short-duration range gate $T_d$ is the interval of time bins whose photon count $h$ exceeds $\bar{H}$. This method uses the information of all pixels to determine the short-duration range gate interval $T_d$, narrowing the range of time bins and quickly reducing the background count.
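    A minimal sketch of this global-gating step is given below (hypothetical function and variable names; the paper does not publish code). It pools the arrival times of all pixels into one histogram, takes the mean bin count as the background estimate $\bar{E}$, sets the threshold $\bar{H} = 1.1\bar{E}$, and returns the span of bins exceeding the threshold as the range gate; each pixel is then filtered with this common gate.

```python
import numpy as np

def global_range_gate(times_all_pixels, n_bins, bin_width):
    """Estimate the short-duration range gate T_d from the all-pixel histogram.

    times_all_pixels : 1-D array of photon arrival times pooled over all M*N pixels.
    Returns (t_min, t_max), the gate interval in seconds.
    """
    hist, edges = np.histogram(times_all_pixels, bins=n_bins,
                               range=(0.0, n_bins * bin_width))
    # E_bar: mean count per bin. With few signal photons the histogram is
    # dominated by noise, so the overall mean approximates the background level.
    E_bar = hist.mean()
    H_bar = 1.10 * E_bar                      # threshold with a 10% margin
    gated = np.flatnonzero(hist > H_bar)
    if gated.size == 0:                       # nothing exceeds the threshold
        return 0.0, n_bins * bin_width
    # Simplification: return one contiguous interval spanning all gated bins.
    return edges[gated.min()], edges[gated.max() + 1]

def apply_gate(times_pixel, gate):
    """Per-pixel noise filtering: keep only photons inside the common gate."""
    lo, hi = gate
    return times_pixel[(times_pixel >= lo) & (times_pixel <= hi)]
```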


    Figure 2. Data processing flow chart of the proposed method


    Figure 3. Histogram of photon counting statistics. (a) Typical histogram of photon counting statistics for a specific single pixel; (b) photon counting statistics histogram after accumulation of all pixels within M×N

    The second stage of our proposed solution addresses the few-photon issue, which creates larger depth-estimation errors that affect the correctness of the 3D reconstruction. We leverage the spatial correlation between surrounding pixels to boost photon accumulation by summing the signal photons of neighboring pixels. We pad the $M \times N$ pixel array to a size of $(M+2) \times (N+2)$. A small 3×3 matrix centered on pixel $(i, j)$ is used as a cell, and all photons within it are aggregated and deposited into pixel $(i, j)$, yielding a photon accumulation cell. With neighboring-pixel accumulation we obtain a greater number of signal photons and thus a more accurate depth estimate in each pixel.
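    The neighboring-pixel accumulation can be sketched as follows (a hypothetical helper operating on per-pixel lists of gated arrival times): every 3×3 block centered on a pixel is pooled into that pixel, with out-of-range neighbors treated as empty, which is equivalent to padding the array to (M+2)×(N+2).

```python
def accumulate_neighbors(photons, M, N):
    """Pool the gated photon times of each 3x3 neighborhood into its center pixel.

    photons : dict mapping (i, j) -> list of arrival times, with 0 <= i < M, 0 <= j < N.
    Returns a dict of the same shape holding the pooled arrival times.
    """
    pooled = {}
    for i in range(M):
        for j in range(N):
            cell = []
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    # Neighbors outside the M x N grid contribute nothing,
                    # which mimics padding the array to (M+2) x (N+2).
                    if 0 <= ni < M and 0 <= nj < N:
                        cell.extend(photons.get((ni, nj), []))
            pooled[(i, j)] = cell
    return pooled
```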

    Based on the $M \times N$ compensation matrix, we estimate the 3D image of the target. The negative log-likelihood function of the photon arrival times used for ML estimation is

    $\mathcal{L}\left(z_{i,j} \mid \{t_{i,j}\}_{U}\right) = -\sum_{l \in U_{i,j}} \log s\!\left(t_{i,j}^{(l)} - 2z_{i,j}/c\right)$, (4)

    where $U_{i,j}$ denotes the signal set determined by $T_d$; we assume that all detections in $U_{i,j}$ are caused by the signal. The above regularized ML estimation and median-filtering model can be translated into the following convex optimization problem[21]:

    $\hat{z} = \arg\min_{z} \sum_{i=1}^{M'} \sum_{j=1}^{N'} \mathcal{L}\left(z_{i,j} \mid \{t_{i,j}\}_{U}\right) + \beta\, \mathrm{pen}(z_{i,j})$, (5)

    where $\mathrm{pen}(z_{i,j})$ denotes the TV regularization term used as the constraint, and $\beta$ is a weighting factor that adjusts the influence of the penalty term. We use the SPIRAL-TAP solver to tackle this convex optimization problem, which is an efficient reconstruction approach based on the Poisson distribution[30].
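    The paper solves Eq. (5) with the SPIRAL-TAP solver; as a simplified stand-in, the sketch below assumes a Gaussian pulse shape $s(t)$, for which the per-pixel ML estimate of Eq. (4) reduces to the mean of the gated arrival times, and then applies a median filter as the spatial smoothing step (the TV penalty of Eq. (5) is omitted here).

```python
import numpy as np
from scipy.ndimage import median_filter

C = 2.998e8  # speed of light, m/s

def ml_depth_image(pooled, M, N):
    """Per-pixel ML depth for a Gaussian pulse: the estimate is z = c * mean(t) / 2."""
    z = np.full((M, N), np.nan)
    for (i, j), times in pooled.items():
        if times:                                   # leave empty pixels as NaN
            z[i, j] = C * np.mean(times) / 2.0
    return z

def smooth_depth(z, size=3):
    """Median filtering as a simple spatial regularizer (stand-in for the TV term)."""
    filled = np.where(np.isnan(z), np.nanmedian(z), z)   # fill empty pixels first
    return median_filter(filled, size=size)
```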

    Finally, we validate the proposed method using simulated data of a Bowling scene[31] and our experimental data acquired in low-SBR environments. Furthermore, we compare the depth estimation of the proposed method with that of the methods in Ref. [24] and Ref. [28].

    3 Experimental Results

    3.1 Simulation result

    To verify our method, we use input images from the Middlebury dataset[31] to form synthetic data that simulate a low-SBR environment. Because each pixel in the Middlebury dataset has only a single ground-truth depth, we adjusted the simulation parameters with reference to the actual experimental system. We chose a 626×555-pixel Bowling scene as the target. Photon counts are generated using a Poisson random variable with an average PPP of 2[28]. To meet the low-SBR requirement, the background counts are likewise generated using Poisson random variables. We set the SBR of the simulated data sets to 0.1, 0.08, 0.06, 0.04, 0.02, and 0.01.
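    A sketch of one way to build such synthetic data follows (an assumed parameterization, not the authors' exact generator): for each pixel, the signal count is Poisson with mean PPP and the signal times are drawn around $2z_{i,j}/c$ from the ground-truth depth, while the background count is Poisson with mean PPP/SBR and the background times are uniform over the repetition period.

```python
import numpy as np

def simulate_scene(depth, ppp=2.0, sbr=0.04, T_r=50e-9, sigma=1e-10, seed=0):
    """Generate per-pixel photon arrival times from a ground-truth depth map (meters)."""
    rng = np.random.default_rng(seed)
    c = 2.998e8
    bkg_mean = ppp / sbr                         # background photons per pixel
    data = {}
    M, N = depth.shape
    for i in range(M):
        for j in range(N):
            n_sig = rng.poisson(ppp)             # signal photon count
            n_bkg = rng.poisson(bkg_mean)        # background photon count
            t_sig = rng.normal(2.0 * depth[i, j] / c, sigma, n_sig)   # clustered at 2z/c
            t_bkg = rng.uniform(0.0, T_r, n_bkg)                      # uniform noise
            data[(i, j)] = np.concatenate([t_sig, t_bkg])
    return data
```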

    We compare the proposed method with the methods in Ref. [24] and Ref. [28], which are state-of-the-art photon-efficient methods for 3D image reconstruction. The root mean squared error (RMSE) of each reconstructed depth image is computed to assess the quality of the various depth-estimation approaches:

    $E_{\mathrm{RMSE}}(z, \hat{z}) = \sqrt{\dfrac{1}{N_i N_j} \sum_{i=1}^{N_i} \sum_{j=1}^{N_j} \left(z_{i,j}^{\mathrm{ref}} - \hat{z}_{i,j}\right)^2}$, (6)

    where $z_{i,j}^{\mathrm{ref}}$ is the ground-truth depth and $\hat{z}_{i,j}$ is the estimated depth at pixel $(i, j)$.
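    For completeness, the RMSE of Eq. (6) can be computed directly as follows.

```python
import numpy as np

def rmse(z_ref, z_hat):
    """Root mean squared error between ground-truth and estimated depth maps (Eq. (6))."""
    return float(np.sqrt(np.mean((np.asarray(z_ref) - np.asarray(z_hat)) ** 2)))
```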

    The results of our depth estimation of the Bowling scene using simulated data are shown in Fig. 4. Although the method in Ref. [24] can achieve 3D image reconstruction under low photon-count conditions, obtaining a reliable depth estimate of the target at lower SBRs is difficult. For example, as illustrated in Fig. 4(a)-(d), the method in Ref. [24] can capture the border information of the 3D images but does not estimate precise depth information. The depth estimates of the method in Ref. [28] for the simulated scene are shown in Fig. 4(e)-(h); the depth information of each target can be recovered. Compared with the result of the method in Ref. [24], the method in Ref. [28] offers a significant improvement in image quality. However, when the SBR is 0.01, the increased background-noise count causes the method in Ref. [28] to make large errors in depth estimation, and the plate's edge disappears. The depth estimation of the Bowling scene by our proposed method is shown in Fig. 4(i)-(l). The method provides a clear estimate of the Bowling-ball boundary information, although there are some errors within pixels. Even when the SBR is 0.01, the proposed technique obtains reliable depth estimates of the Bowling image boundaries without introducing substantial error regions.


    Figure 4. 3D image reconstruction of the Bowling ball simulation scene at the average PPP of 2 and SBR of 0.08, 0.04, 0.02, and 0.01, respectively. (a)-(l) 3D reconstruction images of the method in Ref. [24], the method in Ref. [28], and our proposed method under different SBRs; (m)(n) Bowling ball picture and the real depth value

    The RMSE and computational time (T) are calculated for each method, and the experimental results are shown in Fig. 5 and Table 1. The RMSE of our proposed approach is similar to that of the unmixing algorithm and smaller than that of the method in Ref. [24], as shown in Fig. 5(a). The RMSE of the image reconstructed by the method in Ref. [28] rises rapidly when the SBR drops to 0.01; in other words, the error of that approach grows as the signal-to-background ratio is reduced further. Even at an SBR of 0.01, the RMSE of the reconstructed image of the proposed method is 0.036 m, which is smaller than those of the methods in Ref. [24] and Ref. [28]. Most importantly, our proposed solution has the shortest overall processing time of the three methods (at most 64.95 s across all SBRs). This is because, unlike the unmixing approach, which requires computing a short-duration range gate for every pixel, our method achieves noise filtering in the first step by obtaining a common short-duration range gate. As the SBR increases, the RMSE of our proposed method and the unmixing algorithm become approximately equal, and the difference in T between the algorithms decreases, as shown in Fig. 5(b).


    Figure 5. Performance evaluation of depth estimation and computation time on the Bowling simulation dataset for the different algorithms, with an average PPP of 2. (a) RMSE of depth estimation; (b) computation time comparison

    • Table 1. Algorithm performance comparison

      | SBR  | Ref. [24] RMSE /m | Ref. [24] T /s | Ref. [28] RMSE /m | Ref. [28] T /s | Proposed RMSE /m | Proposed T /s |
      |------|-------------------|----------------|-------------------|----------------|------------------|---------------|
      | 0.10 | 0.351             | 56.58          | 0.040             | 421.40         | 0.030            | 57.34         |
      | 0.08 | 0.309             | 65.23          | 0.063             | 456.30         | 0.031            | 56.66         |
      | 0.06 | 0.266             | 81.28          | 0.063             | 592.73         | 0.030            | 56.29         |
      | 0.04 | 0.226             | 114.51         | 0.060             | 836.84         | 0.031            | 56.77         |
      | 0.02 | 0.218             | 211.57         | 0.084             | 2757.4         | 0.033            | 59.27         |
      | 0.01 | 0.249             | 414.68         | 0.272             | 11523          | 0.036            | 64.95         |

    3.2 Experimental result

    A building-model scene acquired with our single-photon Lidar system is used for evaluation with real experimental data. The picture of the target and the depth imaging results are shown in Fig. 6. In a laboratory setting, we collected experimental data for the building model using the sub-pixel panoramic scanning approach, with an image size of 160×1036 and an SBR of 0.05. A few photons were collected by randomly extracting the same number of frames within a single detector pixel, giving an average PPP of 2.35. Fig. 6(b) and Fig. 6(c) show the 3D image reconstruction of the building-model scene by the methods in Ref. [24] and Ref. [28], respectively. The approach in Ref. [24] recognized the building targets but produced large errors in the depth estimation, with an RMSE of 0.200 m. The approach in Ref. [28] is more accurate for the building-model identification, with an RMSE of 0.045 m, as illustrated in Fig. 6(c). Fig. 6(d) shows the result of our proposed method for the 3D reconstruction of the building model, with an RMSE of 0.019 m and a T of 19.77 s. The chance of an empty pixel in the recorded data set is small owing to the high reflectivity of the building target surface. Furthermore, the neighborhood-pixel accumulation in our method uses the same cell size as the method in Ref. [28], and the number of photons within a single pixel after accumulation is greater. In terms of imaging accuracy and algorithm running time, our method outperforms the method in Ref. [28] and greatly outperforms the method in Ref. [24]. Fig. 6(e) shows the ground truth, i.e., the accurate depth estimate obtained with a much larger number of photons. The computer used in the experiment has an i7-6700 CPU with a base frequency of 3.4 GHz. Combining the simulated Bowling-ball data and the laboratory-acquired building-model data, our method outperforms the previous methods in terms of robustness to low SBR and processing speed.


    Figure 6. 3D image reconstruction of the building model at an SBR of 0.05 and PPP of 2.35. (a) Target picture; (b) depth estimation result obtained by the method in Ref. [24]; (c) depth estimation image obtained by the method in Ref. [28]; (d) depth estimation image obtained by our proposed method; (e) ground truth

    4 Conclusions

    In summary, we propose a fast 3D reconstruction method for single-photon Lidar with low SBR and few photons. The method preprocesses the raw data by combining full-pixel accumulation, which generates a short-duration range gate, with neighboring-pixel accumulation, which enables effective depth estimation, and then uses regularized ML estimation to obtain the depth images. Our method increases the processing speed while preserving the accuracy of the reconstructed depth image at low SBR. High-quality depth images are obtained by the proposed method in both simulated (SBR of 0.01, PPP of 2) and real laboratory conditions (SBR of 0.05, PPP of 2.35). The proposed strategy improves the speed of 3D imaging in situations with low SBR and few photons.

    Paper Information

    Category: Imaging Systems

    Received: Nov. 29, 2022

    Accepted: Dec. 14, 2022

    Published Online: Apr. 13, 2023

    Author Emails: Yan Kang (kangyan@opt.ac.cn), Tongyi Zhang (tyzhang@opt.ac.cn)

    DOI: 10.3788/LOP223192
