Chinese Optics Letters, Volume. 22, Issue 6, 060005(2024)

High-resolution single-photon LiDAR without range ambiguity using hybrid-mode imaging [Invited]

Xin-Wei Kong1,2,3, Wen-Long Ye1,2,4, Wenwen Li1,2,4, Zheng-Ping Li1,2,4,*, and Feihu Xu1,2,4,**
Author Affiliations
  • 1Hefei National Research Center for Physical Sciences at the Microscale and School of Physical Sciences, University of Science and Technology of China, Hefei 230026, China
  • 2Shanghai Research Center for Quantum Science and CAS Center for Excellence in Quantum Information and Quantum Physics, University of Science and Technology of China, Shanghai 201315, China
  • 3School of Mechanical, Electrical and Information Engineering, Shandong University, Weihai 264209, China
  • 4Hefei National Laboratory, University of Science and Technology of China, Hefei 230088, China

    We proposed a hybrid imaging scheme to estimate a high-resolution absolute depth map from low photon counts. It leverages measurements of photon arrival times from a single-photon LiDAR and an intensity image from a conventional high-resolution camera. Using a tailored fusion algorithm, we jointly processed the raw measurements from both sensors to output a high-resolution absolute depth map. We scaled up the resolution by a factor of 10, achieving 1300 × 2611 pixels, and extended the unambiguous range by a factor of ∼4.7. These results demonstrate a superior capability for long-range, high-resolution 3D imaging without range ambiguity.


    1. Introduction

    Single-photon light detection and ranging (LiDAR) presents high sensitivity and high temporal precision, which has been widely applied in fields such as topographic mapping[1-3], remote sensing[4], target identification[5,6], and underwater imaging[7]. To meet the application demands, long-range and high-resolution single-photon three-dimensional (3D) imaging has emerged as a significant trend in the development of single-photon LiDAR techniques[8,9]. However, it remains challenging to directly achieve rapid and accurate 3D imaging over a wide field-of-view (FoV) and a large depth-of-view (DoV).

    Array-based single-photon LiDAR can be used to achieve high-resolution 3D imaging[10]. However, it requires a high-power laser to flood-illuminate the scene. Besides, currently available detector arrays have limited sizes or poor time-tagging performance[11]. Therefore, widely used single-photon LiDAR is typically based on raster scanning[12,13]. However, high-density scanning inevitably leads to longer imaging times. To mitigate this issue, data fusion techniques have been proposed that merge visible or infrared high-resolution images with single-photon LiDAR data to improve imaging resolution[14-16].

    Generally, single-photon LiDAR employs the time-correlated single-photon counting (TCSPC) technique. However, when the target is far away, a photon time of flight (ToF) that exceeds the laser emission period is folded back into one period, resulting in range ambiguity[17], which complicates large-DoV imaging. Several approaches have been proposed to mitigate the range ambiguity. A pseudo-random pattern matching scheme[18-21] can identify the exact flight time through correlation between the transmitted and received patterns. Meanwhile, the multi-repetition-rate scheme has been demonstrated to increase the maximum unambiguous distance beyond 100 km[22] and to achieve large-DoV imaging[23]. Nonetheless, a comprehensive solution that achieves a wide FoV and a large DoV simultaneously is still lacking.

    Here, we proposed and demonstrated a fusion method that simultaneously tackles the range-ambiguity and low-resolution bottlenecks of single-photon LiDAR. In hardware, we integrated a multi-repetition-rate single-photon LiDAR with a high-resolution intensity camera. In software, we developed a tailored fusion algorithm that recovers absolute distance and enhances image resolution in the low-photon-count regime. We experimentally validated the ability to reconstruct high-resolution absolute depth images: the image resolution was scaled up by a factor of 10, achieving 1300×2611 pixels, and the unambiguous range was extended by a factor of 4.7. Consequently, our method holistically achieves long-range, high-resolution 3D imaging of expansive scenes with high depth accuracy over a wide FoV and a large DoV.

    2. Approach

    In single-photon imaging, the system illuminates the target’s pth pixel with a periodic laser pulse s(t) and then measures the backscattered photons. By recording the time interval t between the arrival of the echo signal and the most recent pulse emission, the depth Z_p and reflectivity α_p of the target’s pth pixel can be estimated. However, when the target is far away, the portion of the photon ToF that exceeds the laser emission period T is folded away, resulting in the Poisson-process rate function
    $$\lambda_p(t)=\eta\alpha_p\,s(t+n_pT-2Z_p/c)+B,\quad t\in[0,T),$$
    where η is the detector’s photon-detection efficiency, B represents the average rate of background-light plus dark-count detections, and c is the speed of light. The term n_pT represents the folded part of the photon ToF.
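As a concrete illustration of the folding behind this rate function, the following sketch (our own illustration, not code from the paper) computes the TCSPC-recorded interval and the fold count n_p for a given depth and repetition period:

```python
import numpy as np

C = 3e8  # speed of light, m/s

def folded_arrival_time(depth_m, period_s):
    """Return the TCSPC-recorded interval t in [0, T) and the fold count
    n_p for a target at depth_m under repetition period period_s. The true
    time of flight 2*Z/c is wrapped modulo T, the origin of range ambiguity."""
    tof = 2.0 * depth_m / C
    n_p = int(tof // period_s)      # whole periods folded away
    t = tof - n_p * period_s        # what the TCSPC module actually records
    return t, n_p

# A 1 us period gives a 150 m unambiguous range, so a 400 m target folds twice:
t, n = folded_arrival_time(400.0, 1e-6)
```

Note that depths separated by cT/2 (150 m here) yield identical recorded times, which is exactly the ambiguity the rest of the method addresses.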

    After N pulsed-illumination trials, the likelihood function for the set of time intervals $\{t_p^{(l)}\}_{l=1}^{k_p}$ is
    $$P(\{t_p^{(l)}\}_{l=1}^{k_p};Z_p,\alpha_p)=e^{-\Lambda}\prod_{l=1}^{k_p}N\lambda_p(t_p^{(l)}),$$
    where $\Lambda=\int_{0}^{T}N\lambda_p(\tau)\,\mathrm{d}\tau$, and k_p is the total number of photons detected at the pth pixel. Generally, the target distance can be estimated by applying maximum likelihood estimation (MLE):
    $$Z_p^{\mathrm{MLE}}=\arg\max_{Z_p}\sum_{l=1}^{k_p}\log\{N[\eta\alpha_p\,s(t_p^{(l)}+n_pT-2Z_p/c)+B]\}.$$

    Because the maximum likelihood estimator is a periodic function of Zp, Eq. (3) has multiple optimal solutions, which prevents a straightforward calculation of the actual distance to the target and causes range aliasing.

    To overcome this range ambiguity, we use a data acquisition scheme in which adjacent pixels are detected with different laser pulse repetition periods, together with a data fusion method that exploits images captured by the camera. The data acquisition scheme has been extensively detailed in a previous paper[23]. Here, we focus on the use of high-resolution images for absolute distance reconstruction and for upsampling of the single-photon LiDAR data. The schematic of the algorithm is illustrated in Fig. 1; the algorithm can be divided into two steps.
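Numerically, the multi-repetition-rate idea can be sketched as follows: only the true absolute depth is consistent with the folded arrival times measured under every period. The brute-force grid search below is a simplified stand-in for the weighted cluster search actually used in the paper (the periods match three of the experimental values; the search range and step are illustrative):

```python
import numpy as np

C = 3e8  # speed of light, m/s

def fold(depth, period):
    """Folded time of flight under one repetition period."""
    return (2.0 * depth / C) % period

def resolve_depth(folded_times, periods, z_max=3000.0, dz=0.01):
    """Grid-search the absolute depth whose folded ToFs best match the
    measurements under every period. The error per period is the circular
    distance on [0, T), so wrap-around near the period edge is handled."""
    zs = np.arange(0.0, z_max, dz)
    err = np.zeros_like(zs)
    for t_meas, T in zip(folded_times, periods):
        d = np.abs((2.0 * zs / C) % T - t_meas)
        err += np.minimum(d, T - d)   # circular distance on [0, T)
    return zs[np.argmin(err)]

periods = [1e-6, 1.43e-6, 1.59e-6]    # three of the experimental pulse periods
z_true = 1234.5                        # meters; well beyond any single-period range
measured = [fold(z_true, T) for T in periods]
z_hat = resolve_depth(measured, periods)
```

Because the period ratios are mutually coprime (100:143:159), the combined unambiguous range far exceeds the search window, so the minimum is unique here.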


    Figure 1. Schematic diagram of the algorithm. (a) Single-photon LiDAR data acquired by a laser source with multiple repetition rates. (b) Image captured by the camera. (c) Intensity image of (b). (d) Absolute distance image. (e) Horizontal, vertical, and diagonal gradient images from the camera image. (f) High-resolution depth image without range ambiguity.

    2.1. Resolving range ambiguity guided by the intensity image

    Upon acquiring the measurements via the multi-repetition-rate scheme, integrating data from adjacent pixels within the neighborhood Ω through cluster algorithms[20] enables determination of the absolute distance:
    $$\hat{Z}_p=\arg\max_{Z_p}\sum_{q\in\Omega}\omega_{p,q}\sum_{l=1}^{k_q}\log\{N[\eta\alpha_q\,s(t_q^{(l)}+n_qT-2Z_p/c)+B]\},$$
    where the weighting factor ω_{p,q} is used to avoid errors in the distance calculation at the edges of objects. As in the previous paper[23], we leverage spatial and reflectivity information to evaluate the weighting factor ω_{p,q} for neighboring pixels. However, because the reflectivity map of the single-photon LiDAR is susceptible to Poisson noise at low photon counts, we instead use the conventional high-resolution camera image to evaluate the reflectivity of each single-photon LiDAR pixel. Owing to the pixel-number discrepancy between the conventional camera and the single-photon LiDAR, the reflectivity value of a single-photon LiDAR pixel is a weighted average over several conventional camera pixels, i.e., a many-to-one pixel mapping:
    $$I_p=\frac{4}{\sqrt{2\pi}D}\sum_{l=1}^{D}I_p^{(l)}\,e^{-8(p-x_p^{(l)})^2/D^2},$$
    where $\{x_p^{(l)}\}_{l=1}^{D}$ and $\{I_p^{(l)}\}_{l=1}^{D}$ are the positions and intensities of the conventional camera pixels, respectively. The weighting factor is then defined as $\omega_{p,q}=f(|p-q|)\cdot g(|I_p-I_q|)$, where f and g are the spatial and reflectivity kernels, respectively, both Gaussian-shaped.
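The weighting factor can be written out directly. A minimal sketch, with kernel widths chosen arbitrarily since the paper only states that f and g are Gaussian-shaped:

```python
import numpy as np

def bilateral_weight(p, q, I_p, I_q, sigma_s=1.5, sigma_r=0.1):
    """omega_{p,q} = f(|p - q|) * g(|I_p - I_q|): a spatial Gaussian kernel
    times a reflectivity (intensity) Gaussian kernel, so a neighbor across
    an intensity edge contributes little to the cluster sum.
    sigma_s and sigma_r are illustrative assumptions, not values from the paper."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    f = np.exp(-np.sum((p - q) ** 2) / (2.0 * sigma_s ** 2))
    g = np.exp(-((I_p - I_q) ** 2) / (2.0 * sigma_r ** 2))
    return f * g

# Same position and intensity -> weight 1; an intensity jump suppresses it.
w_same = bilateral_weight((0, 0), (0, 0), 0.5, 0.5)
w_edge = bilateral_weight((0, 0), (0, 1), 0.5, 0.9)
```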

    Since the above process of solving Ẑ_p integrates the echo signals from surrounding pixels, it often results in an overly smoothed image, reducing the imaging resolution and degrading the image quality. Here, a convex optimization algorithm is employed to further enhance the accuracy of image reconstruction. The fold number for the pth pixel is determined from Ẑ_p as $\hat{n}_p=\lfloor 2\hat{Z}_p/(cT)\rfloor$. Then, taking advantage of the spatial correlations in natural scenes, we select total variation (TV) as the penalization term. Thus, the absolute depth map is derived as follows:
    $$Z^{\mathrm{MLE}}=\arg\max_{Z}\sum_{p}\sum_{l=1}^{k_p}\log\{N[\eta\alpha_p\,s(t_p^{(l)}+\hat{n}_pT-2Z_p/c)+B]\}-\beta\cdot\mathrm{penalty}(Z).$$

    The above equation constitutes a convex optimization problem and can be solved using convex optimization algorithms[24] to obtain the final estimated distance value of the target.
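For illustration, one standard discrete form of the TV penalty (the paper does not spell out its discretization; anisotropic finite differences are a common choice) is:

```python
import numpy as np

def total_variation(Z):
    """Anisotropic total variation: the sum of absolute horizontal and
    vertical finite differences. The penalty is zero on a flat region and
    charges only for depth edges, which is why it smooths noise while
    tolerating sharp object boundaries."""
    return np.abs(np.diff(Z, axis=1)).sum() + np.abs(np.diff(Z, axis=0)).sum()

flat = np.full((4, 4), 7.0)                    # constant depth patch
step = np.array([[0.0, 0.0], [1.0, 1.0]])      # one-unit depth edge
```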

    2.2. Intensity-image guided upsampling

    Furthermore, to improve the resolution of single-photon imaging, we can take advantage of the high resolution offered by conventional camera images to guide the upsampling of single-photon images. In our framework, Z_H is designated as the high-resolution single-photon depth map we aim to obtain. Correspondingly, the already-acquired absolute depth map Z^MLE represents a downsampled mapping of Z_H, and this downsampling satisfies the relation
    $$Z^{\mathrm{MLE}}=f_d(Z_H)+Z_N,$$
    where f_d(·) is the downsampling function that performs pixel-weighted summation using Gaussian weights, and Z_N represents the noise. Assuming the noise follows a Gaussian distribution, the negative log-likelihood can be expressed as
    $$L=-\log[P(Z^{\mathrm{MLE}}|Z_H)]\propto\|Z^{\mathrm{MLE}}-f_d(Z_H)\|_2^2.$$

    Thus, by applying MLE, we can obtain the high-resolution single-photon image:
    $$\hat{Z}_H=\arg\min_{Z_H}[L+\beta\cdot\mathrm{penalty}(Z_H)].$$
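A minimal sketch of the downsampling operator f_d (Gaussian blur followed by decimation; the kernel width is our assumption, as the paper does not state it):

```python
import numpy as np

def f_d(z_high, factor, sigma=None):
    """Gaussian-weighted downsampling: blur z_high with a separable,
    normalized Gaussian kernel, then decimate by `factor`. sigma defaults
    to factor/2 (an assumption, not a value from the paper)."""
    sigma = sigma if sigma is not None else factor / 2.0
    r = max(1, int(3 * sigma))                 # kernel radius ~3 sigma
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()                               # unit-gain kernel
    zp = np.pad(z_high, r, mode="edge")        # replicate borders
    z1 = np.apply_along_axis(np.convolve, 1, zp, k, "valid")  # blur rows
    z2 = np.apply_along_axis(np.convolve, 0, z1, k, "valid")  # blur columns
    return z2[::factor, ::factor]

z = np.full((8, 8), 3.0)   # flat high-resolution depth patch
low = f_d(z, 2)            # shape (4, 4); values preserved on a flat scene
```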

    Here, we employ second-order total generalized variation (TGV) regularization as the penalty term to constrain the image:
    $$\mathrm{penalty}(Z_H)=\alpha_1\|T^{1/2}(\nabla Z_H-\nu)\|_1+\alpha_0\|\nabla\nu\|_1,$$
    where $T^{1/2}$ is the anisotropic diffusion tensor, ν is an auxiliary variable, and the scalars α_1 and α_0 are non-negative weight coefficients. TGV allows sharper edge preservation while suppressing noise. Since the problem is convex but nonsmooth due to the TGV regularization term, a primal-dual optimization algorithm is used to solve it[14].
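For intuition, taking the tensor T as the identity (a simplification; the paper uses an image-driven anisotropic tensor), the second-order TGV penalty can be evaluated as:

```python
import numpy as np

def grad(u):
    """Forward-difference gradient of a 2-D field (zero at the far edges)."""
    gx = np.zeros_like(u); gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy = np.zeros_like(u); gy[:-1, :] = u[1:, :] - u[:-1, :]
    return np.stack([gx, gy])

def tgv2(z, v, alpha1=1.0, alpha0=2.0):
    """Second-order TGV: alpha1*||grad(z) - v||_1 + alpha0*||grad(v)||_1,
    with the anisotropic tensor taken as the identity. alpha values are
    illustrative assumptions."""
    term1 = np.abs(grad(z) - v).sum()
    term2 = sum(np.abs(grad(vi)).sum() for vi in v)
    return alpha1 * term1 + alpha0 * term2

# On a linear ramp, choosing v = grad(z) zeroes the first-order term, so TGV,
# unlike plain TV, does not push smooth slopes toward piecewise-constant steps.
z = np.tile(np.arange(4.0), (4, 1))   # horizontal depth ramp
v = grad(z)
```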

    3. Simulations

    We conducted simulation experiments using the Middlebury 2007 dataset[25] to validate the effectiveness of our proposed method in reconstructing high-resolution absolute distance images. The resolution of the single-photon imaging is set to 64×64 pixels. Considering the depth span of only 6 m in the simulation scenario, we scaled down the imaging system’s laser periods by a factor of 100, selecting periods of 10 ns, 14.3 ns, 15.9 ns, 16.1 ns, and 17.1 ns for the simulation; the largest single-period unambiguous range (cT/2 for the 17.1 ns period) is 2.565 m. As shown in Fig. 2, we reconstructed the depth map using our method and compared the results with two state-of-the-art methods.
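The quoted 2.565 m follows directly from the single-period unambiguous range cT/2, which is largest for the longest (17.1 ns) period:

```python
C = 3e8  # speed of light, m/s

periods_ns = [10.0, 14.3, 15.9, 16.1, 17.1]
# Single-period unambiguous range c*T/2 for each simulated period:
ranges_m = [C * T * 1e-9 / 2 for T in periods_ns]
# -> [1.5, 2.145, 2.385, 2.415, 2.565] m; the maximum is 2.565 m.
```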


    Figure 2. Simulation results. (a) Ground truth. (b) High-resolution camera image. (c) The simulation results by different methods under various PPP and SBR. From top to bottom, each row corresponds to PPP ∼1 with SBR ∼0.1, PPP ∼10 with SBR ∼0.01, and PPP ∼10 with SBR ∼0.1, respectively. From left to right, each column shows the results reconstructed by Snyder et al., Dai et al., and the proposed method without and with upsampling, respectively.

    Figure 2(c) demonstrates that conventional algorithms[26] struggle to accurately estimate the front-to-back position of a target because of range ambiguity. Dai et al.[23] achieved absolute distance recovery; however, their method leaves noise in the depth maps. Our proposed method reconstructs absolute distance by combining conventional camera images with single-photon LiDAR, reducing the impact of Poisson noise and thereby achieving higher reconstruction accuracy. Compared with Dai et al.’s method, it shows a lower root mean square error (RMSE), demonstrating superior absolute distance reconstruction even with low photon counts and a low signal-to-background ratio (SBR). Moreover, we used the conventional camera images for upsampling, which enriches target details and remarkably improves image resolution; compared with the results before upsampling, the upsampled result has a lower RMSE.

    Comparing our method with Dai et al.’s method in terms of RMSE under the same conditions, we find that reconstructions relying purely on LiDAR data, especially in low-PPP (photons per pixel) and low-SBR scenarios, tend to contain noisy pixels. With the upsampling guidance, our algorithm performs well. As shown in Fig. 3, our method outperforms Dai et al.’s method in terms of RMSE. The RMSE of our results initially decreases and then stabilizes as the SBR/PPP increases, showing that our results achieve the best accuracy.


    Figure 3. The RMSE in simulations with different PPP and SBR levels. (a) For PPP ∼1 with SBR ∼0.01, 0.05, and 0.1, the RMSE results are calculated for the methods of Dai et al. and the proposed method without and with upsampling. (b) For SBR ∼0.1 with PPP ∼0.5, 1, 5, and 10, the RMSE results are calculated for the methods of Dai et al. and the proposed method without and with upsampling.

    4. Experiment

    4.1. Experimental setup

    The schematic of our long-range, high-resolution single-photon imaging system is shown in Fig. 4. We use a digital full-frame camera with its pixel resolution set to 7008×4672; the focal length of the camera’s objective lens is 400 mm. A raster-scanning single-photon LiDAR using a laser source with multiple repetition rates provides the raw depth data. The scanning interval is set to 100 µrad. The single-photon LiDAR uses a coaxial design, allowing highly efficient detection over a wide span of detection distances. To eliminate local noise in this coaxial system, we temporally separate laser emission and detection and employ two acousto-optic modulators (AOMs) for noise suppression. The system employs a 1550 nm fiber pulsed laser whose period is adjustable through an external trigger and typically set between 1 and 2 µs. The maximum laser emission power of the system is 250 mW. The system includes a homemade InGaAs/InP single-photon avalanche diode (SPAD) detector with a detection efficiency of 30% and a dark count rate of 1.2 kcps (cps, counts per second), and a homemade field-programmable gate array (FPGA) board for precise timing control. Moreover, we use the pixel signals output from the micro-electromechanical system (MEMS) mirror to distinguish pixels and implement a scanning method in which each pixel is illuminated at a specific repetition rate, with different rates employed for adjacent pixels.


    Figure 4. The layout of the system. (a) Conventional high-resolution camera. (b) Single-photon LiDAR. (c) Data processing system.

    4.2. Experimental results

    As shown in Fig. 5(a), we imaged residential buildings located 0.4 to 1.6 km away. The experiment was conducted with five laser pulse periods (1 µs, 1.43 µs, 1.59 µs, 1.61 µs, 1.71 µs) and a per-pixel acquisition time of 330 µs. We collected a single-photon image of 128×250 pixels with an average PPP of 4.07. Guided by the intensity information from the camera, we obtained the absolute depth estimate shown in Fig. 5(d). Furthermore, using the contour information extracted from the same image, we generated a depth map with 10-fold higher resolution (1300×2611) while maintaining high depth accuracy, as illustrated in Fig. 5(e). Comparing Figs. 5(f) and 5(g), our method displays better building detail after upsampling. The comparison between Figs. 5(h) and 5(i) shows superior capability in capturing detailed 3D surfaces in complex urban environments. These results demonstrate the robustness and accuracy of our method in practical applications.
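These figures are straightforward to sanity-check: the total scan time follows from the pixel count and per-pixel dwell time, and the output resolution corresponds to just over a 10-fold upscaling per axis:

```python
# Acquisition-time and resolution arithmetic for the outdoor experiment.
pixels = 128 * 250            # scanned single-photon pixels
dwell_s = 330e-6              # per-pixel acquisition time
acq_time_s = pixels * dwell_s # ~10.56 s total scan time

up_rows = 1300 / 128          # row upscaling factor, ~10.16x
up_cols = 2611 / 250          # column upscaling factor, ~10.44x
```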


    Figure 5. The experimental results. (a) The target’s location on the map. (b) Photograph of our system. (c) High-resolution camera image of the target. (d), (e) The results using our proposed method without and with upsampling. (f), (g) Closeup views of the building details in the depth reconstructions [area highlighted by the green rectangle in (c)]. (h), (i) 3D profiles of the eaves details in the depth reconstructions [highlighted by the blue rectangle in (c)].

    5. Conclusion

    We proposed and validated a fusion method for long-range 3D imaging that overcomes the challenges of range ambiguity and low resolution. The outdoor experiments extended the unambiguous range by a factor of 4.7 and produced images of over 3 megapixels (1300×2611), a 10-fold increase in resolution. By providing accurate depth perception and fine spatial awareness, our results may offer an enhanced route to rapid, high-resolution, long-range 3D imaging of large-scale scenes, which is essential for target identification and environmental mapping in complex areas.

    [2] R. M. Marino, W. R. Davis. Jigsaw: a foliage-penetrating 3D imaging laser radar system. Linc. Lab. J., 15, 23(2005).

    [5] A. B. Gschwendtner, W. E. Keicher. Development of coherent laser radar at Lincoln Laboratory. Linc. Lab. J., 12, 383(2000).

    [14] D. Ferstl, C. Reinbacher, R. Ranftl et al. Image guided depth upsampling using anisotropic total generalized variation. Proceedings of the IEEE International Conference on Computer Vision, 993(2013).

    [17] W. H. Long, D. H. Mooney, W. A. Skillman. Pulse doppler radar. Radar Handbook, 2(1990).

    [25] D. Scharstein, C. Pal. Learning conditional random fields for stereo. IEEE Conference on Computer Vision and Pattern Recognition, 1(2007).

    [26] D. L. Snyder, M. I. Miller. Random Point Processes in Time and Space (2012).

    Paper Information

    Special Issue: SPECIAL ISSUE ON QUANTUM IMAGING

    Received: Dec. 28, 2023

    Accepted: Mar. 25, 2024

    Published Online: Jun. 27, 2024

    The Author Email: Zheng-Ping Li (lizhp@ustc.edu.cn), Feihu Xu (feihuxu@ustc.edu.cn)

    DOI:10.3788/COL202422.060005
