Acta Optica Sinica, Volume. 45, Issue 11, 1111003(2025)
Depth Estimation Based on Single-Lens Point Spread Function Regulation
Depth awareness is important for many computer vision tasks, such as autonomous driving and 3D reconstruction. Existing depth estimation methods include structured light, time of flight, stereo matching, and monocular depth estimation, among which monocular depth estimation offers significant advantages in power consumption, cost, and size. There are two ways to achieve monocular depth estimation. The first learns the depth cues contained in the image itself through a neural network, such as texture gradients, shading, occlusions, and the type and size of objects; however, this approach lacks interpretability. The second is depth from defocus (DFD) or coded-aperture depth estimation, which infers depth from depth-dependent optical features of the imaging system. This approach usually requires adding a phase mask to an existing camera and does not exploit the depth-encoding ability of the lens itself. We therefore propose a depth estimation method based on a single-lens optical system, which further reduces the size and volume of the optical system compared with existing single-lens depth estimation while improving depth estimation accuracy.
A monocular depth estimation system usually consists of a camera and a diffractive optical element (DOE). The DOE is designed so that the point spread functions (PSFs) at different depths exhibit different spatial or spectral structures: the optical system encodes scene depth information into the 2D image, and a decoding algorithm then recovers the scene's depth map from the encoded image. A single-lens optical system likewise produces different spatial or spectral structures for objects at varying depths.
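The encode-decode principle above can be sketched with a simplified layered forward model: each depth layer of the scene is blurred by a depth-dependent PSF and the results are summed. This is only an illustration assuming a thin-lens geometric blur circle and a Gaussian PSF; the paper's actual model uses the full wavelength- and field-dependent 4D PSF of the single lens, and the F-number, pixel pitch, and focus distance below are hypothetical (only the 31.4 mm focal length comes from the paper).

```python
import numpy as np

def defocus_psf(depth_m, focus_m=2.0, f_mm=31.4, fnum=4.0, px_um=5.0, size=15):
    """Gaussian approximation of a thin-lens defocus PSF (illustrative only;
    a real single lens has wavelength- and field-dependent PSFs)."""
    f = f_mm * 1e-3
    aperture = f / fnum
    # diameter of the geometric blur circle on the sensor (thin-lens formula)
    blur = aperture * f * abs(depth_m - focus_m) / (depth_m * (focus_m - f))
    sigma = max(blur / (px_um * 1e-6) / 2.0, 1e-3)  # blur radius in pixels
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    psf = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return psf / psf.sum()

def conv2d_same(img, k):
    """Same-size 2D filtering with zero padding (kernel is symmetric here)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def render(scene, depth_map, depths):
    """Layered forward model: blur each depth layer with its PSF and sum."""
    out = np.zeros_like(scene, dtype=float)
    for d in depths:
        mask = (np.abs(depth_map - d) < 1e-6).astype(float)
        out += conv2d_same(scene * mask, defocus_psf(d))
    return out
```

An in-focus layer passes through nearly unchanged, while layers away from the 2.0 m focus plane are visibly blurred; a decoder can exploit exactly this depth dependence.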
For this single-lens imaging system, we propose a simulation model of the multi-wavelength, depth-aware, spatially variant four-dimensional point spread function, along with a differentiable optical imaging model. We then introduce an optimization constraint method for depth estimation tasks that regulates the PSF in both the depth and field dimensions. For the depth estimation algorithm, we should consider that methods relying on different spatial or spectral PSF structures depend heavily on the object's texture in the scene. Since the semantic information of the scene can compensate for this limitation, we propose a depth estimation network that includes a semantic information extraction preprocessing module. We connect the imaging model and the depth estimation algorithm, jointly designing the single-lens optical system and the depth estimation algorithm. Finally, we design a visible-light depth detector comprising an aspheric single-lens optical system and the corresponding depth estimation algorithm.
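One way to picture PSF regulation in the depth dimension is a surrogate loss that penalizes similarity between normalized PSFs at different depths, pushing the optical response to stay depth-discriminative. This is a hypothetical loss for illustration, not the paper's exact constraint, and the Gaussian PSFs below are stand-ins for the optimized single-lens PSFs.

```python
import numpy as np

def gaussian_psf(sigma, size=21):
    """Stand-in PSF for one depth (hypothetical; not the lens's real PSF)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    p = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return p / p.sum()

def psf_separation_loss(psfs):
    """Mean cosine similarity between PSFs at different depths.
    Lower is better: dissimilar PSFs encode depth more distinctly."""
    flat = psfs.reshape(len(psfs), -1)
    flat = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    sim = flat @ flat.T
    n = len(psfs)
    return (sim.sum() - np.trace(sim)) / (n * (n - 1))

# a depth-varying PSF bank scores lower than an identical one
distinct = np.stack([gaussian_psf(s) for s in (1.0, 2.0, 3.0)])
identical = np.stack([gaussian_psf(2.0)] * 3)
```

In a joint-design loop, a term like this could be added to the depth estimation loss so that gradient descent shapes the lens toward depth-dependent responses; the paper additionally regulates the PSF across the field dimension.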
To verify our method, we train and test it on the NYU Depth V2 dataset with a target depth range of 1.0 m to 5.0 m. The initial single-lens optical system, with a focal length of 31.4 mm and a field of view of approximately 10°, is optimized together with its corresponding depth estimation algorithm. We compare our method with three alternative approaches: 1) the conventional depth-from-defocus model, which treats the lens as a thin lens; 2) a phase coded-aperture model implemented with a 256×256 diffractive optical element; and 3) a phase coded-aperture model whose phase mask consists of several concentric rings. Our single-lens depth estimation model achieves a relative error as low as 0.083 on the NYU Depth V2 dataset, the lowest among the compared methods. To further evaluate the contribution of our proposed method, we conduct ablation experiments: we replace the optimized single-lens optical system with an unoptimized version, and we substitute the semantic information extraction preprocessing step with a neural network lacking this preprocessing. Both modifications degrade depth estimation accuracy, substantiating the effectiveness of our method in improving the depth estimation model.
We introduce an end-to-end single-lens depth estimation model. First, to accurately simulate the defocus and off-axis aberrations of a real camera lens over the depth range of a scene, we propose a differentiable imaging model. We then introduce a single-lens optimization constraint method that regulates the PSFs of the optical system to strengthen the depth dependence of its imaging response, so that the lens can be optimized in the direction that maximizes the depth estimation performance of the model. A preprocessing method incorporating semantic information is proposed to compensate for the decoding process's reliance on image texture. Finally, by jointly optimizing the single-lens optical system and the depth estimation algorithm, a depth estimation model based on a minimalist optical system is realized, and simulations and tests are carried out on the NYU Depth V2 dataset. The results show that the proposed design greatly reduces the volume of the depth estimation system while maintaining high depth estimation performance, which is significant for applications such as distance sensors on unmanned aerial vehicle platforms.
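The joint-design idea can be sketched as a toy optimization under strong simplifying assumptions: one scalar "optical" parameter alpha controls how blur grows with depth, a smooth discriminability surrogate loss drives its gradient update, and a nearest-neighbour decoder's RMSE is compared before and after. None of these specific choices come from the paper; the sketch only illustrates how optimizing the optics for depth-dependent responses improves downstream decoding.

```python
import numpy as np

depths = np.linspace(1.0, 5.0, 9)   # candidate depths over the 1-5 m range
noise = 0.05                        # std of the blur measurement (assumed)

def blur_sigma(alpha, d):
    """Toy optical response: blur grows linearly with depth (hypothetical)."""
    return 0.1 + alpha * (d - 1.0)

def confusion_loss(alpha):
    """Smooth surrogate for depth discriminability: overlap between the
    blur responses of adjacent candidate depths."""
    gap = blur_sigma(alpha, depths[1]) - blur_sigma(alpha, depths[0])
    return np.exp(-gap**2 / (2 * noise**2))

def decoder_rmse(alpha, trials=200):
    """Nearest-neighbour decoding of depth from a noisy blur measurement."""
    rng = np.random.default_rng(0)
    errs = []
    for d in rng.choice(depths, trials):
        meas = blur_sigma(alpha, d) + noise * rng.standard_normal()
        est = depths[np.argmin(np.abs(blur_sigma(alpha, depths) - meas))]
        errs.append((est - d) ** 2)
    return float(np.sqrt(np.mean(errs)))

# finite-difference gradient descent on the optical parameter ("lens design")
alpha0 = 0.05
alpha, lr, eps = alpha0, 0.01, 1e-3
for _ in range(100):
    g = (confusion_loss(alpha + eps) - confusion_loss(alpha - eps)) / (2 * eps)
    alpha -= lr * g
```

After optimization the depth-to-blur mapping is steeper, so the same decoder makes fewer confusions; in the paper this role is played by end-to-end gradients flowing through the differentiable imaging model into the lens parameters.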
Zaiwu Sun, Fanjiao Tan, Pengliang Yu, Zongling Li, Rongshuai Zhang, Changjian Yang, Qingyu Hou. Depth Estimation Based on Single-Lens Point Spread Function Regulation[J]. Acta Optica Sinica, 2025, 45(11): 1111003
Category: Imaging Systems
Received: Feb. 26, 2025
Accepted: Apr. 15, 2025
Published Online: Jun. 23, 2025
The Author Email: Qingyu Hou (houqingyu@126.com)
CSTR:32393.14.AOS250660