Laser & Optoelectronics Progress, Vol. 62, Issue 14, 1412004 (2025)
Indoor Monocular Depth Estimation Based on Global-Local Feature Fusion
To address the low depth estimation accuracy, blurred object contours, and loss of detail caused by occlusion and lighting variations in complex indoor scenes, we propose an indoor monocular depth estimation algorithm based on global-local feature fusion. First, a hierarchical Transformer structure is incorporated into the encoder to enhance global feature extraction, and a simplified pyramid pooling module further enriches the feature representation. Second, a gated adaptive aggregation module in the decoder optimizes feature fusion during upsampling by adaptively integrating global and local information. Finally, a multi-kernel convolution module at the end of the decoder refines local details. Experiments on the NYU Depth V2 indoor scene dataset show that the proposed algorithm significantly improves depth prediction accuracy, achieving a root mean square error of 0.361, and that the generated depth maps exhibit better continuity and richer detail.
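The core idea of gated global-local fusion can be illustrated with a minimal sketch: a sigmoid gate blends a global feature with a local feature element-wise, so the network can lean on global context in some regions and local detail in others. This is an illustrative toy in pure Python, not the paper's module; the actual gated adaptive aggregation module uses learned convolutional gates over multi-channel feature maps, and all values below are hypothetical.

```python
import math

def sigmoid(x):
    """Standard logistic function, mapping a gate logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def gated_fuse(global_feat, local_feat, gate_logits):
    """Toy gated fusion: out = g * global + (1 - g) * local, element-wise.

    Illustrative only; in the paper's decoder the gate weights are learned
    from the features themselves rather than supplied as fixed logits.
    """
    return [sigmoid(z) * gf + (1.0 - sigmoid(z)) * lf
            for gf, lf, z in zip(global_feat, local_feat, gate_logits)]

# Hypothetical 1-D "feature maps" at three positions.
global_feat = [0.8, 0.2, 0.5]
local_feat = [0.1, 0.9, 0.4]
gate_logits = [2.0, -2.0, 0.0]  # favor global, favor local, even mix

print(gated_fuse(global_feat, local_feat, gate_logits))
```

With a gate logit of 0, the output is the simple average of the two features (here 0.45 at the third position); large positive or negative logits push the blend toward the global or local branch, respectively.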
Zengyu Tian, Changku Sun, Yue Li, Luhua Fu, Peng Wang. Indoor Monocular Depth Estimation Based on Global-Local Feature Fusion[J]. Laser & Optoelectronics Progress, 2025, 62(14): 1412004
Category: Instrumentation, Measurement and Metrology
Received: Jan. 2, 2025
Accepted: Feb. 25, 2025
Published Online: Jul. 16, 2025
The Author Email: Peng Wang (wang_peng@tju.edu.cn)
CSTR:32186.14.LOP250436