Optics and Precision Engineering, Volume 33, Issue 6, 928 (2025)
Multi-level feature fusion for camera pose regression
To improve the accuracy and stability of camera pose estimation in complex scenes, this paper proposes the ResGraphLoc network, which combines a residual network with a graph attention mechanism to improve pose regression accuracy under occlusion, illumination changes, and low texture. The network uses ResNet101 as the feature encoder and strengthens salient-feature extraction through an improved residual block. A graph attention layer fuses the multi-level feature maps, diffusing and aggregating feature information through multi-head self-attention. Finally, a nonlinear MLP layer extracts position and angle features from the feature embedding to complete end-to-end camera pose regression. On a large-scale outdoor dataset, ResGraphLoc achieves lower pose error than existing algorithms: in the LOOP and FULL scenarios the pose regression errors are 7.18 m, 2.48° and 16.96 m, 3.16°, respectively, an improvement of more than 25% over the baseline model. In the Neighborhood scenario of the 4Seasons dataset, the outdoor localization error is as low as 1.40 m and 0.76°. On an indoor dataset with missing and repetitive textures, the position and angle errors reach 0.08 m and 3.25°, respectively. The experimental results verify that ResGraphLoc is accurate and stable in complex environments and copes effectively with occlusion, illumination changes, and low-texture scenes.
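The data flow described in the abstract (multi-level features, attention-based fusion, MLP pose head) can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the ResNet101 encoder is replaced by random stand-in vectors, the graph is assumed fully connected, and the dimensions, the residual connection, mean pooling, `tanh` activation, and the quaternion rotation output are all assumptions made for illustration. The function names are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters (assumption)."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def attention_head(nodes, wq, wk, wv):
    """Scaled dot-product self-attention; every node attends to every node."""
    q, k, v = matmul(nodes, wq), matmul(nodes, wk), matmul(nodes, wv)
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in matmul(q, k_t)]
    return matmul(weights, v), weights

def fuse_levels(nodes, heads):
    """Multi-head fusion: concatenate head outputs, then add a residual."""
    outs = [attention_head(nodes, *h)[0] for h in heads]
    concat = [sum((o[i] for o in outs), []) for i in range(len(nodes))]
    return [[a + b for a, b in zip(n, c)] for n, c in zip(nodes, concat)]

def mlp_head(embedding, w_hidden, w_pos, w_rot):
    """Nonlinear MLP with separate position and rotation outputs."""
    hidden = [math.tanh(h) for h in matmul([embedding], w_hidden)[0]]
    position = matmul([hidden], w_pos)[0]
    quat = matmul([hidden], w_rot)[0]
    norm = math.sqrt(sum(c * c for c in quat)) or 1.0
    return position, [c / norm for c in quat]  # unit quaternion (assumed output)

rng = random.Random(0)
dim, num_heads, num_levels = 8, 2, 4
head_dim = dim // num_heads

# Stand-ins for pooled, projected feature maps from four encoder stages.
level_features = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_levels)]
heads = [tuple(rand_matrix(dim, head_dim, rng) for _ in range(3)) for _ in range(num_heads)]

fused = fuse_levels(level_features, heads)
embedding = [sum(col) / num_levels for col in zip(*fused)]  # mean-pool the nodes
position, quaternion = mlp_head(
    embedding,
    rand_matrix(dim, 16, rng),
    rand_matrix(16, 3, rng),
    rand_matrix(16, 4, rng),
)
print("position:", position)
print("rotation (unit quaternion):", quaternion)
```

Treating each feature level as one graph node keeps the example small; a real model would more plausibly attend over spatial locations or patches within each feature map, and would learn all weights end to end against a pose loss.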
Junwen SI, Ziwei ZHOU. Multi-level feature fusion for camera pose regression[J]. Optics and Precision Engineering, 2025, 33(6): 928
Received: Jul. 13, 2024
Accepted: --
Published Online: Jun. 16, 2025