Chinese Journal of Lasers, Volume. 52, Issue 17, 1709001(2025)
Multi‐View 3D Reconstruction Based on Adaptive Feature Enhancement in Inspection Scenes
With the continuous advancement of industrial intelligence, intelligent robots have been widely applied in inspection scenes of key infrastructures, such as power, transportation, and energy. By performing 3D reconstruction of multi-view inspection scenes, intelligent robots can acquire accurate environmental perception capabilities, enabling autonomous operation. However, in complex inspection scenes, existing methods often fail to extract sufficient features from weakly textured and edge regions, resulting in low reconstruction accuracy and compromising the overall quality of the reconstruction. To address these issues, this study proposes AFE-MVSNet, a multi-view 3D reconstruction network based on adaptive feature enhancement in inspection scenes, which aims to improve the reconstruction performance in complex areas, particularly those with weak textures and edges.
AFE-MVSNet consists of two main components: an adaptive feature enhancement network (AFENet) and a multi-scale depth estimation process. First, AFENet, built on a feature pyramid network, extracts multi-scale features and incorporates attention mechanisms at each upsampling stage to enhance fine details, such as edges and textures. To further improve feature representation in weakly textured and edge regions, AFENet incorporates an adaptive perception module for features (APMF) based on deformable convolutional networks, which dynamically adjusts kernel sampling positions and weights to enlarge the receptive field. Second, the multi-scale depth estimation network adopts a cascaded structure that refines the depth maps from coarse to fine through cost-volume construction and depth prediction at each stage. The final depth maps are fused with the corresponding color images to generate colored 3D point clouds. To enhance training, a focal loss function is used to emphasize challenging regions, thereby improving the network's ability to learn hard-to-extract features and the overall reconstruction performance.
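The cascaded coarse-to-fine depth estimation described above can be sketched in a minimal, dependency-free form. This is an illustrative per-pixel example only: the function names, the interval-halving schedule, and the use of a simple cost function in place of learned cost-volume regularization are assumptions for clarity, not the paper's exact configuration.

```python
# Hedged sketch of coarse-to-fine depth hypothesis refinement, as used in
# cascaded MVS networks. At each stage, candidate depths are sampled around
# the current estimate and the hypothesis interval is halved, so the search
# range contracts stage by stage.

def hypothesis_range(center_depth, interval, num_hypotheses):
    """Candidate depths evenly spaced around the current estimate."""
    start = center_depth - interval * (num_hypotheses - 1) / 2
    return [start + i * interval for i in range(num_hypotheses)]

def cascade_refine(initial_depth, initial_interval, cost_fn,
                   stages=3, num_hypotheses=8):
    """Refine a per-pixel depth estimate over several cascade stages.

    cost_fn stands in for the matching cost of a depth hypothesis; in the
    real network this role is played by cost-volume regularization over
    warped multi-view features, not a hand-written function.
    """
    depth, interval = initial_depth, initial_interval
    for _ in range(stages):
        candidates = hypothesis_range(depth, interval, num_hypotheses)
        depth = min(candidates, key=cost_fn)  # pick lowest-cost hypothesis
        interval /= 2                          # narrow the search range
    return depth

# Toy usage: the "true" depth is 4.2 m; the estimate converges toward it.
estimate = cascade_refine(5.0, 1.0, lambda d: abs(d - 4.2))
```

The key design point is that later stages spend their fixed hypothesis budget on a progressively smaller depth range, which is what lets cascaded methods reach fine depth resolution without an impractically large cost volume.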
To validate the effectiveness of AFE-MVSNet in inspection scenes, an inspection scene dataset was constructed and a transfer learning strategy was adopted. The model was pre-trained on the DTU public dataset and then fine-tuned on the inspection scene dataset to enhance its reconstruction performance in real-world environments. AFE-MVSNet was compared with several mainstream methods on both the DTU and inspection scene datasets. The experimental results demonstrate that AFE-MVSNet significantly outperforms existing methods in weakly textured and edge regions. It achieves an overall reconstruction error (OA) of 0.309 mm on the DTU dataset and an end-point error (EPE) of 1.006 m on the inspection scene dataset, surpassing both the baseline network and its own performance before fine-tuning. In addition, ablation experiments were conducted on the APMF module, the attention mechanism, and the focal loss function to verify the effectiveness of each component.
To address the poor reconstruction performance in weakly textured and edge regions of inspection scenes, this study proposed AFE-MVSNet, a multi-view 3D reconstruction network based on adaptive feature enhancement. The network aims to improve the reconstruction quality in complex areas. The main technical contributions of this study are as follows: 1) To enhance feature representation in weakly textured and edge regions, the adaptive feature enhancement network (AFENet) was proposed, which integrates attention mechanisms to improve the extraction and representation of texture features in inspection scenes. 2) To strengthen the perceptual ability in complex areas, an adaptive perception module for features (APMF) was designed based on deformable convolutional networks. This module adaptively adjusts the sampling positions and weights of the convolution kernels to enlarge the receptive field. 3) To improve the learning ability of the network, a focal loss function was introduced to enhance its ability to learn from hard-to-extract feature regions, thereby improving the reconstruction performance. 4) To improve the reconstruction ability, an inspection scene dataset was constructed. The network was first pre-trained on the DTU public dataset and then fine-tuned on the inspection scene dataset to enhance its ability to learn features. The experimental results validate the effectiveness of AFE-MVSNet in inspection scenes. The reconstructed 3D point cloud models exhibit well-preserved weakly textured areas and clearly defined object edges. The proposed network provides a theoretical foundation for intelligent robotic applications in inspection tasks and has significant potential for real-world engineering applications.
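The focal loss mentioned in contribution 3) down-weights well-learned samples so that training gradients concentrate on hard regions such as weak textures and edges. A minimal pure-Python sketch of the standard binary form (Lin et al.'s formulation) is shown below; the exact variant AFE-MVSNet applies over depth predictions is not detailed here, so the parameter values and the binary setting are illustrative assumptions.

```python
import math

def focal_loss(p, target, alpha=0.25, gamma=2.0):
    """Binary focal loss for one predicted probability.

    p      -- predicted probability of the positive class, in (0, 1)
    target -- ground-truth label, 0 or 1
    The modulating factor (1 - p_t)^gamma shrinks the loss of
    confidently correct (easy) predictions, so hard examples
    dominate the training signal.
    """
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy example (confident, correct) contributes far less loss
# than a hard example (low confidence on the true class).
easy = focal_loss(0.9, 1)   # small loss
hard = focal_loss(0.1, 1)   # large loss
```

With gamma = 0 the modulating factor disappears and the expression reduces to ordinary (alpha-weighted) cross-entropy, which is why focal loss is often described as a reweighting of cross-entropy toward hard samples.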
Yongkang Zhang, Yi An, Zhiyong Yang, Yan Chen, Haochen Sun. Multi‐View 3D Reconstruction Based on Adaptive Feature Enhancement in Inspection Scenes[J]. Chinese Journal of Lasers, 2025, 52(17): 1709001
Category: Imaging and Information Processing
Received: Feb. 20, 2025
Accepted: May 8, 2025
Published Online: Sep. 3, 2025
The Author Email: Yi An (anyi@dlut.edu.cn)
CSTR:32183.14.CJL250543