Laser & Optoelectronics Progress, Vol. 61, Issue 14, 1415004 (2024)
Multiview Stereo Reconstruction with Feature Aggregation Transformer
In this study, a multiview stereo (MVS) reconstruction network based on a feature aggregation transformer was proposed to address ambiguous matching in regions with weak textures or non-Lambertian surfaces. This ambiguity arises because existing multiview stereo methods lack an understanding of the image as a whole and of the connections between images. First, features were extracted from the input images by a feature pyramid network fused with deformable convolution, which adaptively adjusts the size and shape of the receptive field. Second, a Transformer-based spatial aggregation module was introduced for feature aggregation: an intra-image self-attention mechanism captures intra-view global contextual information, and an inter-image cross-attention mechanism efficiently models information interaction between views. By capturing the texture features of scenes more accurately, the module achieves reliable feature matching. Finally, visibility cost aggregation was employed to estimate pixel visibility, so that noisy and mismatched pixels are removed from cost aggregation. Experimental results on the DTU and Tanks & Temples datasets show that the proposed method achieves better reconstruction performance than the compared methods.
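The attention and visibility steps summarized above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: attention is single-head with no learned query/key/value projections, residual connections are assumed, the cross-attention direction (source-view tokens querying the reference view) is an assumption, and visibility maps are passed in as inputs rather than estimated by a network. The function names `attention`, `aggregate_features`, and `visibility_weighted_cost` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys, values):
    """Single-head scaled dot-product attention with no learned projections.

    queries: (Nq, d); keys, values: (Nk, d). Returns (Nq, d).
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)  # (Nq, Nk) similarity matrix
    return softmax(scores, axis=-1) @ values


def aggregate_features(ref, srcs):
    """Hypothetical intra-/inter-view aggregation over flattened pixel tokens.

    ref: (N, d) reference-view tokens; srcs: list of (N, d) source-view tokens.
    """
    # Intra-image self-attention: each view attends to itself (global context).
    ref = ref + attention(ref, ref, ref)
    srcs = [s + attention(s, s, s) for s in srcs]
    # Inter-image cross-attention (assumed direction): source tokens query the
    # reference view, so information is exchanged between views.
    srcs = [s + attention(s, ref, ref) for s in srcs]
    return ref, srcs


def visibility_weighted_cost(costs, visibility, eps=1e-8):
    """Aggregate per-view costs, down-weighting pixels judged invisible.

    costs: (V, D, H, W) matching cost of each source view over D depth hypotheses.
    visibility: (V, H, W) per-pixel weights in [0, 1] for each source view.
    Returns the (D, H, W) visibility-weighted mean cost.
    """
    w = visibility[:, None, :, :]  # broadcast the same weight over all depths
    return (w * costs).sum(axis=0) / (w.sum(axis=0) + eps)


rng = np.random.default_rng(0)
ref = rng.standard_normal((16, 8))  # 16 tokens (e.g. a 4x4 feature map), 8 channels
srcs = [rng.standard_normal((16, 8)) for _ in range(2)]
ref_out, src_out = aggregate_features(ref, srcs)
print(ref_out.shape, [s.shape for s in src_out])  # (16, 8) [(16, 8), (16, 8)]

# View 0 matches perfectly (cost 0); view 1 is mismatched (cost 1) but marked invisible.
costs = np.stack([np.zeros((4, 3, 3)), np.ones((4, 3, 3))])
vis = np.stack([np.ones((3, 3)), np.zeros((3, 3))])
agg = visibility_weighted_cost(costs, vis)
print(agg.max())  # 0.0
```

Keeping the reference view fixed during cross-attention is only one possible design choice; a bidirectional variant would also update `ref`. Dividing by the summed visibility rather than by the number of views means a view marked invisible at a pixel contributes nothing to the cost there, which is how mismatched pixels are kept out of the aggregated volume in this sketch.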
Min Wang, Mingfu Zhao, Tao Song, Weiwei Li, Yuan Tian, Cheng Li, Yu Zhang. Multiview Stereo Reconstruction with Feature Aggregation Transformer[J]. Laser & Optoelectronics Progress, 2024, 61(14): 1415004
Category: Machine Vision
Received: Nov. 22, 2023
Accepted: Dec. 21, 2023
Published Online: Jul. 8, 2024
Author Email: Tao Song (tsong@cqut.edu.cn)
CSTR:32186.14.LOP232546