Infrared and Laser Engineering, Volume 54, Issue 6, 20250076 (2025)

Multi-scale phase extraction network for structured light based on discrete wavelet transform and attention mechanism

Jianhua SHANG, Gang WANG, Yang LIU, Haiqin XU, and Jiatong SUN
Author Affiliations
  • College of Information Science and Technology, Donghua University, Shanghai 201620, China

    Objective

    Fringe Projection Profilometry (FPP) is a three-dimensional imaging technique based on phase demodulation algorithms. Its non-contact nature, high precision, and low cost make it highly valuable for precision measurement applications such as biological imaging, robotic vision, and industrial scenarios. The technique uses a digital projector to project multiple frames of cosine fringe patterns with fixed phase shifts, or specially encoded fringe images, onto the surface of the object under measurement. Variations in the object's surface morphology distort the projected fringes; an industrial camera captures these deformed fringe images, phase information is demodulated from the multiple fringe images, and the three-dimensional morphology of the object is finally reconstructed. The key technology is therefore the phase-shifting algorithm (PSA) used to calculate the phase information of the fringe patterns. Unlike classical algorithms, which rely on capturing multiple frames and on predefined mathematical models, neural-network-based methods learn the nonlinear mapping between fringe patterns and phase distributions from extensive training data, thereby achieving efficient and accurate phase prediction. Existing models have addressed some of these challenges but still exhibit certain errors, and in the pursuit of better network performance the balance between parameter count and computational complexity is often overlooked. To this end, a wavelet-attention-based multi-scale phase extraction network (WA-MSPNet) is proposed in this paper.

    Methods

    The proposed network is based on an encoder-decoder architecture, which employs wavelet transforms in place of traditional pooling layers for down-sampling and implements a channel-spatial mixed attention mechanism in the wavelet domain.
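A minimal sketch of the channel-spatial mixed attention idea, written in NumPy for illustration only. The paper does not give the exact layer sizes or gating functions, so the descriptors used here (mean plus max pooling, sigmoid gating, in the spirit of CBAM-style attention) are assumptions; the sketch is applied to a (C, H, W) feature map such as the wavelet low-frequency subband:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_spatial_attention(x):
    """Illustrative CBAM-style mixed attention on a (C, H, W) feature map.

    Channel attention reweights whole feature maps from pooled per-channel
    descriptors; spatial attention then reweights individual pixel positions
    from pooled per-pixel descriptors. The learned MLP/convolution layers of a
    real implementation are replaced by direct sigmoid gating for brevity.
    """
    # Channel attention: squeeze spatial dimensions, weight each channel.
    chan_desc = x.mean(axis=(1, 2)) + x.max(axis=(1, 2))   # (C,)
    x = x * sigmoid(chan_desc)[:, None, None]              # (C, 1, 1) broadcast
    # Spatial attention: squeeze channels, weight each spatial position.
    spat_desc = x.mean(axis=0) + x.max(axis=0)             # (H, W)
    return x * sigmoid(spat_desc)[None, :, :]              # (1, H, W) broadcast

feat = np.random.default_rng(0).normal(size=(4, 8, 8))
out = channel_spatial_attention(feat)
```

The output keeps the input shape; only the relative weighting of channels and positions changes, which is what lets the mechanism emphasize informative fringe regions.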
During phase prediction, the network fully leverages semantic features from different levels to achieve multi-scale feature enhancement and prediction output. The encoder combines a wavelet mixed attention mechanism with convolution operations. The down-sampling module employs the Discrete Wavelet Transform (DWT), which halves the spatial size of the feature maps, and feeds the wavelet low-frequency components extracted by the DWT into the mixed attention mechanism, thereby implementing a wavelet-domain attention mechanism. The channel-spatial mixed attention module is followed by two layers of 3×3 convolutions, batch normalization, and ReLU activation. Furthermore, unlike traditional decoders that predict only at the final layer, the proposed network adopts a multi-scale feature-fusion prediction output strategy. This approach fully leverages features from different decoder levels and enhances prediction through multi-scale fusion, ensuring the refinement and retention of both deep and shallow features and thereby improving overall phase extraction performance.

    Results and Discussions

    Quantitative and qualitative comparative analyses of the proposed WA-MSPNet against the classic UNet, the Attention-Gate-enhanced UNet (Att-UNet), and the Transformer-based Swin-UNet are carried out on a test dataset (Tab.1). Firstly, the average errors of the numerator component $ M $ and the denominator component $ D $ are calculated. Compared to UNet, the proposed method achieves a 14.92% reduction in MAE and a 3.82% reduction in RMSE; compared to Att-UNet, a 9.79% reduction in MAE and a 5.56% reduction in RMSE; and compared to Swin-UNet, a 44.53% reduction in MAE and a 42.88% reduction in RMSE. In PSNR, the proposed method improves by 0.153 dB, 0.532 dB, and 5.012 dB over UNet, Att-UNet, and Swin-UNet, respectively. The proposed network can thus accurately predict the two components necessary for phase information extraction.
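The DWT down-sampling step described in the Methods can be sketched with a single-level 2-D Haar transform. This toy version uses the simple averaging/differencing convention (orthonormal Haar would carry 1/√2 factors, and a real implementation would typically call a library such as PyWavelets); each subband is half the input size, and the LL (low-frequency) subband is what feeds the attention module:

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2-D Haar DWT of an (H, W) array.

    Returns four half-resolution subbands: LL (low-frequency approximation),
    plus LH, HL, HH detail subbands. Averaging convention for simplicity.
    """
    a = (x[0::2, :] + x[1::2, :]) / 2.0   # low-pass along rows
    d = (x[0::2, :] - x[1::2, :]) / 2.0   # high-pass along rows
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0  # low-low: the component fed to attention
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

img = np.arange(64.0).reshape(8, 8)       # toy 8x8 "image"
ll, lh, hl, hh = haar_dwt2(img)           # each subband is 4x4
```

Replacing pooling with this transform halves the resolution, as a pooling layer would, but keeps the discarded detail explicitly available in the LH/HL/HH subbands rather than throwing it away.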
Secondly, the phase distribution is obtained by taking the arctangent of the two components, and here too the proposed method outperforms the others: it predicts phase more accurately than UNet, Att-UNet, and Swin-UNet whether the fringe pattern contains a single object (Fig.4) or multiple objects (Fig.6). Moreover, while achieving this better performance, the proposed network has 14.509M parameters and requires 152.566 GFLOPs (Tab.2), with an inference time of 17.59 ms for a single fringe pattern (Tab.3).

    Conclusions

    To reduce phase prediction errors while balancing parameter count and computational complexity, the wavelet-attention-based WA-MSPNet is proposed. The network uses wavelet-domain features to construct a channel-spatial mixed attention mechanism, which enhances multi-scale feature perception and improves the quality of cross-layer feature fusion. Additionally, in the prediction stage, a bottom-up multi-scale fusion strategy integrates deep and shallow features and connects features from different layers, effectively enhancing phase prediction accuracy. Experimental results demonstrate that the proposed WA-MSPNet achieves excellent phase extraction performance: compared to UNet, Att-UNet, and Swin-UNet, it extracts phase information more precisely while maintaining fewer parameters and lower FLOPs, making it a promising approach for phase extraction applications.
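As a concrete illustration of the arctangent step, the sketch below generates four synthetic phase-shifted cosine fringes, forms the numerator component M and denominator component D (the two quantities the network is trained to predict), and recovers the wrapped phase with `arctan2`. The fringe model, its amplitude values, and the equal four-step shifts are illustrative assumptions, not the paper's dataset:

```python
import numpy as np

# Ground-truth phase and four phase-shifted fringes I_k = A + B*cos(phi - delta_k),
# with assumed A = 1.0, B = 0.5 and equal shifts delta_k = 2*pi*k/4.
phi_true = np.linspace(-3.0, 3.0, 128)
shifts = 2 * np.pi * np.arange(4) / 4                 # 0, pi/2, pi, 3*pi/2
frames = np.stack([1.0 + 0.5 * np.cos(phi_true - d) for d in shifts])

# Classical phase-shifting combination yields the two components,
# then the arctangent gives the wrapped phase in (-pi, pi].
M = np.tensordot(np.sin(shifts), frames, axes=1)      # numerator component
D = np.tensordot(np.cos(shifts), frames, axes=1)      # denominator component
phi = np.arctan2(M, D)
```

Using the two-argument `arctan2` (rather than `arctan(M/D)`) keeps the correct quadrant and avoids division by zero where D vanishes, which is why M and D are predicted separately before this step.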


    Get Citation


    Jianhua SHANG, Gang WANG, Yang LIU, Haiqin XU, Jiatong SUN. Multi-scale phase extraction network for structured light based on discrete wavelet transform and attention mechanism[J]. Infrared and Laser Engineering, 2025, 54(6): 20250076


    Paper Information

    Category: Optical imaging, display and information processing

    Received: Jan. 24, 2025

    Accepted: --

    Published Online: Jul. 1, 2025

    DOI: 10.3788/IRLA20250076
