Infrared and Laser Engineering, Volume 54, Issue 6, 20250076 (2025)

Multi-scale phase extraction network for structured light based on discrete wavelet transform and attention mechanism

Jianhua SHANG, Gang WANG, Yang LIU, Haiqin XU, and Jiatong SUN
Author Affiliations
  • College of Information Science and Technology, Donghua University, Shanghai 201620, China

    Objective

    Fringe Projection Profilometry (FPP) is a three-dimensional imaging technique based on phase demodulation algorithms. Its non-contact nature, high precision, and low cost make it highly valuable for precision measurement applications such as biological imaging, robotic vision, and industrial scenarios. The technique uses a digital projector to project multiple frames of cosine fringe patterns with fixed phase shifts, or specially encoded fringe images, onto the surface of the object under measurement. Variations in the object's surface morphology distort the projected fringes; an industrial camera captures these deformed fringe images, phase information is demodulated from the multiple fringe images, and the three-dimensional morphology of the object is finally reconstructed. The key technology is therefore the phase-shifting algorithm (PSA) used to calculate the phase information of the fringe patterns. Unlike classical algorithms, which rely on capturing multiple frames and on predefined mathematical models, neural-network-based methods learn the nonlinear mapping between fringe patterns and phase distributions from extensive training data, thereby achieving efficient and accurate phase prediction. Existing models have addressed some of these challenges but still exhibit certain errors, and in the pursuit of better network performance the balance between parameter count and computational complexity is often overlooked. To this end, a wavelet-attention-based multi-scale phase extraction network (WA-MSPNet) is proposed in this paper.

    Methods

    The proposed network is based on an encoder-decoder architecture, which employs wavelet transforms in place of traditional pooling layers for down-sampling and implements a channel-spatial mixed attention mechanism in the wavelet domain.
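A minimal sketch of the channel-spatial mixed attention idea, written in NumPy for illustration only. The paper does not give the exact layer sizes or gating functions, so the descriptors used here (mean plus max pooling, sigmoid gating, in the spirit of CBAM-style attention) are assumptions; the sketch is applied to a (C, H, W) feature map such as the wavelet low-frequency subband:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_spatial_attention(x):
    """Illustrative CBAM-style mixed attention on a (C, H, W) feature map.

    Channel attention reweights whole feature maps from pooled per-channel
    descriptors; spatial attention then reweights individual pixel positions
    from pooled per-pixel descriptors. The learned MLP/convolution layers of a
    real implementation are replaced by direct sigmoid gating for brevity.
    """
    # Channel attention: squeeze spatial dimensions, weight each channel.
    chan_desc = x.mean(axis=(1, 2)) + x.max(axis=(1, 2))   # (C,)
    x = x * sigmoid(chan_desc)[:, None, None]              # (C, 1, 1) broadcast
    # Spatial attention: squeeze channels, weight each spatial position.
    spat_desc = x.mean(axis=0) + x.max(axis=0)             # (H, W)
    return x * sigmoid(spat_desc)[None, :, :]              # (1, H, W) broadcast

feat = np.random.default_rng(0).normal(size=(4, 8, 8))
out = channel_spatial_attention(feat)
```

The output keeps the input shape; only the relative weighting of channels and positions changes, which is what lets the mechanism emphasize informative fringe regions.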
During phase prediction, the network fully leverages semantic features from different levels to achieve multi-scale feature enhancement and prediction output. The encoder combines a wavelet mixed attention mechanism with convolution operations. The down-sampling module employs the Discrete Wavelet Transform (DWT), which halves the spatial size of the feature maps, and feeds the wavelet low-frequency components extracted by the DWT into the mixed attention mechanism, thereby implementing a wavelet-domain attention mechanism. The channel-spatial mixed attention module is followed by two layers of 3×3 convolutions, batch normalization, and ReLU activation. Furthermore, unlike traditional decoders that predict only at the final layer, the proposed network adopts a multi-scale feature-fusion prediction output strategy. This approach fully leverages features from different decoder levels and enhances prediction through multi-scale fusion, ensuring the refinement and retention of both deep and shallow features and thereby improving overall phase extraction performance.

    Results and Discussions

    Quantitative and qualitative comparative analyses of the proposed WA-MSPNet against the classic UNet, the Attention-Gate-enhanced UNet (Att-UNet), and the Transformer-based Swin-UNet are carried out on a test dataset (Tab.1). Firstly, the average errors of the numerator component $ M $ and the denominator component $ D $ are calculated. Compared to UNet, the proposed method achieves a 14.92% reduction in MAE and a 3.82% reduction in RMSE; compared to Att-UNet, a 9.79% reduction in MAE and a 5.56% reduction in RMSE; and compared to Swin-UNet, a 44.53% reduction in MAE and a 42.88% reduction in RMSE. In PSNR, the proposed method improves by 0.153 dB, 0.532 dB, and 5.012 dB over UNet, Att-UNet, and Swin-UNet, respectively. The proposed network can thus accurately predict the two components necessary for phase information extraction.
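The DWT down-sampling step described in the Methods can be sketched with a single-level 2-D Haar transform. This toy version uses the simple averaging/differencing convention (orthonormal Haar would carry 1/√2 factors, and a real implementation would typically call a library such as PyWavelets); each subband is half the input size, and the LL (low-frequency) subband is what feeds the attention module:

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2-D Haar DWT of an (H, W) array.

    Returns four half-resolution subbands: LL (low-frequency approximation),
    plus LH, HL, HH detail subbands. Averaging convention for simplicity.
    """
    a = (x[0::2, :] + x[1::2, :]) / 2.0   # low-pass along rows
    d = (x[0::2, :] - x[1::2, :]) / 2.0   # high-pass along rows
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0  # low-low: the component fed to attention
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

img = np.arange(64.0).reshape(8, 8)       # toy 8x8 "image"
ll, lh, hl, hh = haar_dwt2(img)           # each subband is 4x4
```

Replacing pooling with this transform halves the resolution, as a pooling layer would, but keeps the discarded detail explicitly available in the LH/HL/HH subbands rather than throwing it away.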
Secondly, the phase distribution is obtained by taking the arctangent of the two components, and here too the proposed method outperforms the others: it predicts phase more accurately than UNet, Att-UNet, and Swin-UNet whether the fringe pattern contains a single object (Fig.4) or multiple objects (Fig.6). Moreover, while achieving this better performance, the proposed network has 14.509M parameters and requires 152.566 GFLOPs (Tab.2), with an inference time of 17.59 ms for a single fringe pattern (Tab.3).

    Conclusions

    To reduce phase prediction errors while balancing parameter count and computational complexity, the wavelet-attention-based WA-MSPNet is proposed. The network uses wavelet-domain features to construct a channel-spatial mixed attention mechanism, which enhances multi-scale feature perception and improves the quality of cross-layer feature fusion. Additionally, in the prediction stage, a bottom-up multi-scale fusion strategy integrates deep and shallow features and connects features from different layers, effectively enhancing phase prediction accuracy. Experimental results demonstrate that the proposed WA-MSPNet achieves excellent phase extraction performance: compared to UNet, Att-UNet, and Swin-UNet, it extracts phase information more precisely while maintaining fewer parameters and lower FLOPs, making it a promising approach for phase extraction applications.
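As a concrete illustration of the arctangent step, the sketch below generates four synthetic phase-shifted cosine fringes, forms the numerator component M and denominator component D (the two quantities the network is trained to predict), and recovers the wrapped phase with `arctan2`. The fringe model, its amplitude values, and the equal four-step shifts are illustrative assumptions, not the paper's dataset:

```python
import numpy as np

# Ground-truth phase and four phase-shifted fringes I_k = A + B*cos(phi - delta_k),
# with assumed A = 1.0, B = 0.5 and equal shifts delta_k = 2*pi*k/4.
phi_true = np.linspace(-3.0, 3.0, 128)
shifts = 2 * np.pi * np.arange(4) / 4                 # 0, pi/2, pi, 3*pi/2
frames = np.stack([1.0 + 0.5 * np.cos(phi_true - d) for d in shifts])

# Classical phase-shifting combination yields the two components,
# then the arctangent gives the wrapped phase in (-pi, pi].
M = np.tensordot(np.sin(shifts), frames, axes=1)      # numerator component
D = np.tensordot(np.cos(shifts), frames, axes=1)      # denominator component
phi = np.arctan2(M, D)
```

Using the two-argument `arctan2` (rather than `arctan(M/D)`) keeps the correct quadrant and avoids division by zero where D vanishes, which is why M and D are predicted separately before this step.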


    Get Citation


    Jianhua SHANG, Gang WANG, Yang LIU, Haiqin XU, Jiatong SUN. Multi-scale phase extraction network for structured light based on discrete wavelet transform and attention mechanism[J]. Infrared and Laser Engineering, 2025, 54(6): 20250076


    Paper Information

    Category: Optical imaging, display and information processing

    Received: Jan. 24, 2025

    Accepted: --

    Published Online: Jul. 1, 2025

    DOI: 10.3788/IRLA20250076
