Chinese Journal of Lasers, Volume. 52, Issue 2, 0204004(2025)

Sub‐Pixel Level Self‐Supervised Convolutional Neural Network for Rapid Speckle Image Matching

Lin Li, Peng Wang*, Yue Li, Haotian Wang, Luhua Fu, and Changku Sun
Author Affiliations
  • State Key Laboratory of Precision Measurement Technology and Instruments, Tianjin 300372, China

    Objective

    The combined optical measurement approach of vision and structured light has become a prevalent non-contact measurement technique, owing to its simple hardware configuration, high measurement accuracy, and strong adaptability to complex materials and geometries. Compared with fringe projection profilometry, speckle projection profilometry can obtain three-dimensional information about a target from a single projected frame. Binocular measurement based on laser speckle projection has therefore become a mainstream approach for real-time acquisition of three-dimensional motion information. However, owing to the randomness, disorder, high-frequency noise, and lack of salient feature points in speckle patterns, traditional feature extraction algorithms, which rely mainly on prominent features such as corners and edges, struggle to detect stable and repeatable feature points in speckle patterns, resulting in suboptimal feature matching. In contrast, deep learning-based methods can autonomously learn and extract relevant features directly from the input data, offering higher measurement accuracy, faster measurement speed, and greater robustness. Nevertheless, existing deep learning-based algorithms rely entirely on manually labeled datasets for training, which makes training prohibitively expensive. To address these issues, this study proposes a self-supervised convolutional neural network with sub-pixel accuracy, employing deep learning to extract speckle feature information and overcome the shortcomings of traditional speckle projection measurement methods, namely slow measurement speed, low measurement accuracy, and poor robustness. Additionally, transfer learning and self-supervised training strategies are adopted to mitigate the network's reliance on manually labeled datasets.

    Methods

    The network architecture proposed in this study comprises a shared encoder and two parallel branches. This architecture not only reduces the number of parameters to be learned but also allows computations and representations to be shared between tasks. To increase the network's inference speed and strengthen its feature extraction capabilities, a dynamic depthwise separable convolution module is proposed for the backbone. This module reduces the computational load and parameter count through depthwise separable convolutions while enhancing feature extraction via dynamic convolutions. To improve the accuracy of speckle point extraction and matching, a coordinate refinement module and a fine-grained matching module are designed for the network's feature detection and feature matching branches, respectively, elevating speckle feature point matching to sub-pixel accuracy and enabling end-to-end training. To mitigate the network's dependency on labeled data, a synthetic dataset, named the synthetic speckle dataset, is first created to guide the model's learning. In this dataset, we model the feature points by mimicking the distribution patterns of speckles, thereby eliminating label ambiguity and learning bias. Subsequently, the model trained on this synthetic dataset annotates the real dataset, replacing manual annotation. Finally, the pre-trained model is transferred to the real dataset for further training, with data augmentation applied during training to enhance the model's generalization and feature representation capabilities.
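    The abstract does not specify how the dynamic depthwise separable convolution module is parameterized, so the following is only an illustrative sketch of the two ingredients it names: a depthwise separable convolution (one spatial kernel per channel followed by a 1x1 pointwise mixing step) whose depthwise kernel is assembled dynamically as an input-conditioned softmax mixture over a bank of K candidate kernels. All names (`kernel_bank`, `attn_w`, the pooling-based attention) are hypothetical, and plain NumPy is used for clarity rather than a deep learning framework.

```python
import numpy as np

def depthwise_conv(x, kernels):
    """Depthwise convolution: one k*k kernel per channel (stride 1, 'valid').
    x: (c, h, w); kernels: (c, k, k)."""
    c, h, w = x.shape
    k = kernels.shape[-1]
    out = np.zeros((c, h - k + 1, w - k + 1))
    for ch in range(c):
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                out[ch, i, j] = np.sum(x[ch, i:i + k, j:j + k] * kernels[ch])
    return out

def pointwise_conv(x, weights):
    """Pointwise (1x1) convolution mixing channels. weights: (c_out, c_in)."""
    return np.einsum('oc,chw->ohw', weights, x)

def dynamic_depthwise_separable(x, kernel_bank, attn_w, pw_weights):
    """Dynamic variant: an input-dependent softmax attention over K candidate
    depthwise kernel sets, computed from global-average-pooled features, then
    the aggregated kernel is applied as an ordinary depthwise separable conv.
    kernel_bank: (K, c, k, k); attn_w: (K, c); pw_weights: (c_out, c)."""
    pooled = x.mean(axis=(1, 2))                  # (c,) global average pool
    logits = attn_w @ pooled                      # (K,) attention logits
    attn = np.exp(logits - logits.max())
    attn /= attn.sum()                            # softmax over K kernel sets
    kernels = np.einsum('k,kcij->cij', attn, kernel_bank)  # aggregate kernels
    return pointwise_conv(depthwise_conv(x, kernels), pw_weights)
```

    The parameter saving comes from the separable factorization (c·k·k + c_out·c weights instead of c_out·c·k·k), while the dynamic mixture adds capacity at negligible inference cost, since only the single aggregated kernel is convolved.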

    Results and Discussions

    For network training, we create a synthetic dataset, the synthetic speckle dataset (Fig. 7), to guide the model's learning. We then use a binocular measurement system based on a laser speckle projector (Fig. 9) to capture speckle images projected onto a helmet (Fig. 8) under various ambient light intensities. Using a high-precision turntable, we simulate the translational and rotational movements of the helmet to mimic the movements of a pilot's head. The proposed method's efficacy is validated through experiments (Fig. 11), demonstrating the effectiveness of the combined self-supervised training and transfer learning approach (Table 1); the training process is visualized in Fig. 10. Comparative experiments on real speckle datasets against traditional algorithms and four advanced deep learning-based networks (Fig. 12) show that the proposed method significantly outperforms the others in the number of speckle matches, matching accuracy, and robustness across different ambient light intensities, helmet angles, and levels of texture richness. The matching time of 46 ms is second only to that of DISK, with an accuracy of 92.84% (Table 2). Three-dimensional reconstruction experiments (Figs. 13–16) demonstrate that the proposed measurement method accurately and clearly restores the three-dimensional morphological features of objects with varying materials, sizes, and shapes, showing excellent robustness. Finally, ablation experiments validate the superiority of the designed dynamic depthwise separable convolution module and the fine-grained matching module (Table 3), and we explain the advantages of the dynamic depthwise separable convolution module (Fig. 17).

    Conclusions

    In this paper, we propose a lightweight and efficient self-supervised convolutional neural network based on speckle projection for sub-pixel image matching of pilot helmets. To address the reliance of existing deep learning-based algorithms on manually labeled training datasets, we introduce a novel approach that combines transfer learning, self-supervised training, and data augmentation. This method mitigates the dependency on manually labeled datasets, enabling the network to comprehensively learn diverse features from abundant data and thereby enhancing its generalization and feature representation capabilities. Additionally, to improve the network's inference speed and feature extraction capabilities, we propose a dynamic depthwise separable convolution module for the backbone. To enhance the accuracy of speckle point extraction and matching, we employ coordinate refinement and fine-grained matching modules at the network's head, achieving sub-pixel precision. To verify the feasibility of the experimental system and methods, we create both a synthetic dataset and a real speckle dataset to train the model and conduct comparative experiments. The experimental results demonstrate the proposed method's significant advantages in the number of speckle feature point matches, matching accuracy, matching speed, and robustness. In summary, this research lays a crucial foundation for the rapid and accurate measurement of pilot helmet poses.
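    The abstract does not detail how the coordinate refinement module reaches sub-pixel precision; one standard, differentiable way to refine an integer keypoint, shown here purely as an illustration rather than the authors' implementation, is a soft-argmax (probability-weighted centroid) over a small window of the detection heatmap around the peak. The function name, window size, and temperature below are all hypothetical.

```python
import numpy as np

def soft_argmax_refine(heatmap, peak, win=2, temp=1.0):
    """Refine an integer keypoint (row, col) to sub-pixel coordinates via a
    soft-argmax over a (2*win+1)^2 window centered on the detected peak.
    The window must lie fully inside the heatmap."""
    r, c = peak
    patch = heatmap[r - win:r + win + 1, c - win:c + win + 1]
    w = np.exp((patch - patch.max()) / temp)      # softmax weights (stable)
    w /= w.sum()
    dy, dx = np.meshgrid(np.arange(-win, win + 1),
                         np.arange(-win, win + 1), indexing='ij')
    # Expected offset under the softmax distribution, added to the peak
    return r + (w * dy).sum(), c + (w * dx).sum()
```

    Because the expectation is a smooth function of the heatmap values, such a refinement step keeps the detection branch end-to-end trainable, which is consistent with the end-to-end training the paper describes.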


    Paper Information

    Category: Measurement and metrology

    Received: Jun. 17, 2024

    Accepted: Aug. 1, 2024

    Published Online: Jan. 20, 2025

    The Author Email: Wang Peng (wang_peng@tju.edu.cn)

    DOI:10.3788/CJL240981

    CSTR:32183.14.CJL240981
