Laser & Optoelectronics Progress, Volume 62, Issue 16, 1615004 (2025)

6D Pose Detection Method Based on Cross-Attention Weighting Mechanism

Yu Ye1, Jing Zhang1,*, Aimin Wang1, Heng Liu1, and Mingju Chen2
Author Affiliations
  • 1Southwest University of Science and Technology, School of Information Engineering, Mianyang 621010, Sichuan, China
  • 2Artificial Intelligence Key Laboratory of Sichuan Province, Sichuan University of Science & Engineering, Yibin 643002, Sichuan, China

    To address 6D object pose detection in unstructured scenes, where similarity between foreground and background degrades accuracy, we propose a 6D pose detection method based on a cross-attention weighting mechanism. First, an RGB-D mask isolates the region of interest (ROI) in the image. RGB semantic features are extracted with the PSPNet module, and global and local point cloud features are extracted from the corresponding region with the PointNet module, giving two feature representations of the same object. The RGB semantic features and the point cloud features are then fed into a cross-attention mechanism that fuses them deeply, producing foreground-object fusion features with richer contextual information and improving the model's understanding of complex scenes. To improve robustness under background interference and color overlap, a squeeze-and-excitation (SE) mechanism is introduced into the backbone network so that foreground and background regions with similar features can be told apart. Finally, the 6D pose estimate is refined using both object color features and point cloud geometric transformation features, which improves pose detection accuracy. Comparative experiments show that, relative to DenseFusion, the proposed method improves the mean average-distance (ADD) score on the LineMOD dataset by 2.5 percentage points and the mean area under the curve (AUC) on the YCB-Video dataset by 1.9 percentage points. Real-world scene tests show an overall centroid deviation of less than 2 mm and an angular error below 1.5°, confirming the practical applicability of the proposed method.
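The fusion step described in the abstract can be illustrated with a minimal NumPy sketch of single-head scaled dot-product cross-attention between per-pixel RGB features and per-point geometric features. This is not the authors' implementation: the abstract does not state which modality supplies the queries, the feature sizes, or how the attended features are combined. The function names (`cross_attention`, `fuse`, `random_projection`), the symmetric two-direction design, the concatenation, and the random projection matrices (stand-ins for learned weights) are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    query_feats:   (N, d_q) features of the modality that asks the question.
    context_feats: (M, d_c) features of the modality that is attended to.
    Returns the attended context (N, d) and the attention weights (N, M).
    """
    q = query_feats @ w_q                      # (N, d)
    k = context_feats @ w_k                    # (M, d)
    v = context_feats @ w_v                    # (M, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (N, M) similarity scores
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v, weights


def random_projection(rng, d_in, d_out):
    """Hypothetical placeholder for a learned linear projection."""
    return rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)


def fuse(rgb_feats, point_feats, d=32, seed=0):
    """Fuse RGB and point features in both directions (an assumed design).

    Both inputs must have the same number of rows N, i.e. one RGB feature
    per 3D point inside the masked region of interest.
    """
    rng = np.random.default_rng(seed)
    d_rgb, d_pts = rgb_feats.shape[1], point_feats.shape[1]

    # Points query the RGB features: geometry enriched with color context.
    pts_ctx, _ = cross_attention(
        point_feats, rgb_feats,
        random_projection(rng, d_pts, d),
        random_projection(rng, d_rgb, d),
        random_projection(rng, d_rgb, d),
    )
    # RGB features query the points: color enriched with geometric context.
    rgb_ctx, _ = cross_attention(
        rgb_feats, point_feats,
        random_projection(rng, d_rgb, d),
        random_projection(rng, d_pts, d),
        random_projection(rng, d_pts, d),
    )
    # Concatenate per-point fused features: shape (N, 2 * d).
    return np.concatenate([pts_ctx, rgb_ctx], axis=1)
```

Calling `fuse(rgb_feats, point_feats)` on two arrays with N rows each returns an `(N, 2 * d)` array of fused features. In a trained network the projections would be learned, and the attention weights would let each point draw on the color features most relevant to it rather than only on its own pixel.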

    Yu Ye, Jing Zhang, Aimin Wang, Heng Liu, Mingju Chen. 6D Pose Detection Method Based on Cross-Attention Weighting Mechanism[J]. Laser & Optoelectronics Progress, 2025, 62(16): 1615004

    Paper Information

    Category: Machine Vision

    Received: Jan. 3, 2025

    Accepted: Mar. 12, 2025

    Published Online: Aug. 11, 2025

    The Author Email: Jing Zhang (zhangjing@swust.edu.cn)

    DOI:10.3788/LOP250443

    CSTR:32186.14.LOP250443
