Laser & Optoelectronics Progress, Vol. 62, Issue 16, 1615004 (2025)
6D Pose Detection Method Based on Cross-Attention Weighting Mechanism
To address the challenge of 6D object pose detection in unstructured scenes, where similarity between foreground and background degrades accuracy, we propose a 6D pose detection method based on a cross-attention weighting mechanism. First, an RGB-D mask isolates the region of interest (ROI) in the image. RGB semantic features are extracted with the PSPNet module, while global and local point cloud features are extracted from the corresponding region with the PointNet module, yielding two complementary feature representations of the same object. The RGB semantic features and point cloud features are then fed into a cross-attention mechanism that fuses them deeply, producing foreground-object fusion features with richer contextual information and improving the model's understanding of complex scenes. To improve robustness under background interference and color overlap, a squeeze-and-excitation (SE) mechanism is introduced into the backbone network so that foreground and background regions with similar features can be distinguished. Finally, the 6D pose estimate is further refined using both object color features and point cloud geometric transformation features, which improves pose detection accuracy. Comparative experiments show that, relative to DenseFusion, the proposed method improves the mean average distance (ADD) accuracy on the LineMOD dataset by 2.5 percentage points and the mean area under the curve (AUC) on the YCB-Video dataset by 1.9 percentage points. Tests in real-world scenes show an overall centroid deviation of less than 2 mm and an angular error below 1.5°, confirming the practical applicability of the proposed method.
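The fusion step and the ADD evaluation metric mentioned above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a single-head scaled dot-product cross-attention in which point cloud features act as queries and RGB features as keys and values, and it uses the standard average distance (ADD) definition over model points. The function names, feature dimensions, and projection matrices (`w_q`, `w_k`, `w_v`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_fuse(point_feats, rgb_feats, w_q, w_k, w_v):
    """Fuse per-point geometric features with RGB features (illustrative only).

    point_feats: (N, Dp) point cloud features, used as queries.
    rgb_feats:   (M, Dr) RGB semantic features, used as keys and values.
    w_q: (Dp, d), w_k: (Dr, d), w_v: (Dr, d) hypothetical projection matrices.
    Returns the fused features (N, Dp + d) and the attention weights (N, M).
    """
    q = point_feats @ w_q                      # (N, d)
    k = rgb_feats @ w_k                        # (M, d)
    v = rgb_feats @ w_v                        # (M, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # scaled dot-product, (N, M)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    attended = weights @ v                     # RGB context per point, (N, d)
    fused = np.concatenate([point_feats, attended], axis=-1)
    return fused, weights


def add_metric(model_pts, r_pred, t_pred, r_gt, t_gt):
    """Average distance (ADD) between model points under two poses.

    model_pts: (K, 3) object model points; r_*: (3, 3) rotations; t_*: (3,) translations.
    """
    pred = model_pts @ r_pred.T + t_pred
    gt = model_pts @ r_gt.T + t_gt
    return np.linalg.norm(pred - gt, axis=1).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(5, 8))              # 5 points, 8-dim features (assumed sizes)
    rgb = rng.normal(size=(7, 6))              # 7 pixels, 6-dim features (assumed sizes)
    fused, attn = cross_attention_fuse(
        pts, rgb,
        rng.normal(size=(8, 4)), rng.normal(size=(6, 4)), rng.normal(size=(6, 4)),
    )
    print(fused.shape, attn.shape)
```

In this sketch each point attends over all RGB feature vectors in the ROI, so its fused feature carries color context weighted by relevance. An SE-style channel reweighting, as described in the abstract, would be applied inside the backbone and is not shown here.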
Yu Ye, Jing Zhang, Aimin Wang, Heng Liu, Mingju Chen. 6D Pose Detection Method Based on Cross-Attention Weighting Mechanism[J]. Laser & Optoelectronics Progress, 2025, 62(16): 1615004
Category: Machine Vision
Received: Jan. 3, 2025
Accepted: Mar. 12, 2025
Published Online: Aug. 11, 2025
Author Email: Jing Zhang (zhangjing@swust.edu.cn)
CSTR:32186.14.LOP250443