Multiscale Regional-Attention Stacked-Object Grasp Detection Network

Shengjun Xu; Zhiwei Cui; Ya Shi; Xiaohan Li; Erhu Liu; Abdelhamid Hameg

doi:10.3788/LOP241866

Laser & Optoelectronics Progress, Vol. 62, Issue 10, 1015009 (2025)

Shengjun Xu^1,2, Zhiwei Cui^1,2,*, Ya Shi^1,2, Xiaohan Li^1,2, Erhu Liu^1,2, and Abdelhamid Hameg^1,2

¹College of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, Shaanxi, China

²Key Laboratory of Intelligent Manufacturing Technology in Construction Manufacturing in Xi'an City, Xi'an 710055, Shaanxi, China

Abstract

In stacked scenes, overlap and occlusion among multiple objects make grasp points difficult to recognize. To address this problem, a multiscale regional-attention stacked-object grasp detection network is proposed. First, a multiscale regional-attention feature fusion module is built on the feature pyramid architecture; by introducing deformable convolution and full convolution, it improves the network's ability to attend to different feature dimensions. Second, a multiscale regional-attention mechanism decouples the graspable area from the background in the stacked-scene image. Different regions of feature maps at different scales are weighted progressively, which increases the network's attention to the salient graspable area and its robustness to background noise. Finally, a double-sampling region candidate module further refines the candidate anchor boxes on the basis of the target ground truth and eliminates a large number of negative samples, thereby improving the quality of the candidate anchor boxes. The final grasp detection results are output by the classification and regression module. Grasp detection accuracy experiments are carried out on the VMRD and Cornell datasets. The results show that the average detection accuracy of the proposed network is 98.18% on the VMRD dataset and 98.0% on the Cornell dataset. The proposed network detects grasps accurately and is robust in complex scenes.
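
The abstract does not specify how candidate anchor boxes are refined against the ground truth. As a rough illustration only, one common way to discard negative samples is to keep the anchors whose intersection-over-union (IoU) with some ground-truth box reaches a threshold. The sketch below assumes axis-aligned boxes and a hypothetical threshold `pos_thresh`; the names `iou` and `filter_anchors` are illustrative and are not taken from the paper, which may use oriented grasp rectangles and a different criterion.

```python
from typing import List, Tuple

# Assumption: boxes are axis-aligned and given as (x1, y1, x2, y2).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_anchors(anchors: List[Box],
                   ground_truth: List[Box],
                   pos_thresh: float = 0.5) -> List[Box]:
    """Keep anchors whose best IoU with any ground-truth box is >= pos_thresh.

    pos_thresh is a hypothetical value chosen for illustration.
    """
    kept = []
    for anchor in anchors:
        best = max((iou(anchor, gt) for gt in ground_truth), default=0.0)
        if best >= pos_thresh:
            kept.append(anchor)
    return kept


if __name__ == "__main__":
    gt = [(0.0, 0.0, 10.0, 10.0)]
    anchors = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 5.0, 5.0), (20.0, 20.0, 30.0, 30.0)]
    print(filter_anchors(anchors, gt))
```

In this toy input the second anchor overlaps the ground truth with IoU 0.25 and the third does not overlap at all, so both would be treated as negatives under the assumed threshold.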

Keywords

attention mechanism; deformable convolution; grasp detection; region candidate; stacked scene

Shengjun Xu, Zhiwei Cui, Ya Shi, Xiaohan Li, Erhu Liu, Abdelhamid Hameg. Multiscale Regional-Attention Stacked-Object Grasp Detection Network[J]. Laser & Optoelectronics Progress, 2025, 62(10): 1015009
