3D Object Detection Based on Voxel Self-Attention Auxiliary Networks

Jie Cao; Yiqiang Peng; Likang Fan; Longfei Wang

doi:10.3788/LOP240923

Laser & Optoelectronics Progress, Vol. 61, Issue 24, 2415004 (2024)

Jie Cao¹, Yiqiang Peng¹,²,³, Likang Fan¹,²,³,*, and Longfei Wang¹

Author Affiliations

¹School of Automobile and Transportation, Xihua University, Chengdu 610039, Sichuan, China

²Vehicle Measurement Control and Safety Key Laboratory of Sichuan Province, Xihua University, Chengdu 610039, Sichuan, China

³Provincial Engineering Research Center for New Energy Vehicle Intelligent Control and Simulation Test Technology of Sichuan, Chengdu 610039, Sichuan, China

Figures & Tables(13)

Fig. 1. Range of attention

Fig. 2. Voxel feature query

Fig. 3. Voxel self-attention network architecture

Fig. 4. VA-SECOND

Fig. 5. VA-PVRCNN

Fig. 6. Visualization of long-range scene detection results. (a) Results of PV-RCNN; (b) results of VA-PVRCNN; (c) scene image

Fig. 7. Visualization of detection results in complex scenes. (a) Result of PV-RCNN; (b) result of VA-PVRCNN; (c) real scene

Table 1. Detection level classification criteria in KITTI dataset

Level	Minimum bounding box height /pixel	Occlusion situation	Maximum truncation /%
Easy	40	Fully visible	15
Mod	25	Partial occlusion	30
Hard	25	Severe occlusion	50
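
The criteria in Table 1 can be read as a simple lookup. The sketch below is illustrative only and is not taken from the paper: it reads the truncation column as an upper bound (as in the KITTI benchmark's difficulty definition), assumes the KITTI label convention for occlusion codes (0 = fully visible, 1 = partly occluded, 2 = largely occluded) and a truncation fraction in [0, 1], and treats the thresholds as inclusive, which is an assumption. The names `LEVELS` and `difficulty` are hypothetical.

```python
from typing import Optional

# Thresholds copied from Table 1:
# (level, min box height in pixels, max occlusion code, max truncation fraction).
# Occlusion codes are assumed to follow the KITTI label files:
# 0 = fully visible, 1 = partly occluded, 2 = largely occluded.
LEVELS = [
    ("Easy", 40.0, 0, 0.15),
    ("Mod", 25.0, 1, 0.30),
    ("Hard", 25.0, 2, 0.50),
]


def difficulty(box_height_px: float, occlusion: int, truncation: float) -> Optional[str]:
    """Return the strictest level whose criteria the object meets, or None."""
    for name, min_height, max_occlusion, max_truncation in LEVELS:
        # Inclusive comparisons are an assumption of this sketch.
        if (box_height_px >= min_height
                and occlusion <= max_occlusion
                and truncation <= max_truncation):
            return name
    return None
```

An object that fails every level (for example, a box only 10 pixels tall) returns `None` and would not be counted at any difficulty.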

Table 2. Comparative experimental results of various algorithms on KITTI dataset (3D AP, %)

Method	Car Easy	Car Mod	Car Hard	Car mAP	Ped Easy	Ped Mod	Ped Hard	Ped mAP	Cyc Easy	Cyc Mod	Cyc Hard	Cyc mAP	Average
VoxelNet	87.93	75.37	73.21	78.84	67.81	63.52	58.87	63.40	77.69	58.72	51.63	62.68	68.30
PointPillars	87.50	77.01	74.77	79.76	66.73	61.06	56.50	61.43	83.65	63.40	59.71	68.92	70.03
PointRCNN	89.01	78.77	78.10	81.96	62.69	55.36	51.60	56.55	84.48	65.37	59.83	69.89	69.46
Point-GNN	89.33	79.47	78.29	82.36	61.92	53.77	50.14	55.28	86.60	67.48	62.58	72.22	69.95
Part-A²	89.56	79.41	78.84	82.60	65.69	60.05	55.45	60.57	85.50	68.90	64.53	72.98	72.05
CT3D	92.85	85.82	83.46	87.37	65.73	58.56	53.04	59.11	91.99	71.60	67.34	76.97	74.48
SECOND	88.61	78.62	77.22	81.48	56.00	50.02	43.64	49.89	80.97	63.43	56.67	67.02	66.13
VA-SECOND	89.10	80.86	77.91	82.63	56.82	50.63	46.56	51.34	82.36	62.66	58.71	67.91	67.30
Improvement over SECOND	+0.49	+2.24	+0.69	+1.15	+0.82	+0.61	+2.92	+1.45	+1.39	-0.77	+2.04	+0.89	+1.16
PV-RCNN	92.57	84.83	82.69	86.69	64.26	56.67	51.91	57.61	88.88	71.95	66.78	75.87	73.39
VA-PVRCNN	92.12	85.07	82.61	86.60	67.85	60.08	55.47	61.14	92.03	71.74	67.34	77.03	74.93
Improvement over PV-RCNN	-0.45	+0.24	-0.08	-0.09	+3.59	+3.41	+3.56	+3.53	+3.15	-0.21	+0.56	+1.16	+1.54
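
The per-class mAP columns in Table 2 appear to be the mean of the Easy, Mod, and Hard APs, and the difference rows are the VA variant minus its baseline. The following minimal sketch checks this for the pedestrian class of PV-RCNN and VA-PVRCNN. It is an illustration, not the authors' evaluation code; `class_map` is a hypothetical helper, and values recomputed from the rounded table entries can differ from the printed ones by about 0.01.

```python
from statistics import mean

# Per-level pedestrian 3D AP (%) copied from Table 2, in the order Easy, Mod, Hard.
ped_ap = {
    "PV-RCNN": [64.26, 56.67, 51.91],
    "VA-PVRCNN": [67.85, 60.08, 55.47],
}


def class_map(ap_levels):
    """Mean AP over the three difficulty levels, rounded to two decimals."""
    return round(mean(ap_levels), 2)


# Recompute the pedestrian mAP of each method from its per-level APs.
ped_map = {method: class_map(levels) for method, levels in ped_ap.items()}

# Gain of the VA variant over its baseline, in percentage points.
gain = round(ped_map["VA-PVRCNN"] - ped_map["PV-RCNN"], 2)

print(ped_map, gain)
```

The same check applies to any other class or method pair in the table by swapping in the corresponding three AP values.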

Table 3. Experimental results with dropout layers

Method	Dropout	Car-3D (mAP) /%	Ped-3D (mAP) /%	Cyc-3D (mAP) /%
VA-SECOND	0	82.63	51.34	67.91
VA-SECOND	0.1	82.21	50.96	67.81
VA-SECOND	0.3	81.60	49.92	67.16
VA-PVRCNN	0	86.60	61.14	77.03
VA-PVRCNN	0.1	86.47	61.03	76.95
VA-PVRCNN	0.3	85.95	60.17	76.08

Table 4. Experimental results with different numbers of voxels in the self-attention computation

Method	Number of voxels	Car-3D (mAP) /%	Ped-3D (mAP) /%	Cyc-3D (mAP) /%
VA-SECOND	24	82.01	50.65	67.32
VA-SECOND	48	82.63	51.34	67.91
VA-PVRCNN	24	86.39	59.26	76.67
VA-PVRCNN	48	86.60	61.14	77.03

Table 5. Experimental results with projection layer

Method	Projection layer	Car-3D (mAP) /%	Ped-3D (mAP) /%	Cyc-3D (mAP) /%
VA-SECOND	×	82.21	51.10	67.68
VA-SECOND	√	82.63	51.34	67.91
VA-PVRCNN	×	86.49	60.95	76.92
VA-PVRCNN	√	86.60	61.14	77.03

Table 6. Inference time comparison of the algorithms

Method	SECOND	VA-SECOND	PV-RCNN	VA-PVRCNN
Runtime /s	0.01	0.03	0.06	0.09
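
The cost of the auxiliary network can be read off Table 6 as the added latency and slowdown factor of each VA variant relative to its baseline. The sketch below only does this arithmetic on the reported runtimes; the pairing of methods and the helper name `overhead` are assumptions made for illustration, and the two-decimal runtimes limit the precision of the ratios.

```python
# Runtime per frame in seconds, copied from Table 6.
runtime_s = {
    "SECOND": 0.01,
    "VA-SECOND": 0.03,
    "PV-RCNN": 0.06,
    "VA-PVRCNN": 0.09,
}


def overhead(base: str, variant: str):
    """Return (added seconds per frame, slowdown factor) of variant vs. base."""
    added = runtime_s[variant] - runtime_s[base]
    ratio = runtime_s[variant] / runtime_s[base]
    # Rounding hides floating-point noise from the two-decimal inputs.
    return round(added, 3), round(ratio, 2)


for base, variant in [("SECOND", "VA-SECOND"), ("PV-RCNN", "VA-PVRCNN")]:
    added, ratio = overhead(base, variant)
    print(f"{variant}: +{added} s per frame, {ratio}x the runtime of {base}")
```

On these figures the auxiliary network adds 0.02 s per frame to SECOND and 0.03 s per frame to PV-RCNN.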

Jie Cao, Yiqiang Peng, Likang Fan, Longfei Wang. 3D Object Detection Based on Voxel Self-Attention Auxiliary Networks[J]. Laser & Optoelectronics Progress, 2024, 61(24): 2415004