Adaptive Multimodal-Feature Fusion for 6D Object Position Estimation

Fig. 6. Experimental results of surface normal generation from depth information. (a) Original image; (b) normal image; (c) normal image of the target object
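
The normal images in Fig. 6 are generated from depth. As a rough illustration of one common way to do this (not necessarily the procedure used in the paper), the sketch below back-projects a depth map with a pinhole camera model and takes the cross product of the two image-space tangent vectors. The function names `depth_to_normals` and `normals_to_image`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the optional object `mask` are assumptions introduced for this example.

```python
import numpy as np

def depth_to_normals(depth, fx, fy, cx, cy):
    """Estimate per-pixel unit surface normals from a depth map.

    depth: (H, W) array of depth values along the camera z axis.
    fx, fy, cx, cy: pinhole intrinsics (hypothetical values in the demo).
    Returns an (H, W, 3) array of unit normals oriented toward the camera.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project every pixel to a 3D point in the camera frame.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1)
    # Finite-difference tangent vectors along image columns and rows.
    du = np.gradient(points, axis=1)
    dv = np.gradient(points, axis=0)
    normals = np.cross(du, dv)
    # Normalise; the clip avoids division by zero where depth is invalid.
    length = np.linalg.norm(normals, axis=-1, keepdims=True)
    normals = normals / np.clip(length, 1e-12, None)
    # The camera looks along +z, so camera-facing normals have z < 0.
    normals[normals[..., 2] > 0] *= -1
    return normals

def normals_to_image(normals, mask=None):
    """Map normals from [-1, 1] to an 8-bit RGB image, as in panel (b).

    If a boolean object mask is given, pixels outside it are set to
    black, which mimics panel (c).
    """
    image = ((normals + 1.0) * 0.5 * 255.0).astype(np.uint8)
    if mask is not None:
        image = np.where(mask[..., None], image, 0).astype(np.uint8)
    return image

if __name__ == "__main__":
    # A fronto-parallel plane one unit away from the camera.
    depth = np.ones((4, 5))
    n = depth_to_normals(depth, fx=500.0, fy=500.0, cx=2.0, cy=1.5)
    print(n[0, 0])
```

For a fronto-parallel plane every normal points straight back at the camera. Real depth maps are noisy, so in practice the depth is usually smoothed or the normals are fitted over a local neighbourhood before use.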

Table 1. Test results on the LineMOD dataset

Object	RGB-based methods				RGB-D-based methods
Object	PoseCNN	PVNet	HybridPose	TexPose	DenseFusion	MaskedFusion	Uni6D	QaQ	Proposed method
mean	88.6	86.3	94.5	91.7	94.3	97.3	97.0	95.3	97.4
ape	77.0	43.6	77.6	80.9	92.3	92.2	93.7	90.3	96.1
benchvise	97.5	99.9	99.6	99.0	93.2	98.4	99.8	94.3	97.6
camera	93.5	86.9	95.9	94.8	94.4	98.0	96.0	96.8	97.3
can	96.5	95.5	93.6	99.7	93.1	97.4	99.0	95.6	96.9
cat	82.1	79.3	93.5	92.6	96.5	97.8	98.1	95.8	97.7
driller	95.0	96.4	97.2	97.4	87.0	95.6	99.1	90.0	94.5
duck	77.7	52.6	87.0	83.4	92.3	94.0	99.0	92.1	95.8
eggbox	97.1	99.2	99.6	94.9	99.8	99.6	100.0	100.0	99.8
glue	99.4	95.7	98.7	93.4	100.0	100.0	99.2	100.0	99.9
holepuncher	52.8	82.0	92.5	79.3	92.1	97.3	90.2	92.8	96.5
iron	98.3	98.9	98.1	99.8	97.0	97.1	99.5	98.1	99.0
lamp	97.5	99.3	96.9	98.3	95.3	99.0	99.4	96.9	97.8
phone	87.7	92.4	98.3	78.9	92.8	98.8	97.4	96.5	97.4

Table 2. Test results on the YCB-Video dataset

Object	DenseFusion		Dual-Stream		PR-GCN		Uni6D		Proposed method
Object	AUC /%	<2 cm /%	AUC /%	<2 cm /%	AUC /%	<2 cm /%	AUC /%	<2 cm /%	AUC /%	<2 cm /%
mean	91.6	96.2	92.2	96.5	95.0	97.7	94.5	97.5	95.8	98.3
master chef can	96.0	100.0	93.3	99.8	95.6	99.4	94.8	98.7	96.3	100.0
cracker box	93.1	99.2	96.0	99.4	97.0	100.0	91.3	98.4	97.3	100.0
sugar box	96.8	99.6	97.4	99.9	97.8	98.1	95.9	99.1	98.1	99.8
tomato soup can	93.3	95.7	93.7	96.5	94.6	97.2	94.1	96.4	95.8	97.4
mustard bottle	97.0	99.8	96.1	100.0	98.0	99.6	95.2	99.0	98.6	100.0
tuna fish can	95.9	100.0	95.8	99.5	96.4	99.8	94.7	98.8	97.2	100.0
pudding box	94.8	99.4	96.2	99.7	97.8	99.2	94.3	99.2	97.9	99.8
gelatin box	98.0	100.0	95.6	95.2	96.4	95.3	97.1	99.8	98.2	100.0
potted meat can	88.7	92.1	89.3	91.4	95.1	97.9	92.8	94.0	95.4	94.9
banana	94.3	98.9	96.9	100.0	97.1	99.9	96.0	99.4	98.0	100.0
pitcher base	96.9	99.3	96.3	98.9	97.6	100.0	96.3	99.5	98.3	100.0
bleach cleanser	94.2	98.9	94.0	99.9	96.3	99.1	94.7	97.6	96.8	99.6
bowl	85.1	97.6	85.1	99.3	90.5	96.8	94.3	96.9	94.1	100.0
mug	97.2	99.9	97.1	98.5	97.5	99.3	96.5	99.7	97.9	99.8
power drill	94.0	97.2	94.5	98.1	97.0	99.0	93.8	95.3	97.4	98.5
wood block	87.8	93.0	90.9	97.7	94.8	97.7	94.4	97.2	95.3	97.9
scissors	93.7	98.7	93.8	99.0	94.9	98.5	86.9	84.6	94.2	100.0
large marker	97.2	100.0	96.8	99.8	96.8	99.4	96.4	99.8	97.0	100.0
large clamp	71.6	78.1	72.0	77.2	87.2	92.6	94.1	98.3	88.7	90.1
extra large clamp	66.0	74.7	73.2	76.9	79.1	83.3	93.9	97.5	83.4	87.4
foam brick	92.2	97.5	91.8	98.9	96.0	98.9	96.0	99.2	96.5	100.0
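
Table 2 reports two figures per method, AUC and the share of poses with error below 2 cm. The page does not spell out the protocol, so the sketch below assumes the convention commonly used on YCB-Video: the error is the ADD-S distance (mean closest-point distance between the model under the ground-truth and predicted poses), and AUC is the area under the accuracy-versus-threshold curve up to 0.1 m. The function names and the `max_threshold` default are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np

def adds_distance(points, R_gt, t_gt, R_pred, t_pred):
    """Mean closest-point distance between the two posed point sets."""
    gt = points @ R_gt.T + t_gt
    pred = points @ R_pred.T + t_pred
    # Pairwise distances (N x N); brute force is fine for a small sketch.
    pairwise = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=-1)
    return pairwise.min(axis=1).mean()

def auc_percent(distances, max_threshold=0.1):
    """Area under the accuracy-threshold curve, as a percentage.

    The accuracy at threshold t is the fraction of distances <= t.
    Integrating that step curve from 0 to max_threshold gives
    mean(max(0, max_threshold - d)), which is then normalised.
    """
    d = np.asarray(distances, dtype=float)
    area = np.clip(max_threshold - d, 0.0, None).mean()
    return 100.0 * area / max_threshold

def below_threshold_percent(distances, threshold=0.02):
    """Share of poses whose error is under the threshold (2 cm here)."""
    d = np.asarray(distances, dtype=float)
    return 100.0 * np.mean(d < threshold)

if __name__ == "__main__":
    errors_m = [0.0, 0.05, 0.2]  # made-up per-frame errors in metres
    print(auc_percent(errors_m), below_threshold_percent(errors_m))
```

Published toolkits may sample the curve or interpolate it differently, so figures from this closed-form version can differ slightly from those in the table.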

Table 3. Comparative analysis of model ablation experiments
RGB-D	Improved feature extraction	Normal enhancement module	Feature fusion module	Baseline model	ADD /%
√ √ 88.8
√ √ √ 93.4
√ √ √ √ 95.9
√ √ √ √ √ 97.4
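
The ADD /% column in Table 3 is an accuracy derived from the ADD distance. As a minimal sketch, assuming the usual LineMOD convention in which a pose counts as correct when its ADD is below 10% of the object diameter (the page itself does not state the threshold), the metric can be computed as follows. The function names and the `ratio` parameter are assumptions introduced for illustration.

```python
import numpy as np

def add_distance(points, R_gt, t_gt, R_pred, t_pred):
    """Average distance between corresponding model points.

    points: (N, 3) model points; R_*: (3, 3) rotations; t_*: (3,) translations.
    """
    gt = points @ R_gt.T + t_gt
    pred = points @ R_pred.T + t_pred
    return np.linalg.norm(gt - pred, axis=1).mean()

def add_accuracy(distances, diameter, ratio=0.1):
    """Percentage of poses whose ADD is below ratio * diameter."""
    d = np.asarray(distances, dtype=float)
    return 100.0 * np.mean(d < ratio * diameter)

if __name__ == "__main__":
    # Made-up model points and poses, for demonstration only.
    pts = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0]])
    I = np.eye(3)
    err = add_distance(pts, I, np.zeros(3), I, np.array([0.01, 0.0, 0.0]))
    print(err, add_accuracy([0.005, 0.02], diameter=0.1))
```

Symmetric objects are usually scored with the closest-point variant (ADD-S) instead, because corresponding points are ambiguous under symmetry.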

Chuanfang Zang, Jianwu Dang, Jiu Yong. Adaptive Multimodal-Feature Fusion for 6D Object Position Estimation[J]. Laser & Optoelectronics Progress, 2025, 62(4): 0415002
