Mask generation dynamically regulates weakly supervised video instance segmentation

Table 1. Training parameter settings
Hyperparameters and learning strategy Value
Input image size 360×640
Batch size 4
Momentum 0.9
Iterations BoxSet: 23,000; YT-VIS: 190,000
Learning rate 0.005
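
For readers who want to reuse these settings, the values in Table 1 can be collected in a small configuration object. This is only a minimal sketch: the class name `TrainingSettings`, its field names, and the helper `iterations_for` are hypothetical and do not come from the article, and the table does not say which optimizer is used or whether 360×640 is given as height×width, so the tuple simply keeps the order in which the table prints it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TrainingSettings:
    """Values copied from Table 1; all names here are illustrative."""

    # Input size in the order printed in the table (axis order is an assumption).
    input_size: tuple = (360, 640)
    batch_size: int = 4
    momentum: float = 0.9
    learning_rate: float = 0.005
    # Number of training iterations per dataset, as listed in the table.
    iterations: dict = field(
        default_factory=lambda: {"BoxSet": 23_000, "YT-VIS": 190_000}
    )

    def iterations_for(self, dataset: str) -> int:
        """Return the iteration budget listed for the given dataset."""
        return self.iterations[dataset]


settings = TrainingSettings()
print(settings.iterations_for("YT-VIS"))
```

Keeping the per-dataset iteration counts in one mapping makes it explicit that only the iteration budget differs between BoxSet and YT-VIS in the table; every other setting is shared.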

Table 2. Comparison of experiment results for different video instance segmentation networks

Methods	BoxSet					YT-VIS [1]
Methods	AP	AP₅₀	AP₇₅	AR₁	AR₁₀	AP	AP₅₀	AP₇₅	AR₁	AR₁₀
SipMask [21]	36.2	61.8	34.2	38.5	41.4	32.5	53.0	33.3	33.5	38.9
STMask [22]	37.5	61.0	38.6	39.3	42.9	33.5	52.1	36.9	31.1	39.2
CrossVIS [23]	39.9	64.1	41.5	41.0	46.7	34.8	54.6	37.9	34.0	39.0
FlowIRN [12]	—	—	—	—	—	10.5	27.2	6.2	12.3	13.6
FlowSimi [14]	—	—	—	—	—	29.0	50.2	29.4	—	—
WSVIS	37.5	64.7	40.0	37.8	42.5	30.1	50.5	31.2	31.1	37.0

Table 3. Effectiveness verification of $L_{bm}$ and $L_{pair}$
$L_{bm}$ $L_{pair}$ AP AP₅₀ AP₇₅ AR₁ AR₁₀
√ — 23.7 55.4 20.7 25.6 27.9
— √ 22.5 54.1 18.7 24.3 26.7
√ √ 32.9 60.3 32.8 35.6 39.2

Table 4. Effectiveness verification of MFF module
MFF AP AP₅₀ AP₇₅ AR₁ AR₁₀
— 32.9 60.3 32.8 35.6 39.2
√ 36.3 64.7 35.1 37.7 42.4

Table 5. Experimental results for different convolution types and layer levels in the MFF module

Convolution and level	AP	AP₅₀	AP₇₅	AR₁	AR₁₀
Conv 3	32.6	59.8	32.5	34.4	39.4
Conv 4	33.7	62.5	36.2	35.6	39.2
Conv 5	34.8	63.4	36.8	36.0	40.1
CondConv [27] 3	34.3	62.5	33.8	37.5	40.7
CondConv [27] 4	35.6	69.5	37.1	37.0	41.0
CondConv [27] 5	36.3	64.7	35.1	37.7	42.4

Table 6. Effectiveness verification of dynamic regulation mechanism
Dynamic regulation mechanism AP AP₅₀ AP₇₅ AR₁ AR₁₀
— 36.3 64.7 35.1 37.7 42.4
√ 37.5 64.7 40.0 37.8 42.5

Table 7. Results of cross frame interval comparison experiment
Cross 1 5 10 15 20 25 30 35
— 34.3 34.6 34.9 34.3 34.6 34.6 34.0 34.1
√ 35.1 35.4 36.2 36.3 37.5 36.2 36.2 35.8

Table 8. Comparison of model complexity and inference speed of different networks
Network Size Params (M) FLOPs (G) FPS
SipMask [21] 384×640 32.75 226.62 30.0
STMask [22] 384×640 36.79 521.46 28.6
CrossVIS [23] 360×640 8.37 153.33 39.8
WSVIS 360×640 8.59 159.94 34.4

Zifen HE, Lin XU, Yinhui ZHANG, Ying HUANG. Mask generation dynamically regulates weakly supervised video instance segmentation[J]. Optics and Precision Engineering, 2023, 31(19): 2884
