Optics and Precision Engineering, Volume 31, Issue 18, 2736 (2023)
Weakly supervised video instance segmentation with scale adaptive generation regulation
Video instance segmentation is critical for multi-target perception and scene understanding in assisted driving. However, weakly supervised video instance segmentation typically trains networks with bounding-box annotations only, which severely limits segmentation accuracy for traffic-scene targets whose scale varies over a large dynamic range. To address this issue, we propose a scale adaptive generation regulation weakly supervised video instance segmentation network (SAGRNet). First, a dynamic adaptive control module for multi-scale feature map contributions replaces the original linear weighting. By dynamically adjusting how much the feature maps at each scale contribute, it directs the focus to both the local position and the global contour of the target, which addresses the large dynamic range of scales caused by changes in the imaging distance of vehicles and pedestrians. Second, a multi-fine-grained spatial information aggregation generation control module for target instances is constructed. It aggregates multi-fine-grained spatial information extracted with different dilation rates into weight parameters, which then regulate the feature map at each scale. This module refines instance boundaries and improves the representation of cross-channel mask interaction information, compensating for the poor continuity of edge-contour segmentation masks caused by limited instance edge information. Finally, to mitigate the weak supervision provided by bounding-box-level annotations, an orthogonal loss and a color similarity loss are introduced: the former reduces the deviation between the predicted mask and the ground-truth bounding box, and the latter resolves the ambiguity in classifying pixel-wise label attributes. Experimental results on a traffic-scene dataset extracted from YouTube-VIS 2019 show that SAGRNet raises the mean accuracy by 5.1%, to 38.1%, compared with the weakly supervised baseline. These results indicate that the method provides an effective basis for multi-target perception and instance-level scene understanding.
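The abstract does not state the formulas for the two box-supervision losses. As a hedged illustration only, the sketch below assumes that the "orthogonal" loss compares the predicted mask and the box after max-projection onto the two orthogonal image axes, and that the color similarity loss is a pairwise term over adjacent pixels, both in the style of BoxInst (Tian et al., CVPR 2021). It is not SAGRNet's published implementation. The function names (`dice_loss`, `projection_loss`, `color_similarity_loss`) are hypothetical, the defaults `theta=2.0` and `tau=0.3` are borrowed from BoxInst as assumptions, colors are assumed to be in a CIELAB-like range, and the pairwise term is simplified to immediate horizontal and vertical neighbours with both pixels inside the box.

```python
import numpy as np

# Hypothetical, BoxInst-style sketch of box-supervised mask losses (not SAGRNet's code).


def dice_loss(p: np.ndarray, q: np.ndarray, eps: float = 1e-6) -> float:
    """Dice loss between two 1-D profiles with values in [0, 1]."""
    inter = np.sum(p * q)
    return float(1.0 - 2.0 * inter / (np.sum(p ** 2) + np.sum(q ** 2) + eps))


def projection_loss(mask_prob: np.ndarray, box_mask: np.ndarray) -> float:
    """Assumed "orthogonal" term: the predicted mask and the box must cover
    the same columns (x-axis projection) and the same rows (y-axis projection)."""
    loss_x = dice_loss(mask_prob.max(axis=0), box_mask.max(axis=0))  # per-column max
    loss_y = dice_loss(mask_prob.max(axis=1), box_mask.max(axis=1))  # per-row max
    return loss_x + loss_y


def color_similarity_loss(mask_prob: np.ndarray, image: np.ndarray,
                          box_mask: np.ndarray, theta: float = 2.0,
                          tau: float = 0.3, eps: float = 1e-6) -> float:
    """Assumed pairwise term: adjacent pixels with similar colors should share a label.

    mask_prob: (H, W) foreground probabilities.
    image:     (H, W, C) colors, assumed to be in a CIELAB-like range.
    box_mask:  (H, W) binary mask of the annotated bounding box.
    """
    pairs = [
        (np.s_[:, :-1], np.s_[:, 1:]),  # horizontal neighbours (i, j)-(i, j+1)
        (np.s_[:-1, :], np.s_[1:, :]),  # vertical neighbours (i, j)-(i+1, j)
    ]
    total, count = 0.0, 0
    for a, b in pairs:
        m_a, m_b = mask_prob[a], mask_prob[b]
        color_dist = np.linalg.norm(image[a] - image[b], axis=-1)
        similarity = np.exp(-color_dist / theta)
        # Keep only color-similar pairs whose two pixels both lie inside the box.
        keep = (similarity >= tau) & (box_mask[a] > 0) & (box_mask[b] > 0)
        # Probability that both pixels get the same label (both fg or both bg).
        p_same = m_a * m_b + (1.0 - m_a) * (1.0 - m_b)
        total += float(-np.sum(np.log(np.clip(p_same, eps, 1.0))[keep]))
        count += int(keep.sum())
    return total / max(count, 1)
```

The projection term explains why box labels can supervise masks at all: any mask whose row and column profiles match the box must touch every row and column the box spans, yet it says nothing about which pixels inside the box are foreground. The color term fills that gap by tying together neighbouring pixels of similar color, so that mask boundaries tend to fall on color edges. Both functions operate on a single frame; any temporal component of SAGRNet is not modelled here.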
Yinhui ZHANG, Weiqi HAI, Zifen HE, Ying HUANG, Dongdong CHEN. Weakly supervised video instance segmentation with scale adaptive generation regulation[J]. Optics and Precision Engineering, 2023, 31(18): 2736
Category: Information Sciences
Received: Dec. 14, 2022
Accepted: --
Published Online: Oct. 12, 2023
Author Email: HE Zifen (zyhhzf1998@163.com)