Opto-Electronic Engineering, Vol. 49, Issue 7, 210429 (2022)
Interactive instance proposal network for HOI detection
Lixia Xue, Kaijian Yin, Ronggui Wang, Juan Yang. Interactive instance proposal network for HOI detection[J]. Opto-Electronic Engineering, 2022, 49(7): 210429
Category: Article
Received: Jan. 10, 2022
Accepted: --
Published Online: Aug. 1, 2022
The Author Email: Yang Juan (yangjuan6985@163.com)