Opto-Electronic Engineering, Vol. 49, Issue 7, 210429 (2022)
Interactive instance proposal network for HOI detection
Overview: As computer vision develops, there is a growing need to understand images, including recognizing the scenes and the human behaviors they contain. The task of human-object interaction (HOI) detection is to locate the humans and objects in an image and infer the relationships between them. This requires not only locating individual object instances but also identifying the interactions between them. However, a machine does not know in advance which objects a human is interacting with. Most existing methods address this by exhaustively pairing every detected human with every detected object. They rely on off-the-shelf object detectors to detect instances, which does not meet the requirements of the HOI task. This paper proposes an object detector suited to HOI detection and based on relational reasoning. It uses the interactive relationships between humans and objects in an image to recommend human-object pairs, so as to reduce the number of non-interactive human-object pairs as far as possible. Like most prior work, our method follows a two-stage detection pipeline. First, the interactive instance proposal network (IIPN) recommends human-object pairs. Our detector follows the pipeline of Faster R-CNN but replaces the region proposal network (RPN) with the IIPN. The IIPN selects human-object pairs according to the likelihood of interaction between a human and an object, using the visual information in the image. It passes messages through iterative reasoning in graph neural networks (GNNs), and only human-object pairs that have an interactive relationship are selected as the IIPN's outputs. Second, we design a cross-modal information fusion module (CIFM), which computes fusion attention according to how strongly each feature influences the detection result and then performs a weighted fusion. The motivation is that existing methods simply add or concatenate features such as human visual features, object visual features, and human-object spatial features in the reasoning part.
The differing degrees to which each feature influences different actions are ignored. For example, verbs such as ride and hold in <human, ride, bike> and <human, hold, bike> depend more on spatial relationships, while eat and cut in
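The two ideas in the overview, keeping only human-object pairs that are likely to interact and fusing features with attention weights rather than plain addition or concatenation, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `select_interactive_pairs`, `attention_fusion`, `score_fn`, `threshold`, and `logits` are hypothetical, the interaction score is a stand-in for what the IIPN learns through GNN message passing, and the attention logits are supplied by hand instead of being predicted by a network.

```python
import math
from itertools import product


def select_interactive_pairs(humans, objects, score_fn, threshold=0.5):
    """Keep only human-object pairs whose interaction score reaches a threshold.

    score_fn is a hypothetical stand-in for a learned interaction predictor;
    exhaustive pairing would instead keep every (human, object) combination.
    """
    pairs = []
    for h, o in product(humans, objects):
        score = score_fn(h, o)
        if score >= threshold:
            pairs.append((h, o, score))
    return pairs


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fusion(features, logits):
    """Fuse equally sized feature vectors as an attention-weighted sum.

    features: dict mapping a feature name to a list of floats
    logits:   dict mapping the same names to unnormalized attention scores
              (assumed given here; a real module would predict them)
    Returns the fused vector and the normalized weight per feature.
    """
    names = list(features)
    weights = softmax([logits[n] for n in names])
    dim = len(features[names[0]])
    fused = [0.0] * dim
    for w, n in zip(weights, names):
        for i, value in enumerate(features[n]):
            fused[i] += w * value
    return fused, dict(zip(names, weights))


# Toy usage: one human, two candidate objects, three feature types.
pairs = select_interactive_pairs(
    ["h1"], ["bike", "tree"],
    score_fn=lambda h, o: 0.9 if o == "bike" else 0.1,
)
features = {"human": [1.0, 0.0], "object": [0.0, 1.0], "spatial": [1.0, 1.0]}
# A larger logit for "spatial" mimics a verb such as ride that leans on layout.
fused, weights = attention_fusion(
    features, {"human": 0.0, "object": 0.0, "spatial": 2.0}
)
```

In this toy run only the pair containing the bike survives the filter, and the spatial feature receives the largest weight in the fused vector. With equal logits the fusion reduces to a plain average, which is the uniform treatment that the overview criticizes.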
Lixia Xue, Kaijian Yin, Ronggui Wang, Juan Yang. Interactive instance proposal network for HOI detection[J]. Opto-Electronic Engineering, 2022, 49(7): 210429
Category: Article
Received: Jan. 10, 2022
Accepted: --
Published Online: Aug. 1, 2022
The Author Email: Yang Juan (yangjuan6985@163.com)