Automatically detecting human-object interaction by an instance part-level attention deep framework

Bibliographic details
Published in:	Pattern recognition 2023-02, Vol.134, p.109110, Article 109110
Main authors:	Bai, Lin; Chen, Fenglian; Tian, Yang
Format:	Article
Language:	English
Subjects:	Human-object interaction; Image context; Instance part-level correlations; Self-attention-based model
Online access:	Full text

Description
Summary:
• One significant problem in HOI detection is that similar HOIs are difficult to distinguish. We find that the fine-grained part-level image context plays a crucial role in addressing this problem.
• We propose a part-level visual pattern estimation method to define and estimate human body parts and object parts.
• We propose a self-attention-based deep network to learn the fine-grained image context that encodes the consistent relationships between human body parts and object parts, which is effective for better HOI detection.

Automatically detecting human-object interactions (HOIs) from an image is an important but challenging task in computer vision. One of the significant problems in HOI detection is that similar human-object interactions are difficult to distinguish. Recently, many instance-centric HOI detection schemes, based on appearance features and coarse spatial information, have been proposed. These methods, however, lack the capacity to capture and analyze the fine-grained context between human poses and object parts, which plays a crucial role in HOI detection. To address these problems, we propose a novel instance part-level attention deep framework for HOI detection. Specifically, our approach consists of a human/object-part detection phase and an HOI detection phase. In the former phase, a part-level visual pattern estimation model is designed to capture the fine-grained human body parts and object parts. In the latter phase, a self-attention-based deep network is proposed to learn the visual composite around the human-object pair. This composite implicitly expresses the consistent spatial, scale, co-occurrence, and viewpoint relationships among human body parts and object parts across images, which are effective for predicting HOIs. To the best of our knowledge, we are the first to propose a framework where the fine-grained part-level mutual context of a human-object pair is extracted to improve HOI detection. By comparing our approach with state-of-the-art HOI detection methods on benchmark datasets, we demonstrated that our proposed framework outperformed the existing HOI detection methods, with significant improvements in part-level visual pattern estimation, HOI detection, and the quality of the self-attention-based deep network structure.
ISSN:	0031-3203, 1873-5142
DOI:	10.1016/j.patcog.2022.109110
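
Illustrative note: the summary describes a self-attention-based network that relates human body parts to object parts, but this record gives no architectural details. The following is only a minimal sketch of generic scaled dot-product self-attention over part-level feature vectors, written in plain Python. It is not the authors' network: the part names, the toy feature values, and the use of identity projections in place of learned query/key/value weights are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attention(parts):
    """Scaled dot-product self-attention with identity projections.

    parts: list of equal-length feature vectors, one per human or object part.
    Returns (weights, contexts): weights[i][j] is how strongly part i attends
    to part j; contexts[i] is the attention-weighted mix of all part features.
    A trained model would apply learned query/key/value projections first;
    they are omitted here (assumption) to keep the sketch short.
    """
    dim = len(parts[0])
    scale = math.sqrt(dim)
    weights = [softmax([dot(q, k) / scale for k in parts]) for q in parts]
    contexts = [
        [sum(w * v[j] for w, v in zip(row, parts)) for j in range(dim)]
        for row in weights
    ]
    return weights, contexts


# Hypothetical toy features: two human body parts and two object parts.
part_names = ["human_hand", "human_foot", "object_handle", "object_wheel"]
part_features = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.2],
    [0.9, 0.1, 0.6],
    [0.1, 0.8, 0.3],
]

weights, contexts = self_attention(part_features)

for name, row in zip(part_names, weights):
    print(name, [round(w, 3) for w in row])
```

In this toy input, the hand features resemble the handle features more than the foot features, so the hand row places more weight on the handle than on the foot. This is the kind of part-to-part relationship that attention weights can express, under the assumptions stated above.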