Automatically detecting human-object interaction by an instance part-level attention deep framework
Published in: | Pattern Recognition, 2023-02, Vol. 134, p. 109110, Article 109110 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | English |
Abstract: | Highlights:
• One significant problem in human-object interaction (HOI) detection is that similar HOIs are difficult to distinguish. We find that fine-grained part-level image context plays a crucial role in addressing this problem.
• We propose a part-level visual pattern estimation method to define and estimate human body parts and object parts.
• We propose a self-attention-based deep network that learns fine-grained image context encoding the consistent relationships between human body parts and object parts, which improves HOI detection.
Automatically detecting human-object interactions (HOIs) in an image is an important but challenging task in computer vision. One of the significant problems in HOI detection is that similar human-object interactions are difficult to distinguish. Recently, many instance-centric HOI detection schemes based on appearance features and coarse spatial information have been proposed. These methods, however, lack the capacity to capture and analyze the fine-grained context between human poses and object parts, which plays a crucial role in HOI detection. To address these limitations, we propose a novel instance part-level attention deep framework for HOI detection. Specifically, our approach consists of a human/object-part detection phase followed by an HOI detection phase. In the first phase, a part-level visual pattern estimation model captures fine-grained human body parts and object parts. In the second phase, a self-attention-based deep network learns the visual composite around the human-object pair, implicitly expressing the consistent spatial, scale, co-occurrence, and viewpoint relationships among human body parts and object parts across images; these relationships are effective for predicting HOIs. To the best of our knowledge, we are the first to propose a framework in which the fine-grained part-level mutual context of a human-object pair is extracted to improve HOI detection. Comparisons with state-of-the-art HOI detection methods on benchmark datasets demonstrate that the proposed framework outperforms existing methods, with significant gains in part-level visual pattern estimation, in HOI detection, and in the quality of the self-attention-based deep network structure. |
---|---|
ISSN: | 0031-3203, 1873-5142 |
DOI: | 10.1016/j.patcog.2022.109110 |
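The abstract does not specify the network's layers, feature dimensions, or how part features are extracted. As a rough illustration of the general idea of letting human-part and object-part features attend to one another, the following is a minimal sketch of generic single-head scaled dot-product self-attention over a set of part-level tokens. It assumes each detected human body part and object part has already been pooled into a fixed-length feature vector; the token counts, dimension, random weights, mean pooling, and helper names (`self_attention`, `random_matrix`, `pair_context`) are all hypothetical and are not the authors' implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of lists)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    Every part token attends to every part token (including itself), so a
    human-part token can place weight on object-part tokens and vice versa.
    Returns the attended features and the row-normalized attention matrix.
    """
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = math.sqrt(len(k[0]))
    outputs, attn_matrix = [], []
    for qi in q:
        scores = [sum(x * y for x, y in zip(qi, kj)) / scale for kj in k]
        attn = softmax(scores)
        attn_matrix.append(attn)
        # Each output is a convex combination of the value vectors.
        outputs.append([sum(attn[j] * v[j][c] for j in range(len(v)))
                        for c in range(len(v[0]))])
    return outputs, attn_matrix


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights or extracted features."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Hypothetical toy setup: 3 human-part tokens and 2 object-part tokens,
# each a 4-dim vector (random stand-ins for pooled part features).
rng = random.Random(0)
dim = 4
human_parts = random_matrix(3, dim, rng)
object_parts = random_matrix(2, dim, rng)
tokens = human_parts + object_parts  # rows 0-2: human, rows 3-4: object

w_q = random_matrix(dim, dim, rng)
w_k = random_matrix(dim, dim, rng)
w_v = random_matrix(dim, dim, rng)

attended, attn = self_attention(tokens, w_q, w_k, w_v)

# Mean-pool the attended tokens into one pair-level context vector, which a
# hypothetical classifier head could map to HOI class scores.
pair_context = [sum(tok[c] for tok in attended) / len(attended)
                for c in range(dim)]
print(len(attended), len(attended[0]), len(pair_context))  # prints "5 4 4"
```

Because attention is computed over all tokens jointly, each human-part row of `attn` also assigns weight to the object-part columns; in this toy setup, `attn[i][3:]` for a human-part row `i` shows how strongly that part attends to the two object-part tokens. A real model would learn the projection weights and would likely add positional or geometric cues, which this sketch omits.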