Automated Parts-Based Model for Recognizing Human–Object Interactions from Aerial Imagery with Fully Convolutional Network

Advanced aerial images have led to the development of improved human–object interaction recognition (HOI) methods for usage in surveillance, security, and public monitoring systems. Despite the ever-increasing rate of research being conducted in the field of HOI, the existing challenges of occlusion...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Remote sensing (Basel, Switzerland) Switzerland), 2022-03, Vol.14 (6), p.1492
Hauptverfasser:	Ghadi, Yazeed, Waheed, Manahil, al Shloul, Tamara, A. Alsuhibany, Suliman, Jalal, Ahmad, Park, Jeongmin
Format:	Artikel
Sprache:	eng
Schlagworte:	aerial imagery Algorithms Automation Body parts Cameras Datasets Drones Embedding Feature extraction fully convolutional network Human body human–object interaction classification Image processing Image segmentation Noise Object recognition Occlusion parts-based model Remote sensing Stochasticity Surveillance System effectiveness Unmanned aerial vehicles Video
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Advanced aerial images have led to the development of improved human–object interaction recognition (HOI) methods for usage in surveillance, security, and public monitoring systems. Despite the ever-increasing rate of research being conducted in the field of HOI, the existing challenges of occlusion, scale variation, fast motion, and illumination variation continue to attract more researchers. In particular, accurate identification of human body parts, the involved objects, and robust features is the key to effective HOI recognition systems. However, identifying different human body parts and extracting their features is a tedious and rather ineffective task. Based on the assumption that only a few body parts are usually involved in a particular interaction, this article proposes a novel parts-based model for recognizing complex human–object interactions in videos and images captured using ground and aerial cameras. Gamma correction and non-local means denoising techniques have been used for pre-processing the video frames and Felzenszwalb’s algorithm has been utilized for image segmentation. After segmentation, twelve human body parts have been detected and five of them have been shortlisted based on their involvement in the interactions. Four kinds of features have been extracted and concatenated into a large feature vector, which has been optimized using the t-distributed stochastic neighbor embedding (t-SNE) technique. Finally, the interactions have been classified using a fully convolutional network (FCN). The proposed system has been validated on the ground and aerial videos of the VIRAT Video, YouTube Aerial, and SYSU 3D HOI datasets, achieving average accuracies of 82.55%, 86.63%, and 91.68% on these datasets, respectively.
ISSN:	2072-4292 2072-4292
DOI:	10.3390/rs14061492