Multiple-input streams attention (MISA) network for skeleton-based construction workers' action recognition using body-segment representation strategies

Bibliographic details
Published in: Automation in construction 2023-12, Vol. 156, p. 105104, Article 105104
Main authors: Tian, Yuanyuan; Chen, Jiayu; Kim, Jung In; Kwac, Jungsuk
Format: Article
Language: eng
Description
Summary: With the rapid growth of deep learning algorithms, graph convolutional networks (GCNs) have become a common choice for skeleton-based human action recognition and achieve strong performance. However, existing GCN-based models often rely on the physical connections of the human body, which may not suit complex construction tasks involving various body parts and hand movements. To address this concern, this paper models the human body through topological graphs at varying levels, designed according to body-segment strategies. A multiple-input streams attention (MISA) network is introduced that combines GCN and temporal convolutional network (TCN) components and supplements the body-structure topology graph of GCNs with more comprehensive input graphs. Additionally, two-modality motion data and three attention blocks are integrated to capture more discriminative features. Finally, experimental results on the Construction Motion Library (CML) dataset demonstrate the superiority of the developed method, which reaches a recognition accuracy of approximately 84.94%.
• A multiple-input streams attention (MISA) network is proposed for recognizing construction workers' actions.
• GCN and TCN models are incorporated into the network to extract the spatial-temporal features of the skeleton.
• Different skeleton representation topologies are developed to capture the characteristics of construction workers' actions.
• A five-part level topology graph and faster-mode motion data enhanced the recognition of diverse construction tasks.
• Two-stage fusion with attention blocks has a positive effect on recognition accuracy.
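The summary mentions part-level topology graphs and motion data at more than one speed but does not define them. The following is a minimal sketch of how such inputs could be derived from raw joint coordinates, not the authors' implementation. The 15-joint layout, the `FIVE_PARTS` grouping, and the helper names `pool_parts` and `motion` are all hypothetical, and treating "faster mode" motion as a displacement taken over a larger frame stride is an assumption.

```python
from typing import Dict, List, Tuple

Joint = Tuple[float, float, float]   # (x, y, z) position of one node
Frame = List[Joint]                  # all nodes of the skeleton at one time step

# Hypothetical five-part grouping of a 15-joint skeleton.
# The joint indices are illustrative only and do not come from the paper.
FIVE_PARTS: Dict[str, List[int]] = {
    "torso": [0, 1, 2],
    "left_arm": [3, 4, 5],
    "right_arm": [6, 7, 8],
    "left_leg": [9, 10, 11],
    "right_leg": [12, 13, 14],
}


def pool_parts(frame: Frame, parts: Dict[str, List[int]] = FIVE_PARTS) -> Frame:
    """Average the joints of each body segment into one part-level node."""
    pooled: Frame = []
    for joint_ids in parts.values():
        n = len(joint_ids)
        x = sum(frame[j][0] for j in joint_ids) / n
        y = sum(frame[j][1] for j in joint_ids) / n
        z = sum(frame[j][2] for j in joint_ids) / n
        pooled.append((x, y, z))
    return pooled


def motion(sequence: List[Frame], stride: int = 1) -> List[Frame]:
    """Displacement of every node between frames that are `stride` steps apart.

    Assumption: stride=1 stands in for ordinary motion data and a larger
    stride for a "faster" motion modality.
    """
    out: List[Frame] = []
    for t in range(len(sequence) - stride):
        cur, nxt = sequence[t], sequence[t + stride]
        out.append([
            (n[0] - c[0], n[1] - c[1], n[2] - c[2])
            for c, n in zip(cur, nxt)
        ])
    return out
```

Under these assumptions, each stream of a multi-input network would receive one such representation (joint-level positions, part-level positions, or one of the motion sequences) before the GCN and TCN layers process it.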
ISSN: 0926-5805, 1872-7891
DOI: 10.1016/j.autcon.2023.105104