ARN-LSTM: A Multi-Stream Fusion Model for Skeleton-based Action Recognition
Saved in:
Main authors: | Wang, Chuanchuan; Mohmamed, Ahmad Sufril Azlan; Noor, Mohd Halim Bin Mohd; Yang, Xiao; Yi, Feifan; Li, Xiang |
---|---|
Format: | Article |
Language: | English |
Subjects: | Computer Science - Computer Vision and Pattern Recognition |
Online access: | Order full text |
creator | Wang, Chuanchuan; Mohmamed, Ahmad Sufril Azlan; Noor, Mohd Halim Bin Mohd; Yang, Xiao; Yi, Feifan; Li, Xiang |
description | This paper presents the ARN-LSTM architecture, a novel multi-stream action
recognition model designed to address the challenge of simultaneously capturing
spatial motion and temporal dynamics in action sequences. Traditional methods
often focus solely on spatial or temporal features, limiting their ability to
fully comprehend complex human activities. Our proposed model integrates joint,
motion, and temporal information through a multi-stream fusion architecture.
Specifically, it comprises a joint stream for extracting skeleton features, a
temporal stream for capturing dynamic temporal features, and an ARN-LSTM block
that utilizes Time-Distributed Long Short-Term Memory (TD-LSTM) layers followed
by an Attention Relation Network (ARN) to model temporal relations. The outputs
from these streams are fused in a fully connected layer to produce the final
action prediction. Evaluations on the NTU RGB+D 60 and NTU RGB+D 120 datasets
demonstrate the superior performance of our model, particularly in group
activity recognition. |
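The fusion scheme the abstract describes — per-stream feature extraction followed by concatenation and a fully connected classification layer — can be sketched in a few lines. This is a minimal, illustrative numpy mock-up, not the authors' implementation: the encoders, dimensions, and names (`StreamEncoder`, `fuse_and_predict`, 75-dim frames for 25 joints × 3 coordinates, 60 classes as in NTU RGB+D 60) are assumptions for the sake of the example, and a simple linear projection with mean pooling stands in for the real joint, temporal, and TD-LSTM/ARN streams.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # Numerically stable softmax over class scores.
    e = np.exp(x - x.max())
    return e / e.sum()

class StreamEncoder:
    """Toy stand-in for one stream (joint, motion, or temporal):
    a linear projection of each frame, then mean pooling over time."""
    def __init__(self, in_dim, feat_dim):
        self.W = rng.standard_normal((in_dim, feat_dim)) * 0.1

    def __call__(self, seq):                      # seq: (T, in_dim)
        return np.tanh(seq @ self.W).mean(axis=0)  # -> (feat_dim,)

def fuse_and_predict(features_per_stream, W_fc, b_fc):
    """Concatenate per-stream features and apply the fusion FC layer."""
    fused = np.concatenate(features_per_stream)    # (n_streams * feat_dim,)
    return softmax(fused @ W_fc + b_fc)            # class probabilities

# Hypothetical dimensions: 25 joints x 3 coords = 75-dim frames, T = 16 frames.
T, in_dim, feat_dim, n_classes = 16, 75, 32, 60
joints = rng.standard_normal((T, in_dim))
# Motion stream input: frame-to-frame joint differences.
motion = np.diff(joints, axis=0, prepend=joints[:1])

joint_enc, motion_enc, temporal_enc = (StreamEncoder(in_dim, feat_dim) for _ in range(3))
feats = [joint_enc(joints), motion_enc(motion), temporal_enc(joints)]

W_fc = rng.standard_normal((3 * feat_dim, n_classes)) * 0.1
probs = fuse_and_predict(feats, W_fc, np.zeros(n_classes))
print(probs.shape)
```

In the actual ARN-LSTM model the third encoder would be the TD-LSTM + ARN block operating on the sequence; the sketch only shows where its output would enter the late-fusion FC layer.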
doi_str_mv | 10.48550/arxiv.2411.01769 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2411.01769 |
language | eng |
recordid | cdi_arxiv_primary_2411_01769 |
source | arXiv.org |
subjects | Computer Science - Computer Vision and Pattern Recognition |
title | ARN-LSTM: A Multi-Stream Fusion Model for Skeleton-based Action Recognition |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-19T06%3A17%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ARN-LSTM:%20A%20Multi-Stream%20Fusion%20Model%20for%20Skeleton-based%20Action%20Recognition&rft.au=Wang,%20Chuanchuan&rft.date=2024-11-03&rft_id=info:doi/10.48550/arxiv.2411.01769&rft_dat=%3Carxiv_GOX%3E2411_01769%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |