Centaur: Robust Multimodal Fusion for Human Activity Recognition
The proliferation of Internet of Things (IoT) and mobile devices equipped with heterogeneous sensors has enabled new applications that rely on the fusion of time series emitted by sensors with different modalities. While there are promising neural network architectures for multimodal fusion, their performance degrades quickly in the presence of consecutive missing data and noise across multiple modalities/sensors, issues that are prevalent in real-world settings. We propose Centaur, a multimodal fusion model for human activity recognition (HAR) that is robust to these data quality issues. Centaur combines a data cleaning module, a denoising autoencoder (DAE) with convolutional layers, and a multimodal fusion module, a deep convolutional neural network with a self-attention (SA) mechanism that captures cross-sensor (CS) correlation. We train Centaur using a stochastic data corruption scheme and evaluate it on five datasets containing data generated by multiple inertial measurement units (IMUs). Centaur's data cleaning module outperforms two state-of-the-art autoencoder-based architectures, and its multimodal fusion module outperforms four strong baselines. Compared to two robust fusion architectures from related work, Centaur is more robust, especially to consecutive missing data occurring in multiple sensor channels, achieving 10.89%-16.56% higher accuracy in the HAR task.
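The record does not spell out the stochastic data corruption scheme mentioned in the abstract. As a rough, hypothetical sketch of the idea, the snippet below corrupts a multichannel IMU window with a consecutive missing-data gap per channel plus additive Gaussian noise, the two fault types the abstract highlights; a denoising autoencoder would then be trained to reconstruct the clean window from the corrupted one. The function name `corrupt_window` and all parameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def corrupt_window(x, p_missing=0.3, max_gap=20, noise_std=0.1, rng=None):
    """Stochastically corrupt a sensor window x of shape (channels, time).

    Hypothetical sketch of a corruption scheme like the one described in
    the abstract: every channel receives additive Gaussian noise, and each
    channel may additionally lose a consecutive run of samples (zeroed out,
    emulating consecutive missing data). The exact scheme and parameter
    values used by Centaur are not given in this record.
    """
    rng = rng or np.random.default_rng()
    x_noisy = x + rng.normal(0.0, noise_std, size=x.shape)
    for c in range(x.shape[0]):
        if rng.random() < p_missing:                 # corrupt this channel?
            gap = rng.integers(1, max_gap + 1)       # length of missing run
            start = rng.integers(0, x.shape[1] - gap + 1)
            x_noisy[c, start:start + gap] = 0.0      # consecutive missing data
    return x_noisy

# Usage: the DAE's training pairs would be (corrupt_window(x), x),
# i.e., it learns to map a corrupted window back to the clean one.
window = np.random.randn(6, 128)   # e.g., 6 IMU channels, 128 samples
corrupted = corrupt_window(window)
```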
Published in: | IEEE sensors journal 2024, Vol.24 (11), p.18578-18591 |
---|---|
Main Authors: | Xaviar, Sanju ; Yang, Xin ; Ardakanian, Omid |
Format: | Article |
Language: | eng |
Subjects: | Artificial neural networks ; Brain modeling ; Cleaning ; Data models ; Human activity recognition ; Human activity recognition (HAR) ; Inertial platforms ; Internet of Things ; Missing data ; Modules ; multimodal fusion ; Neural networks ; Noise ; Robustness ; sensor faults ; Sensor fusion ; Sensors ; Wearable sensors |
Online Access: | Order full text |
container_end_page | 18591 |
---|---|
container_issue | 11 |
container_start_page | 18578 |
container_title | IEEE sensors journal |
container_volume | 24 |
creator | Xaviar, Sanju ; Yang, Xin ; Ardakanian, Omid |
doi_str_mv | 10.1109/JSEN.2024.3388893 |
format | Article |
publisher | New York: IEEE |
coden | ISJEAZ |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 |
orcidid | 0000-0002-6711-5502 ; 0009-0009-7475-567X ; 0000-0002-7861-0007 |
tpages | 14 |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1530-437X |
ispartof | IEEE sensors journal, 2024, Vol.24 (11), p.18578-18591 |
issn | 1530-437X ; 1558-1748 |
language | eng |
recordid | cdi_ieee_primary_10506095 |
source | IEEE Electronic Library (IEL) |
subjects | Artificial neural networks ; Brain modeling ; Cleaning ; Data models ; Human activity recognition ; Human activity recognition (HAR) ; Inertial platforms ; Internet of Things ; Missing data ; Modules ; multimodal fusion ; Neural networks ; Noise ; Robustness ; sensor faults ; Sensor fusion ; Sensors ; Wearable sensors |
title | Centaur: Robust Multimodal Fusion for Human Activity Recognition |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T14%3A14%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Centaur:%20Robust%20Multimodal%20Fusion%20for%20Human%20Activity%20Recognition&rft.jtitle=IEEE%20sensors%20journal&rft.au=Xaviar,%20Sanju&rft.date=2024&rft.volume=24&rft.issue=11&rft.spage=18578&rft.epage=18591&rft.pages=18578-18591&rft.issn=1530-437X&rft.eissn=1558-1748&rft.coden=ISJEAZ&rft_id=info:doi/10.1109/JSEN.2024.3388893&rft_dat=%3Cproquest_RIE%3E3064706070%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3064706070&rft_id=info:pmid/&rft_ieee_id=10506095&rfr_iscdi=true |