PATCH: A Plug-in Framework of Non-blocking Inference for Distributed Multimodal System



Bibliographic Details
Published in: Proceedings of ACM on interactive, mobile, wearable and ubiquitous technologies, 2023-09, Vol.7 (3), p.1-24, Article 130
Main Authors: Wang, Juexing; Wang, Guangjing; Zhang, Xiao; Liu, Li; Zeng, Huacheng; Xiao, Li; Cao, Zhichao; Gu, Lin; Li, Tianxing
Format: Article
Language: English
Subjects: Architectures; Cloud computing; Computer systems organization; Computing methodologies; Distributed architectures; Human-centered computing; Learning paradigms; Machine learning; Multi-task learning; Ubiquitous and mobile computing; Ubiquitous and mobile computing systems and tools
Online Access: Full text
Abstract: Recent advancements in deep learning have shown that multimodal inference can be particularly useful in tasks like autonomous driving, human health, and production line monitoring. However, deploying state-of-the-art multimodal models in distributed IoT systems poses unique challenges, since the sensor data from low-cost edge devices can get corrupted, lost, or delayed before reaching the cloud. These problems are magnified in the presence of asymmetric data generation rates from different sensor modalities, wireless network dynamics, or unpredictable sensor behavior, leading to either increased latency or degraded inference accuracy, which could affect the normal operation of the system with severe consequences such as human injury or car accidents. In this paper, we propose PATCH, a framework of speculative inference to adapt to these complex scenarios. PATCH serves as a plug-in module for existing multimodal models, enabling speculative inference with these off-the-shelf deep learning models. PATCH consists of 1) a Masked-AutoEncoder-based cross-modality imputation module that imputes missing data using partially available sensor data, 2) a lightweight feature pair ranking module that effectively limits the search space for the optimal imputation configuration with low computation overhead, and 3) a data alignment module that aligns heterogeneous multimodal data streams without relying on accurate timestamps or external synchronization mechanisms. We implement PATCH in nine popular multimodal models using five public datasets and one self-collected dataset. The experimental results show that PATCH achieves up to 13% mean accuracy improvement over the state-of-the-art method while using only 10% of the training data and reducing the training overhead by 73% compared to the original cost of retraining the model.
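The record does not include PATCH's actual architecture, but the masking idea behind its cross-modality imputation module can be illustrated in a few lines: the features of a missing modality are replaced by a mask token, and an encoder-decoder reconstructs the full feature vector from the modalities that did arrive. The sketch below is hypothetical throughout; the dimensions, the random stand-in weights, and the `impute` helper are illustrative assumptions, not PATCH's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 2 modalities, 4 features each.
FEAT = 4
MASK_TOKEN = np.zeros(FEAT)  # a learned vector in a real masked autoencoder; fixed here

# Random weights stand in for a trained encoder/decoder.
W_enc = rng.normal(size=(2 * FEAT, 8))
W_dec = rng.normal(size=(8, 2 * FEAT))

def impute(modalities):
    """Substitute a mask token for each missing modality (None),
    then encode/decode to reconstruct features for all modalities."""
    x = np.concatenate([m if m is not None else MASK_TOKEN
                        for m in modalities])
    z = np.tanh(x @ W_enc)   # encoder over partially available input
    return z @ W_dec         # decoder emits features for both modalities

audio = rng.normal(size=FEAT)
recon = impute([audio, None])  # second modality missing or delayed
print(recon.shape)             # (8,): features for both modalities
```

In a trained model, the decoder's output for the masked slot would serve as the imputed modality, letting inference proceed without blocking on the delayed sensor stream.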
DOI: 10.1145/3610885
ISSN: 2474-9567
Source: Access via ACM Digital Library