PATCH: A Plug-in Framework of Non-blocking Inference for Distributed Multimodal System

Recent advancements in deep learning have shown that multimodal inference can be particularly useful in tasks like autonomous driving, human health, and production line monitoring. However, deploying state-of-the-art multimodal models in distributed IoT systems poses unique challenges since the sens...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Proceedings of ACM on interactive, mobile, wearable and ubiquitous technologies mobile, wearable and ubiquitous technologies, 2023-09, Vol.7 (3), p.1-24, Article 130
Hauptverfasser: Wang, Juexing, Wang, Guangjing, Zhang, Xiao, Liu, Li, Zeng, Huacheng, Xiao, Li, Cao, Zhichao, Gu, Lin, Li, Tianxing
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Recent advancements in deep learning have shown that multimodal inference can be particularly useful in tasks like autonomous driving, human health, and production line monitoring. However, deploying state-of-the-art multimodal models in distributed IoT systems poses unique challenges since the sensor data from low-cost edge devices can get corrupted, lost, or delayed before reaching the cloud. These problems are magnified in the presence of asymmetric data generation rates from different sensor modalities, wireless network dynamics, or unpredictable sensor behavior, leading to either increased latency or degradation in inference accuracy, which could affect the normal operation of the system with severe consequences like human injury or car accident. In this paper, we propose PATCH, a framework of speculative inference to adapt to these complex scenarios. PATCH serves as a plug-in module in the existing multimodal models, and it enables speculative inference of these off-the-shelf deep learning models. PATCH consists of 1) a Masked-AutoEncoder-based cross-modality imputation module to impute missing data using partially-available sensor data, 2) a lightweight feature pair ranking module that effectively limits the searching space for the optimal imputation configuration with low computation overhead, and 3) a data alignment module that aligns multimodal heterogeneous data streams without using accurate timestamp or external synchronization mechanisms. We implement PATCH in nine popular multimodal models using five public datasets and one self-collected dataset. The experimental results show that PATCH achieves up to 13% mean accuracy improvement over the state-of-art method while only using 10% of training data and reducing the training overhead by 73% compared to the original cost of retraining the model.
ISSN:2474-9567
2474-9567
DOI:10.1145/3610885