A survey of multimodal hybrid deep learning for computer vision: Architectures, applications, trends, and challenges

Bibliographic Details
Published in: Information Fusion, 2024-05, Vol. 105, p. 102217, Article 102217
Author: Bayoudh, Khaled
Format: Article
Language: English
Subjects:
Online Access: Full text
Description
Summary: In recent years, deep learning algorithms have rapidly revolutionized artificial intelligence, particularly machine learning, enabling researchers and practitioners to move beyond previously hand-crafted feature extraction procedures. In particular, deep learning uses adaptive learning processes to learn more complex and informative patterns from datasets of varying sizes. With the increasing availability of multimodal data streams and recent advances in deep learning algorithms, multimodal deep learning is on the rise. This requires the development of complex models that can process and analyze multimodal information in a consistent manner. However, unstructured data can come in many different forms (also known as modalities), and extracting relevant features from such data remains an ambitious goal for deep learning researchers. According to the literature, most deep learning systems consist of a single architecture (i.e., standalone deep learning). When two or more deep learning architectures are combined over multiple sensory modalities, the result is called a multimodal hybrid deep learning model. Since this research direction has received much attention in the field of deep learning, the purpose of this survey is to provide a broader overview of the topic. In this paper, we provide a comprehensive review of recent advances in multimodal hybrid deep learning, including a thorough analysis of the most commonly developed hybrid architectures. In particular, one of the main challenges in multimodal hybrid analysis is the ability of these architectures to systematically integrate cross-modal features in hybrid designs. Therefore, we propose a generic framework for multimodal hybrid learning that focuses mainly on fusion methods. We also identify trends and challenges in multimodal hybrid learning and provide insights and directions for future research.
Our findings show that multimodal hybrid learning can perform well in a variety of challenging computer vision applications and tasks.
• We categorize these methods based on hybrid deep architectures.
• We review recent advances in unimodal and multimodal hybrid deep learning.
• We consider multimodal and hybrid fusion techniques.
• We highlight several challenging application-driven hybrid architectures.
• We discuss current trends and challenges in multimodal hybrid learning.
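The abstract contrasts standalone architectures with hybrid models that fuse features across modalities. As a minimal illustration (not taken from the paper), the sketch below shows the two fusion strategies most commonly distinguished in the multimodal literature: early (feature-level) fusion, which concatenates unimodal embeddings into a joint representation, and late (decision-level) fusion, which averages per-modality predictions. The encoders, dimensions, and modality names here are hypothetical stand-ins using random linear maps, not the survey's framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy inputs: two modalities (e.g., image and audio features)
img_feat = rng.normal(size=(4, 128))  # batch of 4 image feature vectors
aud_feat = rng.normal(size=(4, 32))   # batch of 4 audio feature vectors

def encoder(x, out_dim, seed):
    """Stand-in unimodal encoder: a fixed random linear projection + ReLU."""
    w = np.random.default_rng(seed).normal(size=(x.shape[1], out_dim))
    return np.maximum(x @ (w / np.sqrt(x.shape[1])), 0.0)

# Early (feature-level) fusion: concatenate unimodal embeddings into one
# joint representation that a shared downstream head would consume.
z_img = encoder(img_feat, 64, seed=1)
z_aud = encoder(aud_feat, 64, seed=2)
joint = np.concatenate([z_img, z_aud], axis=1)  # shape (4, 128)

def head(z, n_classes, seed):
    """Stand-in classification head: a random linear scoring layer."""
    w = np.random.default_rng(seed).normal(size=(z.shape[1], n_classes))
    return z @ w

# Late (decision-level) fusion: average per-modality class scores.
scores_img = head(z_img, 10, seed=3)
scores_aud = head(z_aud, 10, seed=4)
late_fused = 0.5 * (scores_img + scores_aud)    # shape (4, 10)

print(joint.shape, late_fused.shape)
```

In practice the survey's hybrid architectures replace these random projections with trained networks (e.g., a CNN for images and an RNN or transformer for sequences), but the structural choice — where in the pipeline the modalities meet — is the same.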
ISSN: 1566-2535
eISSN: 1872-6305
DOI: 10.1016/j.inffus.2023.102217