Hierarchical multiples self-attention mechanism for multi-modal analysis

Full Description

Bibliographic Details
Published in: Multimedia Systems, 2023-12, Vol. 29 (6), p. 3599-3608
Main Authors: Jun, Wu, Tianliang, Zhu, Jiahui, Zhu, Tianyi, Li, Chunzhi, Wang
Format: Article
Language: English
Subjects:
Online Access: Full text
Description
Summary: Surrounded by massive amounts of multimedia in daily life, people perceive the world by concurrently processing and fusing high-dimensional data from multiple modalities, including text, vision, and audio. Building on popular machine learning methods, we aim to obtain better fusion results, and multi-modal analysis has therefore become an innovative field in data processing. Combining different modalities makes data more informative, but the difficulty of multi-modal analysis and processing lies in feature extraction and feature fusion. This paper addresses these two problems by proposing the BERT-HMAG model for feature extraction and the LMF-SA model for multi-modal fusion. In experiments, both achieve a measurable improvement over traditional models such as LSTM and Transformer.
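The record gives no implementation details beyond the abstract. As a rough illustration of one plausible reading of "LMF-SA" (a low-rank multimodal fusion layer followed by self-attention), a minimal PyTorch sketch follows. All names and hyper-parameters here (LMFSelfAttentionSketch, rank, heads, the feature dimensions) are illustrative assumptions, not the authors' published architecture.

```python
# Hypothetical sketch: low-rank multimodal fusion (LMF-style) + self-attention.
# Names, shapes, and hyper-parameters are assumptions for illustration only;
# this is not the LMF-SA architecture from the paper.
import torch
import torch.nn as nn

class LMFSelfAttentionSketch(nn.Module):
    def __init__(self, text_dim, vision_dim, audio_dim, out_dim, rank=4, heads=4):
        super().__init__()
        # One low-rank factor set per modality: maps (feature + constant 1)
        # into `rank` projections of size out_dim.
        self.text_factor = nn.Parameter(torch.randn(rank, text_dim + 1, out_dim))
        self.vision_factor = nn.Parameter(torch.randn(rank, vision_dim + 1, out_dim))
        self.audio_factor = nn.Parameter(torch.randn(rank, audio_dim + 1, out_dim))
        self.fusion_weights = nn.Parameter(torch.randn(rank))
        self.fusion_bias = nn.Parameter(torch.zeros(out_dim))
        # Self-attention over the fused vector, treated as a length-1 sequence
        # here purely for shape compatibility; a real model would attend over time.
        self.attn = nn.MultiheadAttention(out_dim, heads, batch_first=True)

    @staticmethod
    def _append_one(x):
        # Appending a constant 1 lets the element-wise product retain
        # unimodal and bimodal terms, not just the trimodal interaction.
        ones = torch.ones(x.size(0), 1, device=x.device, dtype=x.dtype)
        return torch.cat([x, ones], dim=1)

    def forward(self, text, vision, audio):
        # Per-modality projections: (batch, rank, out_dim).
        t = torch.einsum("bi,rio->bro", self._append_one(text), self.text_factor)
        v = torch.einsum("bi,rio->bro", self._append_one(vision), self.vision_factor)
        a = torch.einsum("bi,rio->bro", self._append_one(audio), self.audio_factor)
        fused = t * v * a  # element-wise product approximates the tensor fusion
        fused = torch.einsum("r,bro->bo", self.fusion_weights, fused) + self.fusion_bias
        fused = fused.unsqueeze(1)                 # (batch, 1, out_dim)
        out, _ = self.attn(fused, fused, fused)    # self-attention refinement
        return out.squeeze(1)

# Usage: fuse 768-d text (e.g., BERT output), 512-d vision, 128-d audio features.
model = LMFSelfAttentionSketch(768, 512, 128, out_dim=64)
z = model(torch.randn(2, 768), torch.randn(2, 512), torch.randn(2, 128))
print(z.shape)  # torch.Size([2, 64])
```

The appended constant 1 is the standard trick from low-rank multimodal fusion (Liu et al., 2018) rather than anything specific to this paper; it keeps single-modality information from being washed out by the three-way product.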
ISSN:0942-4962
1432-1882
DOI:10.1007/s00530-023-01133-7