Enhancing multimodal disaster tweet classification using state-of-the-art deep learning networks
Published in: Multimedia Tools and Applications, 2022-05, Vol. 81 (13), p. 18483-18501
Main authors: , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: During disasters, multimedia content on social media sites offers vital information. Reports of injured or deceased people, infrastructure destruction, and missing or found people are among the types of information exchanged. While several studies have demonstrated the importance of both text and image content for disaster response, previous research has concentrated primarily on the text modality and has had less success with multimodal approaches. Recent research on multimodal classification of disaster-related tweets uses comparatively simple models such as KimCNN and VGG16. In this work we take this further and apply state-of-the-art models in both text and image classification to improve multimodal classification of disaster-related tweets. The research covers two classification tasks: first, detecting whether a tweet is informative; second, identifying the response it calls for. The multimodal pipeline extracts features from the textual corpus with several methods, preprocesses the corresponding image corpus, and then trains several classification models, comparing their performance while tuning parameters to improve the results. XLNet, BERT, and RoBERTa were trained and analyzed for text classification, and ResNet, ResNeXt, and DenseNet for image classification. Results show that the proposed multimodal architecture outperforms models trained using a single modality (text or image alone), and that the newer state-of-the-art models outperform the baseline models by a reasonable margin on both classification tasks.
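The abstract does not spell out how the text and image branches are combined, so the following is only a minimal sketch of one plausible reading: late fusion by feature concatenation, with `bert-base-uncased` and ResNet-50 standing in for the model families named (BERT/XLNet/RoBERTa and ResNet/ResNeXt/DenseNet). The class name, fusion choice, and binary "informative vs. not informative" head are illustrative assumptions, not the paper's verified architecture.

```python
import torch
import torch.nn as nn
from torchvision import models
from transformers import AutoModel, AutoTokenizer


class MultimodalTweetClassifier(nn.Module):
    """Late-fusion sketch (assumption, not the paper's exact design):
    pooled BERT text features concatenated with ResNet-50 image
    features, followed by a linear classification head."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Text branch: any Hugging Face encoder; XLNet or RoBERTa
        # could be swapped in via the same AutoModel interface.
        self.text_encoder = AutoModel.from_pretrained("bert-base-uncased")
        # Image branch: ResNet-50 with its final FC layer removed,
        # leaving the 2048-dim global-average-pooled feature vector.
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.image_encoder = nn.Sequential(*list(resnet.children())[:-1])
        fused_dim = self.text_encoder.config.hidden_size + 2048  # 768 + 2048
        self.classifier = nn.Linear(fused_dim, num_classes)

    def forward(self, input_ids, attention_mask, images):
        text_out = self.text_encoder(input_ids=input_ids,
                                     attention_mask=attention_mask)
        text_feat = text_out.last_hidden_state[:, 0]        # [CLS] embedding, (B, 768)
        image_feat = self.image_encoder(images).flatten(1)  # (B, 2048)
        fused = torch.cat([text_feat, image_feat], dim=1)   # (B, 2816)
        return self.classifier(fused)                       # (B, num_classes)


# Example forward pass on one tweet-image pair; the random tensor
# stands in for a preprocessed 224x224 disaster photo.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = MultimodalTweetClassifier(num_classes=2)
enc = tokenizer("Bridge collapsed after the earthquake, people trapped",
                return_tensors="pt", padding=True, truncation=True)
logits = model(enc["input_ids"], enc["attention_mask"],
               torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 2])
```

Concatenation-based late fusion is only one option; attention-based or score-level fusion would fit the same two-branch layout without changing either encoder.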
ISSN: 1380-7501, 1573-7721
DOI: 10.1007/s11042-022-12217-3