Learning representation from multiple media domains for enhanced event discovery

Bibliographic details
Published in: Pattern Recognition, 2021-02, Vol. 110, p. 107640, Article 107640
Main authors: Yang, Zhenguo; Li, Qing; Xie, Haoran; Wang, Qi; Liu, Wenyin
Format: Article
Language: English
Description
Summary:
• Multi-domain event discovery is advocated, covering news media and social media.
• A data representation learning model is proposed based on matrix factorization.
• A multi-domain and multimodal real-world event dataset has been released on GitHub.

In this paper, we focus on event discovery using data distributed across multiple media domains, such as news media and social media. To this end, we propose an in-domain and cross-domain Laplacian regularization (ICLR) model, which learns effective data representations both for textual news reports contributed by journalists in the news media domain and for image posts shared by amateur users in the social media domain. The learned representations can be used by classification and clustering strategies to discover existing and new events, respectively. More specifically, ICLR constructs separate Laplacian regularization terms that capture inter-domain and intra-domain label consistency, and the model is optimized with an alternating optimization strategy that has a theoretical convergence guarantee. We also collect and release a multi-domain and multimodal dataset for evaluation and public use.
ISSN: 0031-3203, 1873-5142
DOI: 10.1016/j.patcog.2020.107640
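
Illustrative sketch
The summary mentions matrix factorization with Laplacian regularization solved by alternating updates, but this record does not give the ICLR objective itself. The following is therefore not the authors' model. It is a minimal single-domain sketch of a well-known related technique, graph-regularized non-negative matrix factorization with multiplicative updates, shown only to make the general idea concrete. The function names (`label_affinity`, `graph_regularized_nmf`), the same-label affinity graph, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def label_affinity(labels):
    """Hypothetical affinity graph: samples with the same label are connected."""
    labels = np.asarray(labels)
    W = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def graph_regularized_nmf(X, W, k, lam=1.0, n_iter=200, eps=1e-9, seed=0):
    """Approximately minimize ||X - U V^T||_F^2 + lam * tr(V^T L V), with L = D - W.

    X: (n_features, n_samples) non-negative data matrix.
    W: (n_samples, n_samples) symmetric non-negative affinity matrix.
    Returns basis U, sample representation V, and the objective per iteration.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k))
    V = rng.random((n, k))
    D = np.diag(W.sum(axis=1))  # degree matrix
    L = D - W                   # graph Laplacian
    history = []
    for _ in range(n_iter):
        # Alternate: update U with V fixed, then V with U fixed.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
        fit = np.linalg.norm(X - U @ V.T, "fro") ** 2
        smooth = lam * np.trace(V.T @ L @ V)
        history.append(fit + smooth)
    return U, V, history
```

In this sketch the rows of V serve as the learned sample representations; the Laplacian term pulls together the representations of samples that share a label, which is the general role such a regularizer plays. How ICLR combines in-domain and cross-domain terms across text and image features is specified in the article, not here.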