Shallow and deep learning for event relatedness classification

Bibliographic details
Published in: Information processing & management 2020-11, Vol.57 (6), p.102371, Article 102371
Main authors: Haneczok, Jacek; Piskorski, Jakub
Format: Article
Language: English
Description
Summary:
• ML techniques applied to determine event pair relatedness (event linking) based on event templates automatically extracted from online news.
• Research driven by a real-world need to develop functionalities that reduce an otherwise intractable event search space for intelligence gathering.
• Performance of shallow learning methods compared to a deep learning approach based on a long short-term memory (LSTM) recurrent neural network.
• Focus on using linguistically lightweight features which are easily portable across languages.
• Practical application of machine learning techniques falling into the subfields of NLP, information engineering, event extraction and linking.
Over the past two decades, various security authorities around the world have acknowledged the importance of exploiting the ever-growing amount of information published on the web about various types of events for early detection of certain threats, situation monitoring and risk analysis. Since the information related to a particular real-world event might be scattered across various sources and mentioned on different dates, an important task is to link together all event mentions that are interrelated. This article studies the application of various statistical and machine learning techniques to solve a new application-oriented variation of the task of event pair relatedness classification, which merges different fine-grained event relation types reported elsewhere into one concept. The task focuses on linking event templates automatically extracted from online news by an existing event extraction system; these templates contain only short text snippets and potentially erroneous and incomplete information.
Results of exploring the performance of shallow learning methods such as decision tree-based random forest and gradient boosted tree ensembles (XGBoost), along with kernel-based support vector machines (SVM), are presented in comparison with both simpler shallow learners and a deep learning approach based on a long short-term memory (LSTM) recurrent neural network. Our experiments focus on using linguistically lightweight features (some of which have not been reported elsewhere) which are easily portable across languages. We obtained F1 scores ranging from 92% (simplest shallow learner) to 96.4% (LSTM-based recurrent neural network), evaluated on a newly created event linking corpus.
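As an illustration of what "linguistically lightweight" pair features and the F1 metric can look like in practice, the following is a minimal sketch only. The template fields (`snippet`, `type`, `date`), the three features, and the function names are all hypothetical assumptions made for this example; they are not the feature set or code used in the article.

```python
from datetime import date


def tokens(text):
    """Lowercase whitespace tokenization; needs no language-specific resources."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard overlap of two token sets (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(ev1, ev2):
    """Feature dict for an event pair. All three features are hypothetical."""
    return {
        "snippet_jaccard": jaccard(tokens(ev1["snippet"]), tokens(ev2["snippet"])),
        "same_type": float(ev1["type"] == ev2["type"]),
        "days_apart": abs((ev1["date"] - ev2["date"]).days),
    }


def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = related, 0 = unrelated)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Two made-up event templates, for illustration only.
ev_a = {"snippet": "Bomb blast kills five", "type": "attack", "date": date(2020, 1, 1)}
ev_b = {"snippet": "five killed in bomb blast", "type": "attack", "date": date(2020, 1, 3)}
features = pair_features(ev_a, ev_b)
```

Feature dicts of this kind could be turned into numeric vectors and passed to any of the classifier families named in the summary; which features actually separate related from unrelated pairs is an empirical question that this sketch does not address.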
ISSN: 0306-4573, 1873-5371
DOI: 10.1016/j.ipm.2020.102371