Unsupervised Episode Detection for Large-Scale News Events
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Episodic structures are inherently interpretable and adaptable to evolving
large-scale key events. However, state-of-the-art automatic event detection
methods overlook event episodes and, therefore, struggle with these crucial
characteristics. This paper introduces a novel task, episode detection, aimed
at identifying episodes from a news corpus containing key event articles. An
episode describes a cohesive cluster of core entities (e.g., "protesters",
"police") performing actions at a specific time and location. Furthermore, an
episode is a significant part of a larger group of episodes under a particular
key event. Automatically detecting episodes is challenging because, unlike key
events and atomic actions, we cannot rely on explicit mentions of times and
locations to distinguish between episodes or use semantic similarity to merge
inconsistent episode co-references. To address these challenges, we introduce
EpiMine, an unsupervised episode detection framework that (1) automatically
identifies the most salient, key-event-relevant terms and segments, (2)
determines candidate episodes in an article based on natural episodic
partitions estimated through shifts in discriminative term combinations, and
(3) refines and forms final episode clusters using large language model-based
reasoning on the candidate episodes. We construct three diverse, real-world
event datasets annotated at the episode level. EpiMine outperforms all
baselines on these datasets, with an average improvement of 59.2% across all metrics. |
---|---|
DOI: | 10.48550/arxiv.2408.04873 |
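
Step (2) of the summary, candidate episodes estimated from shifts in discriminative term combinations, can be illustrated with a toy sketch. This is not EpiMine's implementation: the function names, the Jaccard overlap measure, the `threshold` value, and the hand-picked `salient_terms` set are all assumptions made for illustration. The record gives no algorithmic detail beyond the summary above.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def partition_by_term_shift(sentences, salient_terms, threshold=0.2):
    """Group consecutive sentences into candidate episodes (hypothetical helper).

    A new candidate starts when the salient terms found in a sentence
    overlap too little with the terms collected for the current candidate.
    Sentences with no salient terms stay in the current candidate.
    """
    candidates = []
    current, current_terms = [], set()
    for sentence in sentences:
        # Crude tokenization: split on whitespace, strip punctuation, lowercase.
        tokens = {t.strip('.,;:"\'()').lower() for t in sentence.split()}
        terms = tokens & salient_terms
        if current and terms and jaccard(terms, current_terms) < threshold:
            candidates.append(current)
            current, current_terms = [], set()
        current.append(sentence)
        current_terms |= terms
    if current:
        candidates.append(current)
    return candidates


# Assumed example data, not taken from the paper's datasets.
article = [
    "Protesters gathered outside the legislature.",
    "Protesters stormed the legislature building.",
    "Police fired tear gas at the airport.",
    "Police cleared the airport and flights resumed.",
]
salient = {"protesters", "legislature", "police", "tear", "gas", "airport", "flights"}

for i, episode in enumerate(partition_by_term_shift(article, salient), start=1):
    print(i, episode)
```

With this sample input, the first two sentences share the same salient terms and form one candidate. The third sentence shares none of them, so it opens a second candidate, which the fourth sentence then joins. In the framework described by the summary, salient terms are identified automatically (step 1) and the candidates are refined with large language model-based reasoning (step 3). Neither is modeled here.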