Event Graph-Based News Clustering: The Role of Named Entity-Centered Subgraphs

In an era of exponential growth in online news sources, the need for intelligent digital solutions capable of efficiently analyzing and organizing large amounts of news content has become crucial. This paper presents a graph-based methodology designed to enhance Topic Detection and Tracking (TDT) ta...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE access 2024, Vol.12, p.105613-105632
Hauptverfasser: Komecoglu, Basak Buluz, Yilmaz, Burcu
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:In an era of exponential growth in online news sources, the need for intelligent digital solutions capable of efficiently analyzing and organizing large amounts of news content has become crucial. This paper presents a graph-based methodology designed to enhance Topic Detection and Tracking (TDT) tasks in natural language processing by efficiently clustering news events into coherent stories. The proposed approach leverages a novel event graph model that captures not only the characteristics of individual news events but also their collective narrative context. Using Named Entity Centred Frequent Subgraphs, the model excels in identifying recurring patterns of events and thus provides a framework for learning a robust, language-independent, and structured representation for structuring news stories, which represents a significant advance in the refinement of traditional clustering algorithms. Empirical experiments using a multilingual benchmark dataset, the News Clustering Dataset, highlight the superior clustering performance of our approach compared to state-of-the-art monolingual document clustering techniques, particularly in English and the competitive results in Spanish. To underline the adaptability of the methodology to low-resource languages, the Turkish 'Story-Based News Dataset' developed specifically for this study also promises to serve as an important resource for a wide range of natural language processing tasks.
ISSN:2169-3536
2169-3536
DOI:10.1109/ACCESS.2024.3435343