Cross-Language News Article Clustering


Bibliographic details
Author: Guerin, Nathan S
Format: Dissertation
Language: English
Description
Abstract: This thesis describes a method of delivering topically clustered English and Chinese news articles to monolingual readers and provides a fully implemented application. In today’s highly polarized political climate, we are inundated with a diversity of opinions in television and online news media markets. Yet on some topics, particularly those pertaining to foreign policy, a nation’s news media exhibits bias by virtue of who is reporting the news and to whom it is being reported. One way for the media’s audience to counteract this bias is to compare and contrast news articles on the same topic that were written in different languages and published in different countries. Because these articles come from different places, such comparisons can expose perspectives unique to each one. The application developed for this thesis lets users quickly identify articles about the same topic in different languages. It does this by clustering news articles by topic and presenting them in groups. For monolingual readers, the application integrates with Google Translate to provide a translated version of the source text. To provide these services, the application scrapes Chinese and English news articles from the web and extracts their relevant features. It then translates these features into a common human language, uses machine-learning techniques to reduce their dimensionality, and stores the reduced features for on-demand clustering and similar-article retrieval. This thesis and similar projects have many possible applications. They range from giving the casual bilingual reader the chance to explore news coverage from different viewpoints to helping researchers in both the US and China better understand the media and how it shapes public opinion. Both the application and its source code are available on the author’s website.
Keywords: Natural Language Processing; Cross-Language Information Retrieval; Clustering
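The abstract outlines the clustering and similar-article retrieval stages only in general terms. The following is a minimal sketch under stated assumptions, not the thesis's implementation. It assumes each Chinese article has already been machine-translated into English upstream. It uses plain TF-IDF vectors, cosine similarity, and a single-pass "leader" clustering rule as stand-ins for the unspecified feature extraction and clustering methods. Scraping, translation, and the dimensionality-reduction step are omitted. All names (`tokenize`, `tfidf_vectors`, `leader_cluster`, `most_similar`, `STOPWORDS`, the sample headlines, and the 0.3 threshold) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "and", "of", "in", "between"}


def tokenize(text):
    """Lowercase, keep alphabetic words longer than two letters, drop stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if len(w) > 2 and w not in STOPWORDS]


def tfidf_vectors(token_lists):
    """Return one sparse TF-IDF vector (a term -> weight dict) per document."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Smoothed IDF, so a term present in every document keeps a nonzero weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller dict when computing the dot product
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def leader_cluster(vectors, threshold=0.3):
    """Single pass: join the most similar cluster leader at or above threshold, else start a new cluster."""
    clusters = []  # each cluster is a list of document indices; element 0 is its leader
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, vectors[cluster[0]])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters


def most_similar(query, vectors, k=3):
    """Return up to k (index, score) pairs most similar to document `query`."""
    scores = [(j, cosine(vectors[query], v)) for j, v in enumerate(vectors) if j != query]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical sample: the "zh" headlines stand in for machine-translated Chinese articles.
articles = [
    ("en", "Trade talks between the US and China resume in Beijing"),
    ("zh", "China and US resume trade talks in Beijing"),
    ("en", "Typhoon batters southern coast, thousands evacuated"),
    ("zh", "Typhoon hits southern coast, thousands of residents evacuated"),
]

vectors = tfidf_vectors([tokenize(text) for _, text in articles])
clusters = leader_cluster(vectors)
print(clusters)  # → [[0, 1], [2, 3]]
for cluster in clusters:
    print([articles[i][0] for i in cluster])  # each cluster mixes "en" and "zh" articles
print(most_similar(0, vectors, k=1))  # the Chinese counterpart (index 1) ranks first
```

Because every article is compared in the same (English) feature space, an English article and its translated Chinese counterpart land in the same group. This matches the grouping behavior the abstract describes. A leader-based rule is used here only because it needs no preset cluster count. The thesis may well use a different algorithm, and a dimensionality-reduction step would normally sit between `tfidf_vectors` and the similarity computations.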