A Novel Clustering Methodology Based on Modularity Optimisation for Detecting Authorship Affinities in Shakespearean Era Plays

Bibliographic Details
Published in:	PloS one 2016-08, Vol.11 (8), p.e0157988-e0157988
Main authors:	Naeni, Leila M; Craig, Hugh; Berretta, Regina; Moscato, Pablo
Format:	Article
Language:	English
Subjects:	Algorithms; Artificial intelligence; Authoring; Authorship; Bioinformatics; Biology and Life Sciences; Classification; Cluster Analysis; Clustering; Clustering (Computers); Computer and Information Sciences; Computer engineering; Computer science; Data analysis; Data mining; Datasets; Earth Sciences; Electrical engineering; Engineering and Technology; Graphs; Identification methods; Information theory; Methodology; Methods; Modularity; Physical Sciences; Research and Analysis Methods; Studies
Online access:	Full text

Description
Summary:	In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into a community detection problem in graphs by using the Jensen-Shannon distance, a dissimilarity measure originating in Information Theory. Moreover, we use graph-theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains frequencies of approximately 55,114 unique words across all 168 plays written in the Shakespearean era (16th and 17th centuries), to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method for identifying high-quality clusters which reflect the commonalities in the literary style of the plays.
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0157988
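
The pipeline outlined in the summary (Jensen-Shannon distance between word-frequency profiles, a proximity graph, then a modularity score for a candidate clustering) can be illustrated with a minimal Python sketch. This is not the authors' iMA-Net algorithm: the k-nearest-neighbour graph construction, the function names (`js_distance`, `knn_graph`, `modularity`) and the four toy "texts" are assumptions made purely for illustration, and the sketch only evaluates the modularity of a given partition rather than searching for the best one.

```python
import math
from collections import Counter


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the JS divergence (base-2 logs).

    p and q are word -> count dictionaries; counts are normalised to
    probabilities before the comparison.
    """
    total_p, total_q = sum(p.values()), sum(q.values())
    divergence = 0.0
    for word in set(p) | set(q):
        a = p.get(word, 0) / total_p
        b = q.get(word, 0) / total_q
        mid = 0.5 * (a + b)
        if a > 0:
            divergence += 0.5 * a * math.log2(a / mid)
        if b > 0:
            divergence += 0.5 * b * math.log2(b / mid)
    return math.sqrt(max(divergence, 0.0))


def knn_graph(items, k):
    """Undirected proximity graph: each item is linked to its k nearest items.

    Assumption: a plain k-nearest-neighbour graph stands in for whatever
    proximity graph the paper actually builds.
    """
    names = list(items)
    edges = set()
    for u in names:
        others = [v for v in names if v != u]
        others.sort(key=lambda v: js_distance(items[u], items[v]))
        for v in others[:k]:
            edges.add(frozenset((u, v)))
    return edges


def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of an unweighted graph."""
    m = len(edges)
    internal = Counter()      # edges with both endpoints in the same community
    total_degree = Counter()  # sum of node degrees per community
    for edge in edges:
        u, v = tuple(edge)
        total_degree[community[u]] += 1
        total_degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)


# Hypothetical toy corpus: two pairs of texts with similar vocabularies.
texts = {
    "a1": {"thee": 5, "thou": 4, "king": 1},
    "a2": {"thee": 4, "thou": 5, "king": 2},
    "b1": {"ship": 5, "sea": 4, "king": 1},
    "b2": {"ship": 4, "sea": 5, "king": 2},
}
graph = knn_graph(texts, k=1)
split = {"a1": 0, "a2": 0, "b1": 1, "b2": 1}
merged = {name: 0 for name in texts}
print(modularity(graph, split), modularity(graph, merged))
```

In this toy case the partition that separates the two vocabulary groups scores a higher modularity than lumping all four texts together, which is the signal a modularity-maximising search (memetic or otherwise) would exploit.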