Unsupervised learning of an IS-A taxonomy from a limited domain-specific corpus

Bibliographic details
Published in: CW Reports 2014
Main authors: Alfarone, Daniele; Davis, Jesse
Format: Report
Language: English
Description
Summary: This report addresses the problem of learning a taxonomy from a given domain-specific text corpus. We propose a novel unsupervised algorithm for this problem. Its key contributions are a clustering-based inference approach that increases recall over surface patterns and a graph-based algorithm that detects incorrect edges to improve precision. The system induces the taxonomy solely by analyzing the provided corpus, so the learned taxonomy focuses on the concepts that are relevant to that corpus. An empirical evaluation on five corpora demonstrates the utility of the system.
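
The summary names surface patterns as the baseline whose recall the clustering step improves, but it does not say which patterns the system uses. As a rough illustration only, the sketch below assumes the classic "X such as Y, Z and W" pattern and single-word terms. The function name `extract_isa_edges` and the regular expressions are hypothetical and not taken from the report.

```python
import re

# Assumed list separators between hyponyms: ", ", ", and ", ", or ", " and ", " or ".
SEP = r"(?:, (?:and |or )?| and | or )"

# Assumed surface pattern: "<hypernym> such as <hyponym>[<sep> <hyponym>]*".
# Only single-word terms are handled; this is a simplification for illustration.
SUCH_AS = re.compile(r"(\w+) such as (\w+(?:" + SEP + r"\w+)*)")


def extract_isa_edges(text):
    """Return a set of (hyponym, hypernym) pairs found via the 'such as' pattern."""
    edges = set()
    for match in SUCH_AS.finditer(text.lower()):
        hypernym = match.group(1)
        # Split the matched enumeration into individual hyponyms.
        for hyponym in re.split(SEP, match.group(2)):
            edges.add((hyponym, hypernym))
    return edges


if __name__ == "__main__":
    sentence = "Predators such as lions, tigers and wolves hunt at night."
    for hyponym, hypernym in sorted(extract_isa_edges(sentence)):
        print(f"{hyponym} IS-A {hypernym}")
```

A pattern like this fires only when a sentence happens to match it verbatim. That limitation is the recall gap the summary says the clustering-based inference is meant to narrow.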