A Sample Extension Method Based on Wikipedia and Its Application in Text Classification

Bibliographic details
Published in: Wireless personal communications 2018-10, Vol.102 (4), p.3851-3867
Main authors: Zhu, Wenhao, Liu, Yiting, Hu, Guannan, Ni, Jianyue, Lu, Zhiguo
Format: Article
Language: English
Description
Summary: Text classification is a topic in natural language processing that is particularly useful for Internet information processing. Methods based on supervised learning require a large number of manually annotated training samples. Annotating training samples is time consuming, and performance relies heavily on the quality of those samples. This paper presents a text classification method based on sample extension. The extension is based on the correlation between the labeled sample data and the concepts in Wikipedia. Using the rich link relationships between concepts, we select appropriate articles from Wikipedia to expand the training sample set. By introducing the many semantically rich concept pages in Wikipedia, along with the links between pages, our approach improves the performance and generalization of the classifier. Experiments demonstrate that the proposed method performs better than both supervised and semi-supervised methods.
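The summary gives only an outline of the sample-extension idea, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that "correlation" can be approximated by cosine similarity over bag-of-words vectors, that links are followed one hop only, and that a linked page is accepted at half the direct threshold. The names `extend_samples`, `articles`, `links`, and `threshold`, and the small stop-word list, are hypothetical and do not come from the paper.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "is", "and", "in", "of", "on", "by", "with"}


def bag_of_words(text):
    """Term-frequency vector of lowercased words, minus stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extend_samples(labeled, articles, links, threshold=0.3):
    """Pick articles to add to the training set, as (text, label) pairs.

    labeled:  list of (text, label) pairs (the annotated samples)
    articles: dict mapping title -> text (stand-in for Wikipedia pages)
    links:    dict mapping title -> list of linked titles
    """
    # Aggregate all labeled texts of a class into one term vector.
    class_vecs = {}
    for text, label in labeled:
        class_vecs.setdefault(label, Counter()).update(bag_of_words(text))

    def best_match(title):
        """Return (score, label) of the class most similar to the article."""
        vec = bag_of_words(articles[title])
        return max((cosine(vec, cv), lab) for lab, cv in class_vecs.items())

    selected = {}
    # Step 1: keep articles that correlate strongly with some class.
    for title in articles:
        score, label = best_match(title)
        if score >= threshold:
            selected[title] = label

    # Step 2: follow links one hop; a linked page is kept under a
    # relaxed threshold if its best class agrees with the source page.
    for title, label in list(selected.items()):
        for neighbor in links.get(title, []):
            if neighbor in articles and neighbor not in selected:
                score, n_label = best_match(neighbor)
                if n_label == label and score >= threshold / 2:
                    selected[neighbor] = label

    return [(articles[title], label) for title, label in selected.items()]
```

The returned pairs would be appended to the labeled set before training any standard classifier. In a real setting the article texts and link lists would come from a Wikipedia dump or API rather than in-memory dictionaries, and the similarity measure and thresholds would need tuning.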
ISSN: 0929-6212, 1572-834X
DOI: 10.1007/s11277-018-5416-z