Using Seed Words to Learn to Categorize Chinese Text

Bibliographic Details
Main authors: Jingbo Zhu, Wenliang Chen, Tianshun Yao
Format: Conference proceeding
Language: English
Online access: Full text
Description
Summary: In this paper, we focus on a text categorization model built with unsupervised learning techniques that do not require labeled data. We propose a feature learning bootstrapping algorithm (FLB) that starts from a small number of seed words and automatically learns features for each category from a large amount of unlabeled documents. Using these learned features, we develop a new Naïve Bayes classifier named NB_FLB. Experimental results show that the NB_FLB classifier performs better than other Naïve Bayes classifiers trained by supervised learning when the number of features is small.
ISSN: 0302-9743, 1611-3349
DOI: 10.1007/978-3-540-30228-5_41
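
The summary describes two stages: growing a feature set per category from seed words over unlabeled documents, then training a Naïve Bayes classifier on those features. The record does not give the details of FLB, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes documents are already segmented into word tokens, and the scoring rule (in-category document frequency minus out-of-category document frequency), the function names, the parameters `rounds` and `per_round`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def pseudo_label(doc, features):
    """Return the category whose feature words occur most in doc, or None on no hit or a tie."""
    hits = {c: sum(1 for w in doc if w in words) for c, words in features.items()}
    ranked = sorted(hits.items(), key=lambda kv: kv[1], reverse=True)
    if ranked[0][1] == 0 or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]


def bootstrap_features(docs, seeds, rounds=3, per_round=2):
    """Grow each category's feature set from its seed words (assumed scoring rule)."""
    features = {c: set(ws) for c, ws in seeds.items()}
    for _ in range(rounds):
        # Document frequency of every word among documents pseudo-labeled with each category.
        counts = {c: Counter() for c in features}
        for doc in docs:
            c = pseudo_label(doc, features)
            if c is not None:
                counts[c].update(set(doc))
        known = set().union(*features.values())
        for c in features:
            scores = {}
            for w, n in counts[c].items():
                if w in known:
                    continue
                other = sum(counts[o][w] for o in counts if o != c)
                if n > other:  # keep only words concentrated in category c
                    scores[w] = n - other
            best = sorted(scores, key=lambda w: (-scores[w], w))[:per_round]
            features[c].update(best)
    return features


def train_nb(docs, features):
    """Multinomial Naive Bayes over the learned features, trained on pseudo-labels."""
    vocab = set().union(*features.values())
    word_counts = {c: Counter() for c in features}
    doc_counts = Counter()
    for doc in docs:
        c = pseudo_label(doc, features)
        if c is None:
            continue
        doc_counts[c] += 1
        word_counts[c].update(w for w in doc if w in vocab)
    total_docs = sum(doc_counts.values())
    model = {}
    for c in features:
        total = sum(word_counts[c].values())
        # Add-one (Laplace) smoothing for both the prior and the word likelihoods.
        prior = math.log((doc_counts[c] + 1) / (total_docs + len(features)))
        likelihood = {
            w: math.log((word_counts[c][w] + 1) / (total + len(vocab))) for w in vocab
        }
        model[c] = (prior, likelihood)
    return model


def classify(doc, model):
    """Pick the category with the highest log posterior; unknown words are ignored."""
    def score(c):
        prior, likelihood = model[c]
        return prior + sum(likelihood[w] for w in doc if w in likelihood)
    return max(model, key=score)


# Toy, pre-segmented documents (hypothetical data for illustration only).
docs = [
    ["足球", "比赛", "球员"],
    ["足球", "球员", "进球"],
    ["比赛", "进球", "球员"],
    ["股票", "市场", "银行"],
    ["股票", "银行", "利率"],
    ["市场", "利率", "银行"],
]
seeds = {"sports": ["足球"], "finance": ["股票"]}

features = bootstrap_features(docs, seeds)
model = train_nb(docs, features)
print(classify(["进球", "比赛"], model))
print(classify(["利率", "市场"], model))
```

In this sketch the third and sixth documents contain no seed word, so they only receive a pseudo-label after the first round has added new feature words. That is the bootstrapping effect the summary alludes to, shown here at toy scale.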