Using Seed Words to Learn to Categorize Chinese Text




Bibliographic details
Main authors:	Jingbo Zhu, Wenliang Chen, Tianshun Yao
Format:	Conference paper
Language:	English
Keywords:	Applied sciences; Artificial intelligence; Computer science, control theory, systems; Exact sciences and technology; Feature Case; Feature Learning; Information Gain; Speech and sound recognition and synthesis. Linguistics; Text Categorization; Unlabeled Data

Description
Abstract:	In this paper, we focus on text categorization models built with unsupervised learning techniques that do not require labeled data. We propose a feature learning bootstrapping algorithm (FLB) that starts from a small number of seed words and automatically learns features for each category from a large collection of unlabeled documents. Using these learned features, we develop a new Naïve Bayes classifier named NB_FLB. Experimental results show that, when the number of features is small, the NB_FLB classifier performs better than other Naïve Bayes classifiers trained by supervised learning.
ISSN:	0302-9743 (print); 1611-3349 (electronic)
DOI:	10.1007/978-3-540-30228-5_41
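The abstract above describes FLB and NB_FLB only at a high level, so the following is a minimal Python sketch of the general idea (seed-word bootstrapping over unlabeled documents, then a Naïve Bayes classifier over the learned features) under our own assumptions. It is not the authors' FLB or NB_FLB. Assumptions: documents are already word-segmented (Chinese text has no spaces), a document is pseudo-labeled by counting hits on each category's current feature words, and new features are ranked by a hypothetical score (in-category document frequency weighted by the share of occurrences in that category) instead of the Information Gain measure listed among the keywords. All function names, parameters (`per_iter`, `min_share`), seed words, and documents are hypothetical illustrations.

```python
import math
from collections import Counter


def label_by_features(docs, features):
    """Give each document the category whose feature words it contains most often.
    Documents with no hits, or a tie for the top score, stay unlabeled (None)."""
    labels = []
    for doc in docs:
        scores = {c: sum(w in fs for w in doc) for c, fs in features.items()}
        best = max(scores.values())
        winners = [c for c, s in scores.items() if s == best]
        labels.append(winners[0] if best > 0 and len(winners) == 1 else None)
    return labels


def bootstrap_features(docs, seeds, iterations=3, per_iter=2, min_share=0.8):
    """Hypothetical seed-word bootstrapping: pseudo-label documents with the
    current feature sets, then add each category's highest-scoring new words."""
    features = {c: set(ws) for c, ws in seeds.items()}
    for _ in range(iterations):
        labels = label_by_features(docs, features)
        counts = {c: Counter() for c in features}
        for doc, label in zip(docs, labels):
            if label is not None:
                counts[label].update(set(doc))  # document frequency per category
        total = Counter()
        for cnt in counts.values():
            total.update(cnt)
        taken = set().union(*features.values())
        for c, cnt in counts.items():
            # Keep words found mostly in category c (share >= min_share), ranked by
            # in-category document frequency times that share; ties broken alphabetically.
            candidates = [w for w in cnt if w not in taken and cnt[w] / total[w] >= min_share]
            candidates.sort(key=lambda w: (-cnt[w] * cnt[w] / total[w], w))
            new_words = candidates[:per_iter]
            features[c].update(new_words)
            taken.update(new_words)
    return features


def train_nb(docs, labels, vocab):
    """Multinomial Naive Bayes with add-one smoothing, restricted to vocab."""
    classes = sorted({l for l in labels if l is not None})
    n_labeled = sum(l is not None for l in labels)
    priors, likelihoods = {}, {}
    for c in classes:
        c_docs = [d for d, l in zip(docs, labels) if l == c]
        priors[c] = math.log(len(c_docs) / n_labeled)
        counts = Counter(w for d in c_docs for w in d if w in vocab)
        denom = sum(counts.values()) + len(vocab)
        likelihoods[c] = {w: math.log((counts[w] + 1) / denom) for w in vocab}
    return priors, likelihoods


def classify(doc, priors, likelihoods):
    """Return the class with the highest log posterior; out-of-vocabulary words are ignored."""
    scores = {c: priors[c] + sum(likelihoods[c][w] for w in doc if w in likelihoods[c])
              for c in priors}
    return max(scores, key=scores.get)


# Toy usage with hypothetical seed words and pre-segmented documents.
seeds = {"sports": {"football"}, "finance": {"stock"}}
docs = [
    ["football", "match", "goal"],
    ["match", "goal", "team"],
    ["stock", "market", "bank"],
    ["market", "bank", "price"],
    ["football", "team", "coach"],
    ["stock", "price", "investor"],
]
features = bootstrap_features(docs, seeds)
vocab = set().union(*features.values())
priors, likelihoods = train_nb(docs, label_by_features(docs, features), vocab)
print(classify(["goal", "team", "price"], priors, likelihoods))  # sports
```

In this toy run, each seed word first labels only two documents. After one round, the newly added words also label "match goal team" and "market bank price", so later rounds can pick up "match", "team", "market", and "price". The NB classifier is then trained only on the pseudo-labeled documents and the learned vocabulary. The paper's actual feature-scoring criterion and stopping rule are not given in this record.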