Semisupervised Learning for a Hybrid Generative/Discriminative Classifier based on the Maximum Entropy Principle

Bibliographic details
Published in:	IEEE Transactions on Pattern Analysis and Machine Intelligence, March 2008, Vol. 30, No. 3, pp. 424-437
Main authors:	Fujino, A., Ueda, N., Saito, K.
Format:	Article
Language:	English
Keywords:	Algorithms; Applied sciences; Artificial Intelligence; Bias; bias correction; Classification; Classifiers; Computer science, control theory, systems; Computer Simulation; Design engineering; Design methodology; Discriminant Analysis; Entropy; Exact sciences and technology; generative model; Hidden Markov models; Hybrid power systems; Information Storage and Retrieval - methods; Learning; Machine learning; Mathematical models; Maximum entropy; maximum entropy principle; Models, Statistical; Pattern recognition; Pattern Recognition, Automated - methods; Predictive models; Reproducibility of Results; Semisupervised learning; Sensitivity and Specificity; Speech and sound recognition and synthesis. Linguistics; Studies; Supervised learning; Text categorization; text classification; Texts; unlabeled samples

Description
Abstract:	This paper presents a method for designing semisupervised classifiers trained on labeled and unlabeled samples. We focus on probabilistic semisupervised classifier design for multiclass, single-labeled classification problems and propose a hybrid approach that takes advantage of both generative and discriminative approaches. In our approach, we first consider a generative model trained on labeled samples and introduce a bias correction model; the two models belong to the same model family but have different parameters. We then construct a hybrid classifier by combining these models based on the maximum entropy principle. To apply the hybrid approach to text classification problems, we employ naive Bayes models as both the generative and the bias correction models. Experimental results on four text data sets confirmed that using a large number of unlabeled samples for training substantially improved the generalization ability of the hybrid classifier when there were too few labeled samples to obtain good performance. We also confirmed that the hybrid approach significantly outperformed the generative and discriminative approaches when those two approaches performed comparably. Moreover, we examined the performance of the hybrid classifier when the labeled and unlabeled data distributions differed.
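The combination step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes multinomial naive Bayes models over bag-of-words documents and a log-linear (maximum-entropy-style) combination in which each class score is exp(mu_y) * p(x|y; theta)^lam_gen * p(x|y; psi)^lam_bias, normalized over classes. The function names `train_multinomial_nb` and `hybrid_posterior`, the weights `lam_gen`, `lam_bias`, and `mu`, and the toy data are all hypothetical. The weights are fixed by hand, and the bias correction model is a placeholder (a second naive Bayes model with different parameters); how the paper estimates the bias correction model and fits the combination weights is not shown.

```python
import math
from collections import Counter

def train_multinomial_nb(docs, labels, classes, vocab, alpha=1.0):
    """Estimate per-class word probabilities p(w|y) with additive smoothing alpha."""
    probs = {}
    for c in classes:
        counts = Counter()
        for doc, y in zip(docs, labels):
            if y == c:
                counts.update(doc)
        total = sum(counts[w] for w in vocab) + alpha * len(vocab)
        probs[c] = {w: (counts[w] + alpha) / total for w in vocab}
    return probs

def log_likelihood(doc, word_probs):
    """log p(x|y) under a multinomial model, ignoring the multinomial coefficient
    (identical for every class) and skipping out-of-vocabulary words."""
    return sum(math.log(word_probs[w]) for w in doc if w in word_probs)

def hybrid_posterior(doc, gen, bias, classes, lam_gen, lam_bias, mu):
    """Assumed log-linear combination:
    R(y|x) proportional to exp(mu[y]) * p(x|y; gen)**lam_gen * p(x|y; bias)**lam_bias."""
    scores = {
        c: mu[c]
        + lam_gen * log_likelihood(doc, gen[c])
        + lam_bias * log_likelihood(doc, bias[c])
        for c in classes
    }
    top = max(scores.values())  # subtract the max score for numerical stability
    weights = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}

# Hypothetical toy labeled data.
docs = [["goal", "match", "team"], ["team", "win", "match"],
        ["vote", "election", "party"], ["party", "policy", "vote"]]
labels = ["sports", "sports", "politics", "politics"]
classes = ["sports", "politics"]
vocab = sorted({w for d in docs for w in d})

gen = train_multinomial_nb(docs, labels, classes, vocab, alpha=1.0)
# Placeholder for the bias correction model: same family, different parameters.
bias = train_multinomial_nb(docs, labels, classes, vocab, alpha=0.5)

mu = {"sports": 0.0, "politics": 0.0}
post = hybrid_posterior(["team", "match", "goal"], gen, bias, classes,
                        lam_gen=1.0, lam_bias=0.5, mu=mu)
print(post)
```

Setting `lam_gen` to 1, `lam_bias` to 0, and all `mu` values to 0 reduces the score to the generative model's log-likelihood, i.e. a plain naive Bayes classifier with uniform class priors, which makes the contribution of the second model and the per-class terms easy to isolate.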
ISSN:	0162-8828, 1939-3539
DOI:	10.1109/TPAMI.2007.70710