Combining Subclassifiers in Text Categorization: A DST-Based Solution and a Case Study

Text categorization systems often use machine learning techniques to induce document classifiers from preclassified examples. The fact that each example document belongs to many classes often leads to very high computational costs that sometimes grow exponentially in the number of features. Seeking...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	IEEE transactions on knowledge and data engineering 2007-12, Vol.19 (12), p.1638-1651
Hauptverfasser:	Sarinnapakorn, K., Kubat, M.
Format:	Artikel
Sprache:	eng
Schlagworte:	Algorithms Applied sciences Artificial intelligence Bayesian methods Classification tree analysis Classifiers Computational efficiency Computer science control theory systems Costs data fusion Data processing. List processing. Character string processing Dempster-Shafer Theory Exact sciences and technology Information filtering Labels Machine learning Machine learning algorithms Mathematical models Memory organisation. Data processing multi-label examples Running Software Speech and sound recognition and synthesis. Linguistics Text categorization Texts Vectors Voting
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Text categorization systems often use machine learning techniques to induce document classifiers from preclassified examples. The fact that each example document belongs to many classes often leads to very high computational costs that sometimes grow exponentially in the number of features. Seeking to reduce these costs, we explored the possibility of running a "baseline induction algorithm" separately for subsets of features, obtaining a set of classifiers to be combined. For the specific case of classifiers that return not only class labels but also confidences in these labels, we investigate here a few alternative fusion techniques, including our own mechanism that was inspired by the Dempster-Shafer Theory. The paper describes the algorithm and, in our specific case study, compares its performance to that of more traditional mechanisms.
ISSN:	1041-4347 1558-2191
DOI:	10.1109/TKDE.2007.190663