INDUCTION FROM MULTI-LABEL EXAMPLES IN INFORMATION RETRIEVAL SYSTEMS: A CASE STUDY

Information retrieval systems often use machine-learning techniques to induce classifiers capable of categorizing documents. Unfortunately, the circumstance that the same document may simultaneously belong to two or more categories has so far received inadequate attention, and induction techniques c...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Applied artificial intelligence 2008-05, Vol.22 (5), p.407-432
Hauptverfasser: Sarinnapakorn, Kanoksri, Kubat, Miroslav
Format: Artikel
Sprache:eng
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Information retrieval systems often use machine-learning techniques to induce classifiers capable of categorizing documents. Unfortunately, the circumstance that the same document may simultaneously belong to two or more categories has so far received inadequate attention, and induction techniques currently in use often suffer from prohibitive computational costs. In the case study reported in this article, we managed to reduce these costs by running a "baseline induction algorithm" on the training examples described by diverse feature subsets, thus obtaining several subclassifiers. When asked about a document's classes, a "master classifier" combines the outputs of the subclassifiers. This combination can be accomplished in several different ways, but we achieved the best results with our own mechanism inspired by the Dempster-Shafer Theory (DST). We describe the technique, compare its performance (experimentally) with that of more traditional voting approaches, and show that its substantial computational savings were achieved in exchange for acceptable loss in classification performance.
ISSN:0883-9514
1087-6545
DOI:10.1080/08839510801972827