Conceptualization Effects on MEDLINE Documents Classification Using Rocchio Method

The aim of this paper is to propose a supervised text classification method for the biomedical domain using semantic resources. We choose the traditional text classification method, Rocchio, for its scalability and extendibility with semantic knowledge. This paper proposes to integrate semantic aspe...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Albitar, S., Fournier, S., Espinasse, B.
Format: Tagungsbericht
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The aim of this paper is to propose a supervised text classification method for the biomedical domain using semantic resources. We choose the traditional text classification method, Rocchio, for its scalability and extendibility with semantic knowledge. This paper proposes to integrate semantic aspects into Rocchio through a conceptualization task. This conceptualization is realized by mapping terms that are extracted from text to their corresponding concepts in the UMLS ® Metathesaurus ® in order to take meaning into consideration during text classification. The proposed classifier is tested on the Ohsumed text corpus, which is composed of abstracts of biomedical articles retrieved from the MEDLINE ® database. The effects of Conceptualization on Rocchio's performance are discussed according to different standard similarity measures and to a variety of conceptualization strategies.
DOI:10.1109/WI-IAT.2012.210