Research of PU Text Semi-supervised Classification Based on Ontology Feature Extraction

Bibliographic details
Main authors:	Na Luo, Fuyu Yuan, WanLi Zuo, Fengling He
Format:	Conference proceeding
Language:	English
Subjects:	Application software; Chemical technology; Computer science; Educational institutions; F Score; Feature extraction; Frequency; Laboratories; Machine learning; Ontologies; Ontology; Semi-supervised; Text categorization

Description
Abstract:	To address the shortcomings of traditional statistics-based feature extraction on PU (positive and unlabeled) problems, we propose ontology-based feature extraction to improve the performance of PU classification. We improve the PEBL algorithm: we obtain the document vectors of the positive set using ontology-based feature extraction, then find the strong positive features, which carry the semantics common to the positive documents and occur more frequently in the positive set. The improved algorithm scans the documents twice. First, we obtain the semantics of the documents from the ontology. Second, we filter out the terms that carry none of these semantics, which reduces the dimensionality and yields the document vectors. Experiments show that the improved PEBL classifier increases the F1 score by 0.7389%.
DOI:	10.1109/ICMLA.2008.19
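
The two-pass scan described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; the abstract does not give the details. The toy ontology, the `min_support` threshold, the function names, and the choice to compare document frequencies against an unlabeled set when selecting strong positive features are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy ontology: term -> set of concept labels (not from the paper).
TOY_ONTOLOGY = {
    "neuron": {"biology"},
    "cell": {"biology"},
    "protein": {"biology", "chemistry"},
    "goal": {"sport"},
}


def document_semantics(tokens, ontology, min_support=2):
    """Pass 1: collect the concepts backed by at least `min_support` tokens.

    The support threshold is an assumption; the abstract only says the
    semantics of a document are obtained from the ontology.
    """
    support = Counter()
    for tok in tokens:
        for concept in ontology.get(tok, ()):
            support[concept] += 1
    return {c for c, n in support.items() if n >= min_support}


def filter_terms(tokens, ontology, semantics):
    """Pass 2: drop every token that maps to none of the document's concepts."""
    return [t for t in tokens if ontology.get(t, set()) & semantics]


def document_vector(tokens, ontology, min_support=2):
    """Run both passes and return a sparse term-frequency vector."""
    semantics = document_semantics(tokens, ontology, min_support)
    return Counter(filter_terms(tokens, ontology, semantics))


def strong_positive_features(pos_vectors, unl_vectors):
    """Terms whose document frequency is higher in the positive set.

    Comparing against the unlabeled set is an assumption borrowed from the
    usual PU setting; the abstract only mentions "higher frequency in
    positive set".
    """
    def doc_freq(vectors):
        df = Counter()
        for vec in vectors:
            df.update(vec.keys())
        n = max(len(vectors), 1)
        return {t: c / n for t, c in df.items()}

    pos_df, unl_df = doc_freq(pos_vectors), doc_freq(unl_vectors)
    return {t for t, f in pos_df.items() if f > unl_df.get(t, 0.0)}
```

For example, `document_vector(["neuron", "cell", "protein", "goal", "the"], TOY_ONTOLOGY)` keeps only the three biology-related terms, because "goal" maps to a concept with too little support and "the" has no ontology entry. A real system would replace the dictionary with lookups in an actual ontology.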