Research of PU Text Semi-supervised Classification Based on Ontology Feature Extraction
Main authors: | , , , |
---|---|
Format: | Conference proceeding |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | To address the shortcomings of traditional statistics-based feature extraction on PU problems, we propose ontology-based feature extraction to improve the performance of PU classification. We improve the PEBL algorithm: we obtain the document vectors of the positive set using ontology-based feature extraction, then find the strong positive features, which contain the crossing semantics of the positive documents and occur with higher frequency in the positive set. The improved algorithm scans the documents twice. First, we obtain the semantics of the documents from the ontology. Second, we filter out the terms that carry none of these semantics, which reduces the dimension and yields the document vector. Experiments show that the improved PEBL classifier increases the F1 score by 0.7389%. |
---|---|
DOI: | 10.1109/ICMLA.2008.19 |
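
The two-scan procedure described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy `ONTOLOGY` mapping, all function names, and the rule that a strong positive feature is a term whose document frequency in the positive set exceeds its document frequency in the unlabeled set are assumptions made only for illustration; the record does not give the exact criteria.

```python
from collections import Counter

# Hypothetical toy ontology: term -> set of concept labels (assumption, not from the paper).
ONTOLOGY = {
    "car": {"vehicle"},
    "truck": {"vehicle"},
    "engine": {"vehicle", "machine"},
    "apple": {"fruit"},
}


def document_semantics(tokens, ontology):
    """Scan 1: collect the concepts that the document's terms map to."""
    concepts = set()
    for term in tokens:
        concepts |= ontology.get(term, set())
    return concepts


def document_vector(tokens, ontology, semantics):
    """Scan 2: drop terms that share no concept with `semantics`, count the rest."""
    kept = [t for t in tokens if ontology.get(t, set()) & semantics]
    return Counter(kept)


def build_vectors(docs, ontology):
    """Run both scans over every tokenized document."""
    return [
        document_vector(doc, ontology, document_semantics(doc, ontology))
        for doc in docs
    ]


def strong_positive_features(pos_vectors, unl_vectors):
    """Assumed rule: keep terms with a higher document frequency in the
    positive set than in the unlabeled set."""

    def doc_freq(vectors):
        n = max(len(vectors), 1)
        counts = Counter()
        for vec in vectors:
            counts.update(vec.keys())
        return {term: c / n for term, c in counts.items()}

    pos_df = doc_freq(pos_vectors)
    unl_df = doc_freq(unl_vectors)
    return {t for t, f in pos_df.items() if f > unl_df.get(t, 0.0)}


if __name__ == "__main__":
    positive = [["car", "engine", "the"], ["truck", "engine", "fast"]]
    unlabeled = [["apple", "the"], ["car", "apple"]]
    pos_vecs = build_vectors(positive, ONTOLOGY)
    unl_vecs = build_vectors(unlabeled, ONTOLOGY)
    print(sorted(strong_positive_features(pos_vecs, unl_vecs)))
```

In this sketch, terms absent from the ontology (such as "the" or "fast") carry no semantics and are removed in the second scan, which is how the dimension of the document vector shrinks.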