Classifying Vietnamese Disease Outbreak Reports with Important Sentences and Rich Features

Bibliographic Details
Published in: arXiv.org, 2019-11
Main authors: Doan, Son; Nguyen Thi Ngoc Vinh; Tu, Minh Phuong
Format: Article
Language: English
Description
Abstract: Text classification has been an important field of research since the mid-1990s. It has many applications, one of which is in Web-based biosurveillance systems that identify and summarize online disease outbreak reports. In this paper we focus on classifying Vietnamese disease outbreak reports. We investigate important properties of disease outbreak reports, e.g., sentences containing the names of outbreak diseases and locations. Evaluation with 10 times 10-fold cross-validation using the Support Vector Machine algorithm shows that using sentences containing outbreak disease names together with their preceding and following sentences, in combination with location features, achieves the best F-score of 86.67%, an improvement of 0.38% over using all raw text. Our results suggest that using important sentences and rich features can improve the performance of Vietnamese disease outbreak text classification.
ISSN:2331-8422
DOI:10.48550/arxiv.1911.09883
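
The sentence selection step described in the abstract (keep each sentence that names an outbreak disease, plus its preceding and following sentence) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the naive punctuation-based sentence splitter, the substring matching, and the one-entry disease lexicon are all assumptions made for illustration, and English placeholder text stands in for Vietnamese reports. The location features, the SVM classifier, and the cross-validation are not shown.

```python
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def select_important_sentences(sentences, disease_names, window=1):
    """Keep sentences that mention a disease name, plus `window` neighbours per side.

    Matching is a simple case-insensitive substring test; a real system for
    Vietnamese would likely need word segmentation first.
    """
    keep = set()
    for i, sentence in enumerate(sentences):
        lowered = sentence.lower()
        if any(name.lower() in lowered for name in disease_names):
            start = max(0, i - window)
            stop = min(len(sentences), i + window + 1)
            keep.update(range(start, stop))
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(keep)]


if __name__ == "__main__":
    # Placeholder report and hypothetical lexicon, not data from the paper.
    report = (
        "Officials met on Monday. "
        "Twenty cases of dengue were reported in Hanoi. "
        "Hospitals are on alert. "
        "The weather was sunny. "
        "Schools reopened."
    )
    for line in select_important_sentences(split_sentences(report), {"dengue"}):
        print(line)
```

The selected sentences would then replace the full raw text as input to feature extraction, which is the comparison the abstract reports.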