Active Learning Based Semi-automatic Annotation of Event Corpus

Bibliographic details
Published in: Journal of applied sciences (Asian Network for Scientific Information), 2014-01, Vol. 14 (2), p. 177-182
Main authors: Jianfeng Fu, Nianzu Liu, Shuangcheng Wang
Format: Article
Language: English
Online access: Full text
Description
Summary: In natural language processing, building a corpus by hand is a hard and time-consuming task. Active learning promises to reduce the cost of annotating a dataset because the learner is allowed to choose the data from which it learns. This study presents a semi-automatic annotation method based on active learning for labeling events in Chinese text. In particular, it focuses on uncertainty-based sampling and query-by-committee sampling to decide which instances in the unlabeled dataset are informative and should be labeled by hand. The selected informative instances are labeled manually to obtain a more effective classifier. Experimental results show that active learning both improves the accuracy of Chinese event annotation and dramatically reduces the number of labeling actions.
ISSN: 1812-5654, 1812-5662
DOI: 10.3923/jas.2014.177.182
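
Illustration: the summary names two selection strategies, uncertainty-based sampling and query-by-committee sampling, without giving their scoring functions. The following is a minimal sketch of how such strategies are commonly implemented, not the authors' method. It assumes predictive entropy as the uncertainty score and vote entropy as the committee disagreement score; the function names and the toy inputs are hypothetical.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def uncertainty_sampling(pool_probs, k):
    """Pick the k pool instances whose predicted class distribution
    has the highest entropy (assumed uncertainty score)."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:k]


def vote_entropy(votes):
    """Committee disagreement: entropy of the label vote distribution."""
    counts = Counter(votes)
    total = len(votes)
    return entropy([c / total for c in counts.values()])


def qbc_sampling(committee_votes, k):
    """Pick the k pool instances on which the committee disagrees most."""
    ranked = sorted(range(len(committee_votes)),
                    key=lambda i: vote_entropy(committee_votes[i]),
                    reverse=True)
    return ranked[:k]


# Toy pool: one classifier's class probabilities per instance.
pool_probs = [[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]]
# Toy pool: labels voted by a three-member committee per instance.
committee_votes = [
    ["event", "event", "event"],
    ["event", "none", "event"],
    ["event", "none", "other"],
]

to_label_by_uncertainty = uncertainty_sampling(pool_probs, 2)
to_label_by_committee = qbc_sampling(committee_votes, 1)
```

In an annotation loop of the kind the summary describes, the returned indices would be handed to a human annotator, the newly labeled instances added to the training set, and the classifier (or committee) retrained before the next selection round.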