Using distant supervision to augment manually annotated data for relation extraction

Bibliographic details
Published in:	PloS one 2019-07, Vol.14 (7), p.e0216913
Main authors:	Su, Peng; Li, Gang; Wu, Cathy; Vijay-Shanker, K
Format:	Article
Language:	English
Subjects:	Analysis; Annotations; Artificial intelligence; Bioinformatics; Biology and Life Sciences; Computer and Information Sciences; Data Curation; Data Mining; Datasets; Deep Learning; Engineering and Technology; Heuristic methods; Humans; Information science; Labeling; Language; Linguistics; Machine learning; Medical literature; Models, Theoretical; Natural Language Processing; Neural networks; Noise; Problem solving; Proteins; Research and Analysis Methods; Social Sciences; Transfer learning
Online access:	Full text

Description
Summary:	Significant progress has recently been made in applying deep learning to natural language processing tasks. However, deep learning models typically require large amounts of annotated training data, while only small labeled datasets are available for many natural language processing tasks in the biomedical literature. Building large datasets for deep learning is expensive because it involves considerable human effort and usually requires domain expertise in specialized fields. In this work, we consider augmenting manually annotated data with large amounts of data obtained using distant supervision. Because data obtained by distant supervision is often noisy, we first apply heuristics to remove some of the incorrect annotations. Then, using methods inspired by transfer learning, we show that the resulting models outperform models trained on the original manually annotated sets.
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0216913