High-Quality Train Data Generation for Deep Learning-Based Web Page Classification Models

The current deep learning models detecting relevant web pages show low accuracy because of the poor quality of the training data. In this paper, we propose a novel algorithm to automatically generate high-quality training data based on the frequency of the document including the entity of interest....

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE access 2021, Vol.9, p.85240-85254
Hauptverfasser: Kim, Jeong-Jae, On, Byung-Won, Lee, Ingyu
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The current deep learning models detecting relevant web pages show low accuracy because of the poor quality of the training data. In this paper, we propose a novel algorithm to automatically generate high-quality training data based on the frequency of the document including the entity of interest. Our experimental results with movies and cellphones data sets show that the average F_{1} -score of the deep learning models (FNN, CNN, Bi-LSTM, and SeqGAN) trained with our proposed algorithm shows up to 0.9992 in F_{1} -score.
ISSN:2169-3536
2169-3536
DOI:10.1109/ACCESS.2021.3086586