Study on Short Text Classification with Imperfect Labels

Short text classification techniques have been widely studied. When these techniques are applied to domain short text for production, as textual data accumulates, people often encounter problems mainly in two aspects: imperfect labels and a mistakenly labeled training dataset. First, the class label...

Bibliographic Details
Published in:	Ji suan ji ke xue 2023-01, Vol.50 (1), p.185-193
Main Authors:	Liang, Haowei; Wang, Shi; Cao, Cungen
Format:	Article
Language:	chi
Subjects:	Classification; Datasets; Domains; imperfect multi-classification label system; fine-grained short text classification; class labeling; data cleaning; Iterative methods; Labels; Text categorization
Online Access:	Full text

Description
Summary:	Short text classification techniques have been widely studied. When these techniques are applied to domain short text for production, as textual data accumulates, people often encounter problems mainly in two aspects: imperfect labels and a mistakenly labeled training dataset. First, the class label set is generally dynamic in nature. Second, when domain annotators label textual data, it is hard to distinguish some fine-grained class labels from others. To address these problems, this paper analyzes in depth the shortcomings of an actual, complex telecom domain label set with numerous classes and proposes a conceptual model for the imperfect multi-classification label system. Based on the conceptual model, to repair the conflicts and omissions in a labeled dataset, we introduce a semi-automatic method for detecting these problems iteratively with the help of a seed dataset. After repairing the conflicts and omissions caused by a dynamic label set and mistakes of annotators, after about six months of iteration,
ISSN:	1002-137X
DOI:	10.11896/jsjkx.211100278