Study on Short Text Classification with Imperfect Labels
Published in: | Ji suan ji ke xue 2023-01, Vol.50 (1), p.185-193 |
---|---|
Authors: | , , |
Format: | Article |
Language: | Chinese (chi) |
Online access: | Full text |
Abstract: | Short text classification techniques have been widely studied. When these techniques are applied to domain-specific short texts in production, two main problems emerge as textual data accumulates: imperfect labels and a mislabeled training dataset. First, the class label set is generally dynamic. Second, when domain annotators label textual data, it is hard to distinguish some fine-grained class labels from others. To address these problems, this paper analyzes in depth the shortcomings of a real, complex telecom-domain label set with numerous classes and proposes a conceptual model for an imperfect multi-class label system. Based on this conceptual model, we introduce a semi-automatic method that iteratively detects conflicts and omissions in a labeled dataset with the help of a seed dataset, so that they can be repaired. After about six months of iteratively repairing the conflicts and omissions caused by the dynamic label set and by annotator mistakes, |
---|---|
ISSN: | 1002-137X |
DOI: | 10.11896/jsjkx.211100278 |
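The abstract does not say how conflicts and omissions are actually detected. Purely as an illustration, the sketch below shows one generic detection round under stated assumptions: a "conflict" is taken to mean the same text (after a hypothetical normalization rule) carrying different labels, or a label that contradicts a trusted seed set; an "omission" is taken to mean a label used in the data but missing from the label set (plus, conversely, label-set classes with no samples). All function names and the toy data are hypothetical; this is not the authors' method. The output is meant as a list of candidates for annotators to review, which is what makes such a process semi-automatic.

```python
from collections import defaultdict


def normalize(text):
    """Hypothetical normalization rule: lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def find_label_conflicts(samples):
    """Return {normalized_text: sorted labels} for texts annotated with more than one label."""
    labels_by_text = defaultdict(set)
    for text, label in samples:
        labels_by_text[normalize(text)].add(label)
    return {t: sorted(ls) for t, ls in labels_by_text.items() if len(ls) > 1}


def find_seed_disagreements(samples, seed):
    """Return (text, given_label, seed_label) for samples whose label contradicts the seed set."""
    seed_labels = {normalize(t): label for t, label in seed}
    flagged = []
    for text, label in samples:
        expected = seed_labels.get(normalize(text))
        if expected is not None and expected != label:
            flagged.append((text, label, expected))
    return flagged


def find_label_set_gaps(samples, label_set):
    """Return (labels used in the data but missing from the label set,
    classes in the label set that have no samples)."""
    used = {label for _, label in samples}
    return sorted(used - set(label_set)), sorted(set(label_set) - used)


if __name__ == "__main__":
    # Toy telecom-style data, invented purely for illustration.
    samples = [
        ("No signal at home", "network"),
        ("no  signal at home", "billing"),
        ("Bill too high", "billing"),
        ("Cannot top up", "payment"),
    ]
    seed = [("Bill too high", "charges")]
    label_set = {"network", "billing", "roaming"}

    print(find_label_conflicts(samples))
    # {'no signal at home': ['billing', 'network']}
    print(find_seed_disagreements(samples, seed))
    # [('Bill too high', 'billing', 'charges')]
    print(find_label_set_gaps(samples, label_set))
    # (['payment'], ['roaming'])
```

In an iterative setting, one would presumably rerun such checks after each round of human corrections, adding the reviewed samples to the seed set; exact-match grouping is the simplest assumption here, and a real system would likely need fuzzier similarity to catch near-duplicate short texts.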