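„Śmieci na wejściu, śmieci na wyjściu”. Wpływ jakości koderów na działanie sieci neuronowej klasyfikującej wypowiedzi w mediach społecznościowych ["Garbage in, garbage out". The impact of coder quality on the performance of a neural network classifying statements in social media]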


Bibliographic details
Published in: Studia Socjologiczne 2022, Vol. 245 (2), p. 137-164
Main author: Matuszewski, Paweł
Format: Article
Language: Polish
Description
Summary: One of the critical decisions when manually coding text data is whether to verify the coders' work. In the case of supervised models, this leads to a significant dilemma: is it better to provide the model with a large number of cases to learn from, at the expense of verifying the correctness of the data, or to code each case n times, which makes it possible to compare the codes and check their correctness but at the same time reduces the training dataset n-fold? Such a decision not only affects the final results of the classifier. From the researchers' point of view, it is also crucial because, realistically assuming that research has limited funding, it cannot be undone. The study uses a simulation approach and provides conclusions and recommendations based on 100,000 unique, hand-coded tweets.
ISSN: 0039-3371, 2545-2770
DOI: 10.24425/sts.2022.141426
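
The trade-off described in the summary can be illustrated with a little arithmetic. The sketch below is not the article's simulation. It assumes binary labels, coders who err independently with the same accuracy `p`, majority voting over an odd number `n` of codings per case, and a fixed budget of coding decisions. The function names, the budget of 100,000 codings, and `p = 0.8` are hypothetical values chosen only for illustration.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that the majority of n independent coders is correct.

    Assumes binary labels and that each coder is correct with probability p.
    n must be odd so that a tie cannot occur.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    # Sum the binomial probabilities of more than half the coders being right.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


def tradeoff(budget: int, p: float, n: int) -> tuple[int, float]:
    """Return (distinct training cases, expected label accuracy).

    budget is the total number of coding decisions that can be paid for,
    so coding every case n times leaves budget // n distinct cases.
    """
    return budget // n, majority_vote_accuracy(p, n)


if __name__ == "__main__":
    # Hypothetical budget and coder accuracy, for illustration only.
    for n in (1, 3, 5):
        cases, accuracy = tradeoff(budget=100_000, p=0.8, n=n)
        print(f"n={n}: {cases} cases, expected label accuracy {accuracy:.3f}")
```

Under these assumptions, each additional round of coding raises the expected label accuracy but divides the number of distinct training cases by `n`. Whether the cleaner labels outweigh the smaller dataset for a given classifier is the empirical question the article addresses.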