Impact of data quality for automatic issue classification using pre-trained language models

Bibliographic details
Published in: The Journal of systems and software 2024-04, Vol.210, p.111838, Article 111838
Main authors: Colavito, Giuseppe; Lanubile, Filippo; Novielli, Nicole; Quaranta, Luigi
Format: Article
Language: English
Description
Abstract: Issue classification aims to recognize whether an issue reports a bug, requests an enhancement, or asks for support. In this paper, we use pre-trained models for the automatic classification of issues and investigate how the quality of data affects the performance of classifiers. Despite the application of data quality filters, none of our attempts had a significant effect on model quality. As the root cause, we identify a threat to construct validity underlying the issue labeling. Editor’s note: Open Science material was validated by the Journal of Systems and Software Open Science Board.
ISSN: 0164-1212, 1873-1228
DOI: 10.1016/j.jss.2023.111838