Towards a systematic approach to manual annotation of code smells - C# Dataset of Long Method and Large Class code smells

This dataset includes open-source projects written in the C# programming language, annotated for the presence of the Long Method and Large Class (God Class) code smells. Each instance was manually annotated by at least two annotators. We explain our motivation and methodology for creating this dataset in our preprint: Lub...

Bibliographic Details
Main Authors:	Luburić, Nikola; Prokić, Simona; Grujić, Katarina-Glorija; Slivka, Jelena; Kovačević, Aleksandar; Sladić, Goran; Vidaković, Dragan
Format:	Dataset
Language:	English
Subjects:	clean code; code smell; dataset; manual annotation

Description
Summary:	This dataset includes open-source projects written in the C# programming language, annotated for the presence of the Long Method and Large Class (God Class) code smells. Each instance was manually annotated by at least two annotators. We explain our motivation and methodology for creating this dataset in our preprint: Luburić, N., Prokić, S., Grujić, K.G., Slivka, J., Kovačević, A., Sladić, G. and Vidaković, D., 2021. Towards a systematic approach to manual annotation of code smells.
The dataset contains two Excel datasheets:
  DataSet_Large Class.xlsx – C# classes annotated for the severity of the Large Class code smell.
  DataSet_Long Method.xlsx – C# methods annotated for the severity of the Long Method code smell.
The columns in the datasheets represent:
  Code Snippet ID – the full name of the code snippet. For classes, this is the package/namespace name followed by the class name. The full name of an inner class also contains the names of any outer classes (e.g., namespace.subnamespace.outerclass.innerclass). For methods, this is the full name of the class followed by the method's signature (e.g., namespace.class.method(param1Type, param2Type)).
  Link – the GitHub link to the code snippet, including the commit and the start and end lines of code (LOC).
  Code Smell – the code smell for which the code snippet is examined (Large Class or Long Method).
  Project Link – the link to the version of the code repository that was annotated.
  Metrics – a list of metrics for the code snippet, calculated by our platform. Our dataset provides 25 class-level metrics for Large Class detection and 18 method-level metrics for Long Method detection. The list of metrics and their definitions is available here.
  Final annotation – a single severity score calculated by a majority vote.
  Annotators – the severity score assigned by each annotator (annotator 1, 2, or 3).
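The "Final annotation" column is described as a majority vote over the annotators' severity scores. A minimal Python sketch of such a vote follows. The function name, the example scores, and the handling of ties (returning None when no score has a strict majority) are assumptions made for illustration; the record does not state how the authors resolved disagreements.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(scores: Sequence[int]) -> Optional[int]:
    """Return the severity score given by more than half of the annotators.

    Returns None when no score has a strict majority (an assumption of this
    sketch, not a documented rule of the dataset).
    """
    if not scores:
        return None
    # most_common(1) yields the single most frequent score and its count.
    score, count = Counter(scores).most_common(1)[0]
    return score if count > len(scores) / 2 else None


# Hypothetical severity scores from annotators 1, 2, and 3 for one snippet.
print(majority_vote([1, 1, 2]))  # prints 1
print(majority_vote([0, 1, 2]))  # prints None (no strict majority)
```

The same function also covers instances with only two annotators, where a final score exists only if both agree.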
To help guide their reasoning when evaluating the presence and severity of a code smell, three annotators independently annotated whether the considered heuristics apply to each evaluated code snippet. We provide these results in two separate Excel datasheets:
  LargeClass_Heuristics.xlsx – C# classes annotated for the presence of heuristics relevant for the Large Class code smell.
  LongMethod_Heuristics.xlsx – C# methods annotated for the presence of heuristics relevant for the Long Method code smell.
The columns of these two datasheets are:
  Code Snippet ID – the full name of the code snippet (matching the IDs from DataSet_Large Class.xlsx and DataSet_Long Method.xlsx).
  Annotators – heuristics labelled
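Because the Code Snippet ID values in the heuristics datasheets match those in the severity datasheets, rows from the two files can be combined on that ID. The Python sketch below shows one way to do this with plain dictionaries. The column names other than "Code Snippet ID" and "Final annotation", the sample rows, and the function name are hypothetical; reading the .xlsx files themselves is left out.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def join_on_snippet_id(severity_rows: List[Row],
                       heuristic_rows: List[Row],
                       key: str = "Code Snippet ID") -> List[Row]:
    """Merge each severity row with the heuristics row that shares its ID.

    Severity rows without a matching heuristics row are skipped.
    """
    heuristics_by_id = {row[key]: row for row in heuristic_rows}
    joined = []
    for row in severity_rows:
        match = heuristics_by_id.get(row[key])
        if match is None:
            continue
        merged = dict(row)  # copy so the input rows stay unchanged
        merged.update({k: v for k, v in match.items() if k != key})
        joined.append(merged)
    return joined


# Hypothetical rows standing in for the contents of the two datasheets.
severity = [
    {"Code Snippet ID": "Shop.Orders.OrderService.Process(int, string)",
     "Final annotation": 2},
    {"Code Snippet ID": "Shop.Orders.OrderService.Cancel(int)",
     "Final annotation": 0},
]
heuristics = [
    {"Code Snippet ID": "Shop.Orders.OrderService.Process(int, string)",
     "Annotator 1 heuristics": "method is too long"},
]

for row in join_on_snippet_id(severity, heuristics):
    print(row["Code Snippet ID"], row["Final annotation"],
          row["Annotator 1 heuristics"])
```

Only the first sample row appears in the result, since the second has no heuristics entry in this made-up data.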
DOI:	10.5281/zenodo.6520055