Datasets & utils for paper USING PRE-TRAINED MODELS TO PARTIALLY AUTOMATE CODE REVIEW ACTIVITIES

Bibliographic details
First author: Masiero, Simone
Format: Dataset
Language: English
Description
Summary: Raw and processed datasets and configuration files for pre-training and fine-tuning T5 models.
Pre-training dataset: obtained by mining Stack Overflow and CodeSearchNet data.
Fine-tuning datasets: we fine-tune our T5 small model on different datasets obtained by mining code review data from Gerrit and GitHub repositories.
Fine-tuning dataset v1 (small): the same dataset used by Tufano et al., with abstracted code and raw comments.
Fine-tuning dataset v2 (small): the same dataset used by Tufano et al., with non-abstracted code and cleaned comments.
Fine-tuning dataset (large): our new large dataset.
DOI:10.5281/zenodo.4812784