Deduplication Index for Big Code Datasets

Code duplicates in large code corpora have adverse effects on the evaluation and use of machine learning models that rely on them. Most existing corpora suffer from this problem to some extent. This dataset contains a "duplication" index for some of the existing corpora in Big Code researc...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
1. Verfasser: Allamanis, Miltiadis
Format: Dataset
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!