JavaVFC: Java Vulnerability Fixing Commits from Open-source Software
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | We present a comprehensive dataset of Java vulnerability-fixing commits
(VFCs) to advance research in Java vulnerability analysis. Our dataset, derived
from thousands of open-source Java projects on GitHub, comprises two variants:
JavaVFC and JavaVFC-extended. The dataset was constructed through a rigorous
process involving heuristic rules and multiple rounds of manual labeling. We
initially used keywords to filter candidate VFCs based on commit messages, then
refined this keyword set through iterative manual labeling. The final labeling
round achieved a precision score of 0.7 among three annotators. We applied the
refined keyword set to 34,321 open-source Java repositories with over 50 GitHub
stars, resulting in JavaVFC with 784 manually verified VFCs and
JavaVFC-extended with 16,837 automatically identified VFCs. Both variants are
presented in a standardized JSONL format for easy access and analysis. This
dataset supports various research endeavors, including VFC identification,
fine-grained vulnerability detection, and automated vulnerability repair.
JavaVFC and JavaVFC-extended are publicly available at
https://zenodo.org/records/13731781. |
---|---|
DOI: | 10.48550/arxiv.2409.05576 |
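
The summary describes filtering candidate VFCs by matching keywords against commit messages and distributing the result as JSONL. The following is a minimal sketch of that idea, not the authors' implementation. The keyword list, the `message` and `sha` field names, and the sample records are all hypothetical, since this record lists neither the refined keyword set nor the dataset schema.

```python
import json
import re

# Hypothetical keyword set for illustration only; the refined keyword
# set used to build JavaVFC is not given in this record.
KEYWORDS = ["vulnerab", "cve-", "xss", "sql injection", "security fix"]
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def is_candidate_vfc(message: str) -> bool:
    """Return True if the commit message contains any keyword (case-insensitive)."""
    return PATTERN.search(message) is not None


def filter_candidates(jsonl_lines, message_field="message"):
    """Parse JSONL lines and keep records whose commit message matches a keyword.

    `message_field` is an assumed field name, not the dataset's documented schema.
    """
    candidates = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if is_candidate_vfc(record.get(message_field, "")):
            candidates.append(record)
    return candidates


# Made-up sample records, one JSON object per line as in JSONL.
sample = [
    json.dumps({"sha": "a1", "message": "Fix XSS in login form"}),
    json.dumps({"sha": "b2", "message": "Update README"}),
    json.dumps({"sha": "c3", "message": "Patch CVE-2020-0001 in parser"}),
]
print([r["sha"] for r in filter_candidates(sample)])
```

A keyword match only marks a commit as a candidate. As the summary notes, the 784 commits in JavaVFC were additionally verified by hand, so a filter of this kind corresponds more closely to the automatically identified JavaVFC-extended variant.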