SWE-bench-java: A GitHub Issue Resolving Benchmark for Java
GitHub issue resolving is a critical task in software engineering, recently gaining significant attention in both industry and academia. Within this task, SWE-bench has been released to evaluate issue resolving capabilities of large language models (LLMs), but has so far only focused on Python versi...
Gespeichert in:
Hauptverfasser: | , , , , , , , , , , , , , , , , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | GitHub issue resolving is a critical task in software engineering, recently
gaining significant attention in both industry and academia. Within this task,
SWE-bench has been released to evaluate issue resolving capabilities of large
language models (LLMs), but has so far only focused on Python version. However,
supporting more programming languages is also important, as there is a strong
demand in industry. As a first step toward multilingual support, we have
developed a Java version of SWE-bench, called SWE-bench-java. We have publicly
released the dataset, along with the corresponding Docker-based evaluation
environment and leaderboard, which will be continuously maintained and updated
in the coming months. To verify the reliability of SWE-bench-java, we implement
a classic method SWE-agent and test several powerful LLMs on it. As is well
known, developing a high-quality multi-lingual benchmark is time-consuming and
labor-intensive, so we welcome contributions through pull requests or
collaboration to accelerate its iteration and refinement, paving the way for
fully automated programming. |
---|---|
DOI: | 10.48550/arxiv.2408.14354 |