Enriching Query Semantics for Code Search with Reinforcement Learning
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Abstract: | Code search is a common practice for developers during software
implementation. The main challenge of accurate code search lies in the
knowledge gap between source code and natural language (i.e., queries). Because
code-query pairs are scarce while code-description pairs are plentiful, prior
studies based on deep learning focus on learning the semantic matching relation
between source code and the corresponding description texts, and hypothesize
that the semantic gap between descriptions and user queries is marginal. In
this work, we find that code search models trained on code-description pairs
may not perform well on user queries, which indicates a semantic distance
between queries and code descriptions. To mitigate this semantic distance for
more effective code search, we propose QueCos, a Query-enriched Code search
model. QueCos uses reinforcement learning (RL) to generate semantically
enriched queries that capture the key semantics of the given queries. With RL,
code search performance serves as the reward for producing accurate
semantically enriched queries. The enriched queries are then used for code
search. Experiments on benchmark datasets show that QueCos can significantly
outperform state-of-the-art code search models. |
---|---|
DOI: | 10.48550/arxiv.2105.09630 |
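
The abstract states that code search performance serves as the RL reward for an enriched query. A minimal sketch of that idea follows, under loud assumptions: the reward is taken to be the reciprocal rank of the target snippet, and a toy bag-of-words cosine scorer stands in for a learned code search model. The names `cosine`, `reciprocal_rank_reward`, and the toy corpus are hypothetical and are not taken from the QueCos paper.

```python
from __future__ import annotations

from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def reciprocal_rank_reward(query: str, corpus: list[str], target_index: int) -> float:
    """Assumed reward: 1 / rank of the target snippet when searching with `query`.

    Ties are ranked pessimistically: every snippet scoring at least as high
    as the target (including the target itself) counts toward its rank.
    """
    q = Counter(query.lower().split())
    scores = [cosine(q, Counter(doc.lower().split())) for doc in corpus]
    rank = sum(s >= scores[target_index] for s in scores)
    return 1.0 / rank


# Toy corpus of tokenized code snippets (hypothetical data).
corpus = [
    "def read file path open path read",
    "def sort list items sorted items",
    "def http get url requests get url",
]

raw_reward = reciprocal_rank_reward("load text", corpus, target_index=0)
enriched_reward = reciprocal_rank_reward("read file path", corpus, target_index=0)
print(raw_reward, enriched_reward)
```

In this toy setting the enriched query shares tokens with the target snippet and so earns a higher reward than the raw query. In an RL setup, a policy that generates enriched queries would be updated to favor such higher-reward outputs.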