A mutual embedded self-attention network model for code search


Bibliographic Details
Published in:	The Journal of Systems and Software, 2023-04, Vol. 198, p. 111591, Article 111591
Main Authors:	Hu, Haize; Liu, Jianxun; Zhang, Xiangping; Cao, Ben; Cheng, Siqiang; Long, Teng
Format:	Article
Language:	English
Subjects:	Code search; Code segments; Machine learning; MESN-CS; Self-attention
Online Access:	Full text

Description
Summary:	To improve the efficiency of program implementation, developers can selectively reuse previously written code by searching open-source codebases. Many code search methods have been proposed to push the limit of code search accuracy, and those built on the Self-Attention mechanism are particularly promising. However, while existing methods capture textual semantics more efficiently by attending to significant words within a code component unit, they typically fail to capture the structural dependencies between code components, which may lead to suboptimal search accuracy. This paper proposes a novel Self-Attention model, termed MESN-CS, which considers both word-level attention and code unit-level attention for code search. MESN-CS calculates not only the attention weight of each word within a code component unit but also the weight of the embedding between code combination units. To verify the effectiveness of the proposed model, it was compared with three benchmark models on a large-scale code dataset and CodeSearchNet. The experimental results show that MESN-CS achieves better Recall@k, NDCG and MRR than the baseline methods. The experiments also show that MESN-CS can effectively characterize the semantic and syntactic information between sequences.
• The defects and shortcomings of existing code search models are analyzed.
• The MESN-CS model is studied.
• An experimental analysis was conducted to verify the effectiveness of MESN-CS, DeepCS, CARLCS-CNN and SAN-CS in code search.
ISSN:	0164-1212, 1873-1228
DOI:	10.1016/j.jss.2022.111591
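
Note on the evaluation metrics: the summary reports Recall@k, NDCG and MRR without defining them. The following Python sketch shows how these metrics are commonly computed for ranked code search results. It is an illustration, not the authors' evaluation code: the function names are hypothetical, Recall@k and MRR assume exactly one correct snippet per query, and NDCG uses the standard log2 discount. The paper's exact definitions may differ.

```python
import math
from typing import Sequence


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose correct snippet appears in the top k results.

    `ranks` holds the 1-based rank of the correct snippet for each query
    (assumption: exactly one relevant snippet per query).
    """
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mrr(ranks: Sequence[int]) -> float:
    """Mean reciprocal rank of the correct snippet over all queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """NDCG@k for one ranked result list.

    `relevances` lists the relevance score of each returned snippet,
    in the order the search model ranked them.
    """
    def dcg(scores: Sequence[float]) -> float:
        # Position i (0-based) is discounted by log2(i + 2).
        return sum(s / math.log2(i + 2) for i, s in enumerate(scores[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    # A list with no relevant results scores 0 by convention.
    return dcg(relevances) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical ranks of the correct snippet for four queries.
    ranks = [1, 3, 2, 10]
    print("Recall@5:", recall_at_k(ranks, 5))
    print("MRR:", mrr(ranks))
    print("NDCG@3:", ndcg_at_k([0, 1, 0], 3))
```

Higher values are better for all three metrics. Recall@k ignores where in the top k the correct snippet lands, whereas MRR and NDCG reward placing it closer to the top.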