Code clone detection method and system based on abstract syntax tree optimization and multi-representation

Bibliographic details
Main authors: ZHONG XINLEI, GUO ZHENJUN, LIN LIANNAN, HE HONGKUI, YU TIANCHEN, JIANG CHE
Format: Patent
Language: chi; eng
Description
Summary: The invention discloses a code clone detection method and system based on abstract syntax tree optimization and multi-representation. The method comprises the following steps: compiling a code text to obtain the corresponding abstract syntax tree; optimizing the abstract syntax tree, which includes removing compiler-generated nodes and the error-recovery nodes produced by compilation errors, removing declaration nodes and constant nodes, refining expression nodes, and converting selection structures and loop structures into their respective unified sub-tree structures; traversing the optimized abstract syntax tree to obtain a pre-order sequence and a post-order sequence; inputting the two sequences into a Transformer network, which outputs a feature fingerprint for the code text; obtaining a feature fingerprint for each of a plurality of code texts; and, if the cosine similarity of any two feature fingerprints is greater than a first set threshold, determining that the two code texts co
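
The pipeline in the summary (prune the tree, traverse it in two orders, compare fingerprints by cosine similarity) can be illustrated with a minimal Python sketch. It is not the patented implementation and rests on several assumptions: Python's standard `ast` module stands in for the compiler front end, only constant nodes and parser context markers are pruned, `For` and `While` are mapped to a single `Loop` label as a loose analogue of the unified loop sub-tree, and a bigram count vector replaces the Transformer encoder. The names `fingerprint`, `is_clone`, and the threshold value 0.9 are hypothetical.

```python
import ast
import math
from collections import Counter

# Simplified pruning (assumption): drop constant nodes and the parser's
# Load/Store/Del context markers. The abstract lists more pruning rules.
DROPPED = (ast.Constant, ast.expr_context)


def children(node):
    """Return the child nodes that survive the simplified pruning step."""
    return [c for c in ast.iter_child_nodes(node) if not isinstance(c, DROPPED)]


def label(node):
    """Node label; both loop kinds share one label (assumed unification)."""
    if isinstance(node, (ast.For, ast.While)):
        return "Loop"
    return type(node).__name__


def pre_order(node):
    """Visit the node first, then its children from left to right."""
    seq = [label(node)]
    for child in children(node):
        seq.extend(pre_order(child))
    return seq


def post_order(node):
    """Visit the children from left to right, then the node itself."""
    seq = []
    for child in children(node):
        seq.extend(post_order(child))
    seq.append(label(node))
    return seq


def fingerprint(source):
    """Hypothetical stand-in for the Transformer encoder: bigram counts
    taken over the pre-order and post-order label sequences."""
    tree = ast.parse(source)
    pre, post = pre_order(tree), post_order(tree)
    grams = Counter(("pre",) + g for g in zip(pre, pre[1:]))
    grams.update(("post",) + g for g in zip(post, post[1:]))
    return grams


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_clone(src_a, src_b, threshold=0.9):
    """Flag a pair as clones when similarity exceeds the (assumed) threshold."""
    return cosine(fingerprint(src_a), fingerprint(src_b)) > threshold
```

Because identifiers and constants do not appear in the node labels, two functions that differ only in variable names and literal values produce identical sequences in this sketch, so `is_clone` reports them as a clone pair. A learned encoder such as the Transformer named in the summary would be expected to capture similarity beyond such exact structural matches.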