Generalized production rules-N-gram feature extraction from abstract syntax trees (AST) for code vectorization
Herein is resource-constrained feature enrichment for analysis of parse trees such as suspicious database queries. In an embodiment, a computer receives a parse tree that contains many tree nodes. Each tree node is associated with a respective production rule that was used to generate the tree node....
Gespeichert in:
Hauptverfasser: | , , , |
---|---|
Format: | Patent |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Herein is resource-constrained feature enrichment for analysis of parse trees such as suspicious database queries. In an embodiment, a computer receives a parse tree that contains many tree nodes. Each tree node is associated with a respective production rule that was used to generate the tree node. Extracted from the parse tree are many sequences of production rules having respective sequence lengths that satisfy a length constraint that accepts at least one fixed length that is greater than two. Each extracted sequence of production rules consists of respective production rules of a sequence of tree nodes in a respective directed tree path of the parse tree having a path length that satisfies that same length constraint. Based on the extracted sequences of production rules, a machine learning model generates an inference. In a bag of rules data structure, the extracted sequences of production rules are aggregated by distinct sequence and duplicates are counted. |
---|