Precise and Scalable Querying of Syntactical Source Code Patterns Using Sample Code Snippets and a Database

While analyzing a log file of a text-based source code search engine we discovered that developers search for fine-grained syntactical patterns in 36% of queries. Currently, to cope with queries of this kind developers need to use regular expressions, to add redundant terms to the query or to combin...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Panchenko, O., Karstens, J., Plattner, H., Zeier, A.
Format:	Tagungsbericht
Sprache:	eng
Schlagworte:	abstract syntax trees Data models Database languages query-by-example Search engines source code query language source code search Syntactics XML XPath
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	While analyzing a log file of a text-based source code search engine we discovered that developers search for fine-grained syntactical patterns in 36% of queries. Currently, to cope with queries of this kind developers need to use regular expressions, to add redundant terms to the query or to combine searching with other tools provided by the development environment. To improve the expressiveness of the queries, these can be formulated as tree patterns of abstract syntax trees. These search patterns can be expressed by using query languages, such as XPath. However, developers usually do not work with either XPath or with AST. To shield developers from the complexity of query formulation we propose using sample code snippets as queries. The novelty of our approach is the combination of a query language that is very close to the surface programming language and a special database technology to store a large amount of abstract syntax trees. The advantage of this approach over existing source code query languages and search engines is the performance of both query formulation and query execution. This paper describes the technical details of the method and illustrates the value of this approach with performance measures and an industrial controlled experiment. All developers were able to complete the tasks of the experiment faster and more accurately by using our tool (ACS) than by using a text-based search engine. The number of false positives in the result lists was significantly decreased.
ISSN:	1092-8138
DOI:	10.1109/ICPC.2011.31