Mining the Web for C\C++ code with perl
Main author: | |
---|---|
Format: | Conference proceeding |
Language: | English |
Subjects: | |
Online access: | Order full text |
Summary: | The Web represents one of the largest repositories of information ever compiled by mankind, and as such, search techniques are essential to navigating its depths and returning pertinent information. Typically, the search techniques employed in search engines such as Google entail the use of keywords: Web pages containing the specified keywords are sought out and then ranked using an algorithm such as PageRank. While keywords are suitable for many search tasks, certain types of data cannot be readily searched using keywords alone. Regular-expression-based pattern matching enhances search capability by allowing a textual pattern to be specified and text to be matched against it. Regular expressions have been developed that identify common C\C++ code structures such as loops, conditionals, and functions. These regular expressions are then integrated into a Perl program that performs a keyword-based search using the Yahoo search engine and extracts any code elements that match those patterns. Thus an algorithm or programming technique can be specified with keywords, the Yahoo search used to identify Web pages pertinent to those keywords, and the regular expressions used to identify and extract any C\C++ code found in the resultant Web pages. Application of this technique would likely be of great benefit in creating specialized search capabilities for software developers. |
---|---|
DOI: | 10.1109/LISAT.2010.5478283 |
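
The extraction step described in the summary can be illustrated with a minimal sketch. It is written in Python rather than the Perl used in the paper, and it covers only the pattern-matching stage: the page text is assumed to have been fetched already, so no search engine is queried. The patterns, the `PATTERNS` table, and the `extract_code_elements` function are hypothetical illustrations, since the paper's actual regular expressions are not reproduced in this record.

```python
import html
import re

# Hypothetical patterns for illustration only; the regular expressions
# actually used in the paper are not given in this record.
PATTERNS = {
    # for (...) or while (...), up to the first closing parenthesis
    "loop": re.compile(r"\b(?:for|while)\s*\([^)]*\)"),
    # if (...), up to the first closing parenthesis
    "conditional": re.compile(r"\bif\s*\([^)]*\)"),
    # a function definition with a basic return type, e.g. "int main(void) {"
    "function": re.compile(
        r"\b(?:int|void|char|float|double|long)\s+\**\w+\s*\([^;{)]*\)\s*\{"
    ),
}

# Crude tag stripper; a real crawler would likely use an HTML parser.
TAG = re.compile(r"<[^>]+>")


def extract_code_elements(page):
    """Return the C/C++-like fragments found in an HTML page, grouped by kind."""
    # Remove markup first, then decode entities such as &lt; and &gt;
    # so that comparisons inside the code become plain text again.
    text = html.unescape(TAG.sub(" ", page))
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


if __name__ == "__main__":
    sample = (
        "<html><body><p>Bubble sort</p><pre>"
        "void sort(int *a, int n) {\n"
        "  for (int i = 0; i &lt; n; i++) {\n"
        "    if (a[i] &gt; a[i+1]) swap(a, i);\n"
        "  }\n"
        "}</pre></body></html>"
    )
    for kind, fragments in extract_code_elements(sample).items():
        print(kind, fragments)
```

Because each pattern stops at the first closing parenthesis, conditions containing nested calls such as `while (f(x))` are captured only partially. Regular expressions cannot match arbitrarily nested brackets, so handling such cases in general would require a tokenizer or parser.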