Graph-Based AJAX Crawl: Mining Data from Rich Internet Applications

Bibliographic details
Main authors: Zhaomeng Peng, Nengqiang He, Chunxiao Jiang, Zhihua Li, Lei Xu, Yipeng Li, Yong Ren
Format: Conference proceeding
Language: English
Description
Summary: AJAX (Asynchronous JavaScript and XML) has become increasingly popular with the rise of Web 2.0. However, traditional crawlers fail to retrieve information from AJAX applications because of their complex JavaScript operations. Moreover, a single AJAX application with one URL may have several page states, which violates the assumption that one URL corresponds to one unique page. An AJAX application can be modeled as a state transition graph, so crawling it amounts to traversing the graph without prior knowledge of its structure. In this paper, we distinguish different kinds of AJAX events, which were not well defined in previous work, and propose a Graph-based AJAX State Traversal (GAST) algorithm to crawl AJAX applications with minimal edge visits. If the topology of the graph is given, this optimization problem becomes a Directed Rural Postman Problem (DRPP), and the optimal lower bound can be obtained. Experimental results show that the proposed algorithm approaches the optimum and outperforms existing work.
DOI: 10.1109/ICCSEE.2012.38
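
The summary treats crawling as traversing a state transition graph whose structure is unknown in advance, with the number of edge visits as the cost. The sketch below illustrates that setting only; it is not the GAST algorithm, whose details this record does not give. It assumes a toy model in which the application is a dictionary `state -> {event: next_state}`, the crawler can see which events a state offers but not where they lead, and a page reload returns it to the start state. The names `crawl`, `hidden`, and `app` are hypothetical, and the strategy (fire the nearest untried event, reload when stuck) is a plain greedy heuristic chosen for brevity.

```python
from collections import deque


def crawl(hidden, start):
    """Explore `hidden` (state -> {event: next_state}) without reading it ahead.

    Returns (discovered graph, number of event firings, number of reloads).
    """
    known = {}    # state -> {event: next_state} for events already fired
    pending = {}  # state -> events seen in that state but not yet fired
    fired = resets = 0

    def observe(state):
        # Arriving at a state reveals its events, not where they lead.
        if state not in known:
            known[state] = {}
            pending[state] = set(hidden.get(state, {}))

    def path_to_pending(source):
        # BFS over discovered edges to the nearest state with untried events.
        queue, seen = deque([(source, [])]), {source}
        while queue:
            state, path = queue.popleft()
            if pending[state]:
                return path
            for event, nxt in known[state].items():
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [event]))
        return None

    current = start
    observe(current)
    while True:
        path = path_to_pending(current)
        if path is None and current != start:
            # Stuck: reload the page (assumed to restore the start state).
            path = path_to_pending(start)
            if path is not None:
                resets += 1
                current = start
        if path is None:
            break  # every discovered event has been fired
        for event in path:  # re-walk known edges; each one costs a visit
            current = known[current][event]
            fired += 1
        event = min(pending[current])  # deterministic choice of untried event
        pending[current].remove(event)
        nxt = hidden[current][event]  # firing the event reveals its target
        known[current][event] = nxt
        fired += 1
        current = nxt
        observe(current)
    return known, fired, resets


# Toy application: one URL, three page states, four events.
app = {
    "home": {"open_menu": "menu", "next_page": "page2"},
    "menu": {"close_menu": "home"},
    "page2": {"prev_page": "home"},
}
graph, fired, resets = crawl(app, "home")
print(fired, resets)
```

On this toy graph every event leads back toward an untried one, so the crawler never re-walks an edge. On graphs with dead ends or long detours, the greedy choice pays extra firings or reloads, which is the overhead an edge-minimizing traversal aims to reduce.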