Structured AJAX Data Extraction Based on Agricultural Ontology

Detailed description

Bibliographic details
Published in:	Journal of Integrative Agriculture 2012-05, Vol.11 (5), p.784-791
Main authors:	LI, Chuan-xi, SU, Ya-ru, WANG, Ru-jing, WEI, Yuan-yuan, HUANG, He
Format:	Article
Language:	English
Subjects:	agricultural ontology; AJAX; HTML tags; information extraction; structured data; semantic annotation; JavaScript; agricultural domain; data extraction; ontology
Online access:	Full text

Description
Abstract:	A growing number of web pages use AJAX (Asynchronous JavaScript and XML) because of its rich interactivity and incremental communication. Observation shows that AJAX content, which traditional crawlers cannot see, is generally well structured and belongs to a single specific domain. Extracting structured data from AJAX content and annotating its semantics is therefore significant for further applications. This paper proposes a structured AJAX data extraction method for the agricultural domain, based on an agricultural ontology. First, Crawljax, an open-source AJAX crawling tool, was overridden to explore and retrieve the AJAX content. Second, the retrieved content was partitioned into items and then classified with the help of the agricultural ontology; HTML tags and punctuation marks were used to segment the retrieved content into entity items. Finally, the entity items were clustered, and semantic annotations were assigned to the clustering results according to the agricultural ontology. Experimental evaluation showed the proposed approach to be effective in resource exploration, entity extraction, and semantic annotation.
ISSN:	2095-3119 2352-3425
DOI:	10.1016/S2095-3119(12)60068-9
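
The segmentation and annotation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy ontology, the function names `segment` and `annotate`, and the sample fragment are all hypothetical, and a plain dictionary lookup stands in for the paper's clustering and ontology-based annotation. It only shows the general idea of splitting retrieved content on HTML tags and punctuation, then labeling the resulting items with ontology concepts.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical toy ontology (concept -> known instance terms); not from the paper.
TOY_ONTOLOGY = {
    "Crop": {"rice", "wheat", "maize"},
    "Pest": {"aphid", "locust"},
}


class _TextChunks(HTMLParser):
    """Collects text runs; every tag boundary ends the current run."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def segment(html_fragment):
    """Split a fragment into entity items using HTML tags, then punctuation."""
    parser = _TextChunks()
    parser.feed(html_fragment)
    parser.close()
    items = []
    for chunk in parser.chunks:
        # Assumed delimiter set: common Western and CJK punctuation marks.
        for piece in re.split(r"[,;:，；：、。]", chunk):
            piece = piece.strip()
            if piece:
                items.append(piece)
    return items


def annotate(items, ontology=TOY_ONTOLOGY):
    """Group items under the first ontology concept that lists them."""
    groups = defaultdict(list)
    for item in items:
        label = next(
            (concept for concept, terms in ontology.items() if item.lower() in terms),
            "Unknown",
        )
        groups[label].append(item)
    return dict(groups)


fragment = "<ul><li>Rice, Wheat</li><li>Aphid; Locust</li><li>Tractor</li></ul>"
items = segment(fragment)
annotated = annotate(items)
```

Items that match no concept fall into an "Unknown" group here; a real system would presumably need a similarity measure or clustering step, as the abstract describes, rather than exact term matching.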