Design and implementation of a web structure mining algorithm using breadth first search strategy for academic search application

Bibliographic details
Main authors:	Jeyalatha, S., Vijayakumar, B.
Format:	Conference proceeding
Language:	English
Subjects:	Adjacency List; Breadth First Search; Data mining; Downloading; Google; HTML; Java; User interfaces; Web Extraction; Web pages; Web Structure Mining; XML

Description
Summary:	This paper deals with Web Structure Mining using the Breadth First Search strategy. While browsing the web, a user has to go through many pages, filter the data and download the required information. This task of searching and downloading is time-consuming. Sometimes a search query calls for a specific option, such as limiting the search to a few links. To reduce the time spent by users, a web link extraction tool has been designed and implemented in Java; it analyzes ways of extracting web link information using a standard interface. A test scenario is presented with keywords such as Higher Education, Conference Alerts and Special Interest Group. The work can be useful to web users, faculty, students and web administrators in a university environment. The web extraction tool helps to save time in searching for and downloading files from the web. Another strong requirement is to verify whether the search keywords entered by the user give accurate and relevant results. This is made possible by performing a quick check on the search links. The user can also view the internal links present in the selected HTML files and the adjacency list of the crawled files.
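
The summary mentions breadth-first link extraction, a limit on the number of links followed, and an adjacency list of the crawled files. The following is a minimal Java sketch of how those three ideas could fit together; it is not the authors' implementation. The class name `BfsLinkSketch`, the methods `extractLinks` and `crawl`, and the `maxPages` parameter are hypothetical names chosen for illustration. As a further assumption, pages are held in an in-memory map instead of being fetched over HTTP, and links are found with a naive regular expression where a real tool would likely use an HTML parser.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BfsLinkSketch {
    // Naive href matcher, for illustration only (assumption: double-quoted hrefs).
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

    /** Returns the link targets found in one HTML string, in document order. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    /**
     * Breadth-first traversal from start, crawling at most maxPages pages.
     * Returns an adjacency list: each crawled page mapped to its outgoing links.
     * The pages map (name to HTML) stands in for real downloads.
     */
    static Map<String, List<String>> crawl(Map<String, String> pages,
                                           String start, int maxPages) {
        Map<String, List<String>> adjacency = new LinkedHashMap<>();
        Queue<String> frontier = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        frontier.add(start);
        seen.add(start);
        while (!frontier.isEmpty() && adjacency.size() < maxPages) {
            String page = frontier.remove();
            String html = pages.get(page);
            if (html == null) {
                continue; // link points outside the known page set
            }
            List<String> links = extractLinks(html);
            adjacency.put(page, links);
            for (String link : links) {
                if (seen.add(link)) { // enqueue each page only once
                    frontier.add(link);
                }
            }
        }
        return adjacency;
    }

    public static void main(String[] args) {
        Map<String, String> pages = new LinkedHashMap<>();
        pages.put("index.html", "<a href=\"a.html\">A</a> <a href=\"b.html\">B</a>");
        pages.put("a.html", "<a href=\"c.html\">C</a>");
        pages.put("b.html", "<a href=\"index.html\">Home</a>");
        pages.put("c.html", "<p>no links</p>");
        // Limit the crawl to three pages, level by level from index.html.
        crawl(pages, "index.html", 3)
            .forEach((page, links) -> System.out.println(page + " -> " + links));
    }
}
```

With the limit set to three, the sketch crawls index.html first, then a.html and b.html, which sit one level below it; c.html is discovered but not crawled. This level-by-level order is what distinguishes breadth-first from depth-first traversal.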