A Theme-based Search Technique

Bibliographic details
Main authors: Al-Chalabi, N., Shihab, K.
Format: Conference paper
Language: English
Description
Summary: Current search engines usually return a large number of irrelevant documents for a given query. Accessing the results and filtering out these documents causes frustration and often wastes users' time and effort while surfing the web. This is mainly because of the underlying techniques used in these engines, which are mostly based on the frequency of the query keywords in the HTML code. In addition, these engines do not classify the pages found for a query according to previous visits, nor do they consider the features needed to make intelligent decisions about users' access patterns. This work presents an intelligent search engine, called ORCA, that returns the most relevant documents for users' queries. The search engine analyses the queries and builds themes (models) to be used when the engine is confronted with similar queries. The intelligent component constructs a model of the user's behavior and uses that model to fetch, and even prefetch, information and documents considered of interest to the user. It uses both latent semantic analysis and web page feature selection for clustering web pages. Latent semantic analysis is used to find the semantic relations between keywords and between documents.
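Illustration: the summary does not describe how ORCA implements latent semantic analysis, so the following is only a minimal, generic sketch of the technique. The toy corpus `DOCS`, all function names, and the choice of two latent dimensions are assumptions made for illustration. To stay dependency-free, it obtains the leading singular directions by power iteration on the document-by-document matrix rather than calling a library SVD routine.

```python
import math
import random

# Hypothetical toy corpus; not taken from the paper.
DOCS = [
    "car engine repair",
    "automobile engine repair",
    "car automobile dealer",
    "fruit apple banana",
    "apple banana recipe",
]

def term_document_matrix(docs):
    """Rows are terms, columns are documents, entries are raw counts."""
    vocab = sorted({w for d in docs for w in d.split()})
    row = {w: i for i, w in enumerate(vocab)}
    A = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w in d.split():
            A[row[w]][j] += 1.0
    return vocab, A

def gram(A):
    """Document-by-document matrix A^T A."""
    n = len(A[0])
    return [[sum(r[i] * r[j] for r in A) for j in range(n)] for i in range(n)]

def top_eigenpairs(G, k, iters=500):
    """Leading k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(G)
    G = [r[:] for r in G]          # work on a copy
    rng = random.Random(0)         # fixed seed for reproducibility
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iters):
            w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(G[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        for i in range(n):         # deflate: remove the component just found
            for j in range(n):
                G[i][j] -= lam * v[i] * v[j]
    return pairs

def lsa_document_vectors(docs, k):
    """Coordinates of each document in a k-dimensional latent space."""
    _, A = term_document_matrix(docs)
    pairs = top_eigenpairs(gram(A), k)
    # Eigenvalues of A^T A are squared singular values of A.
    return [[math.sqrt(max(lam, 0.0)) * v[j] for lam, v in pairs]
            for j in range(len(docs))]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

With `vecs = lsa_document_vectors(DOCS, 2)`, the call `cosine(vecs[0], vecs[2])` compares two documents that share only the word "car", and `cosine(vecs[0], vecs[3])` compares documents from the two unrelated topics. In this toy setting the latent space places the vehicle documents close together and apart from the food documents, which is the kind of document-to-document relation that clustering can build on.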
ISSN: 2378-1963, 2378-1971
DOI: 10.1109/CEC-EEE.2007.15