The Automatic Page Grouping System for the Result of WEB Retrieval Using Vector Space Model Method and Fuzzy Reasoning


Bibliographic Details
Published in: Journal of Japan Society for Fuzzy Theory and Intelligent Informatics 2006, Vol.18(2), pp.184-195
Main Authors: JOICHI, Hiroo, MIYOSHI, Tsutomu
Format: Article
Language: eng ; jpn
Online Access: Full text
Description
Summary: Search engines are the main tool for Web page retrieval, but it has been pointed out that the pages a user needs are often not ranked near the top of the retrieval results. One reason is that pages are selected merely because they contain the search keywords. Even when users enter the same keywords, different kinds of pages tend to be mixed in the results because of polysemy or ambiguity of words. To improve retrieval results, there are some studies that classify the results into groups according to page content using the vector space model method. The vector space model method measures the degree of similarity between pages from the frequency of word appearance; however, it has two problems. The first is that the cost of calculating similarity is too high, because every word that appears even once in any page is used. Calculation cost should be reduced because a quick response is desirable for Web retrieval. In our system, we tried to reduce computational cost by selecting words using fuzzy reasoning. The second is that it is difficult to show a group name or title to the user. Because the method only calculates the similarity of pages, it cannot choose words that represent a group. For the user's convenience, it is desirable to add a technique for creating a group name or title automatically. In our method, we tried to create group names automatically by using the frequency of word co-occurrence. In this paper, we propose a system that classifies retrieval results into groups according to page content by combining three methods: fuzzy reasoning for selecting index words, the frequency of word co-occurrence for creating group indexes, and the vector space model method for classifying pages. From the experiments, we confirmed two points. The first is that selecting 200 words is preferable from the viewpoint of calculation cost and classification accuracy. The second is that the proposed system classifies retrieved pages in a way similar to human judgment.
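The vector space model step described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`select_index_words`, `to_vector`, `cosine`) and the sample pages are hypothetical, and the word selection here simply keeps the most frequent words as a stand-in, since the record does not describe the fuzzy reasoning rules the paper uses.

```python
import math
from collections import Counter


def select_index_words(pages, k):
    """Keep the k most frequent words across all pages.

    Assumption: a plain frequency cutoff stands in for the paper's
    fuzzy-reasoning selection, whose rules are not given in this record.
    """
    totals = Counter()
    for words in pages:
        totals.update(words)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


def to_vector(words, index_words):
    """Represent one page as word frequencies over the selected index words."""
    counts = Counter(words)
    return [counts[w] for w in index_words]


def cosine(u, v):
    """Cosine similarity between two frequency vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical retrieval results for the ambiguous keyword "jaguar".
pages = [
    "jaguar car engine speed car".split(),
    "jaguar car dealer price".split(),
    "jaguar cat jungle animal cat".split(),
]

index_words = select_index_words(pages, k=4)
vectors = [to_vector(p, index_words) for p in pages]

sim_cars = cosine(vectors[0], vectors[1])    # two pages about the car
sim_mixed = cosine(vectors[0], vectors[2])   # car page vs. animal page
print(index_words, round(sim_cars, 3), round(sim_mixed, 3))
```

Restricting the vectors to `k` index words is what keeps the similarity computation cheap; the summary reports 200 as the preferred number of selected words in the authors' experiments.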
ISSN:1347-7986
1881-7203
DOI:10.3156/jsoft.18.184