An adaptive focused Web crawling algorithm based on learning automata
Published in: | Applied intelligence (Dordrecht, Netherlands), 2012-12, Vol.37 (4), p.586-601 |
---|---|
Author: | Akbari Torkestani, Javad |
Format: | Article |
Language: | eng |
container_end_page | 601 |
---|---|
container_issue | 4 |
container_start_page | 586 |
container_title | Applied intelligence (Dordrecht, Netherlands) |
container_volume | 37 |
creator | Akbari Torkestani, Javad |
description | Recent years have witnessed the birth and explosive growth of the Web. The exponential growth of the Web has made it a huge source of information in which finding a document without an efficient search engine is unimaginable. Web crawling has become an important aspect of Web search, and the performance of search engines depends strongly on it. Focused Web crawlers try to restrict the crawling process to topic-relevant Web documents. Topic-oriented crawlers are widely used in domain-specific Web search portals and personalized search tools. This paper designs a decentralized, learning-automata-based focused Web crawler. Taking advantage of learning automata, the proposed crawler learns the most relevant URLs and the promising paths leading to the target on-topic documents. It can effectively adapt its configuration to the dynamics of the Web. The crawler is expected to achieve a higher precision rate because it constructs a small Web graph containing only on-topic documents. The convergence of the proposed algorithm is proved on the basis of the Martingale theorem. Extensive simulation experiments are conducted to evaluate the performance of the proposed crawler. The results show that the proposed crawler outperforms several existing methods in terms of precision, recall, and running time. A t-test is used to verify the statistical significance of the proposed crawler's precision results. |
doi_str_mv | 10.1007/s10489-012-0351-2 |
format | Article |
fullrecord | Publisher: Springer US, Boston; rights: Springer Science+Business Media, LLC 2012; peer reviewed; 16 pages |
fulltext | fulltext |
identifier | ISSN: 0924-669X |
ispartof | Applied intelligence (Dordrecht, Netherlands), 2012-12, Vol.37 (4), p.586-601 |
issn | 0924-669X (print); 1573-7497 (electronic) |
language | eng |
recordid | cdi_proquest_miscellaneous_1283710756 |
source | SpringerNature Journals |
subjects | Algorithms; Artificial Intelligence; Automation; Computer Science; Design; Intelligence; Learning; Machines; Manufacturing; Mechanical Engineering; Processes; Recall; Search engines; Searching; Semantic web; Semantics; Theorems; URLs; Web portals |
title | An adaptive focused Web crawling algorithm based on learning automata |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-15T23%3A56%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20adaptive%20focused%20Web%20crawling%20algorithm%20based%20on%20learning%20automata&rft.jtitle=Applied%20intelligence%20(Dordrecht,%20Netherlands)&rft.au=Akbari%20Torkestani,%20Javad&rft.date=2012-12-01&rft.volume=37&rft.issue=4&rft.spage=586&rft.epage=601&rft.pages=586-601&rft.issn=0924-669X&rft.eissn=1573-7497&rft_id=info:doi/10.1007/s10489-012-0351-2&rft_dat=%3Cproquest_cross%3E2814245641%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1151414260&rft_id=info:pmid/&rfr_iscdi=true |
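The description says the crawler uses learning automata to learn the most relevant URLs. As a rough illustration only, the sketch below implements a generic variable-structure learning automaton with the classic linear reward-inaction (L_R-I) update, where one automaton chooses among the outgoing links of a page. The link names, the relevance test, and the learning rate are hypothetical, and this record does not state which reinforcement scheme the paper actually uses, so the sketch should not be read as the author's algorithm.

```python
import random
from typing import Callable, Dict, List


class LearningAutomaton:
    """Variable-structure learning automaton with a linear reward-inaction (L_R-I) update.

    Generic textbook scheme used for illustration; not taken from the paper.
    """

    def __init__(self, actions: List[str], learning_rate: float = 0.1) -> None:
        if not actions:
            raise ValueError("the automaton needs at least one action")
        if not 0.0 < learning_rate < 1.0:
            raise ValueError("learning_rate must lie strictly between 0 and 1")
        self.actions = list(actions)
        self.a = learning_rate
        # Start from a uniform action-probability vector.
        self.p: Dict[str, float] = {act: 1.0 / len(self.actions) for act in self.actions}

    def choose(self, rng: random.Random) -> str:
        # Sample one action according to the current probability vector.
        weights = [self.p[act] for act in self.actions]
        return rng.choices(self.actions, weights=weights, k=1)[0]

    def reward(self, chosen: str) -> None:
        # Favourable response: p_i <- p_i + a(1 - p_i) and p_j <- (1 - a) p_j for j != i.
        # The vector still sums to 1 after this update.
        for act in self.actions:
            if act == chosen:
                self.p[act] += self.a * (1.0 - self.p[act])
            else:
                self.p[act] *= 1.0 - self.a
        # Unfavourable responses need no method: "inaction" leaves p unchanged.


def train(
    automaton: LearningAutomaton,
    is_relevant: Callable[[str], bool],
    steps: int,
    seed: int = 0,
) -> None:
    """Repeatedly pick a link and reward it only if the hypothetical relevance test passes."""
    rng = random.Random(seed)
    for _ in range(steps):
        link = automaton.choose(rng)
        if is_relevant(link):
            automaton.reward(link)


if __name__ == "__main__":
    # Hypothetical outgoing links of one page; "/ml/" marks the on-topic targets.
    la = LearningAutomaton(["/sports", "/ml/crawlers", "/ml/automata"], learning_rate=0.1)
    train(la, lambda url: "/ml/" in url, steps=200)
    for link, prob in sorted(la.p.items()):
        print(f"{link}: {prob:.3f}")
```

Because only relevant links are ever rewarded, probability mass drains away from the off-topic link while the vector keeps summing to 1. The reward-inaction variant was chosen here because it is the simplest standard scheme; a crawler could equally use a reward-penalty update.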