Finding the most similar documents across multiple text databases

Bibliographic details
Main authors: Clement Yu, King-Lup Liu, Wensheng Wu, Weiyi Meng, N. Rishe
Format: Conference proceeding
Language: English
container_end_page 162
container_issue
container_start_page 150
container_title
container_volume
creator Clement Yu
King-Lup Liu
Wensheng Wu
Weiyi Meng
Rishe, N.
description We present a methodology for finding the n most similar documents across multiple text databases for any given query and for any positive integer n. This methodology consists of two steps. First, databases are ranked in a certain order. Next, documents are retrieved from the databases according to the order and in a particular way. If the databases containing the n most similar documents for a given query can be ranked ahead of other databases, the methodology will guarantee the retrieval of the n most similar documents for the query. A statistical method is provided to identify databases, each of which is estimated to contain at least one of the n most similar documents. Then, a number of strategies are presented to retrieve documents from the identified databases. Experimental results are given to illustrate the relative performance of different strategies.
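The two-step methodology described above (rank the databases, then retrieve from them in that order) can be illustrated with a small sketch. This is not the authors' algorithm. The names `rank_databases`, `top_n_across` and `max_similarity`, and the in-memory mapping of databases to scored documents, are hypothetical. The sketch also assumes each database's best similarity is known exactly, whereas the paper estimates which databases are useful with a statistical method. The stopping rule is a simple one chosen for illustration: search databases in descending order of their best similarity, and stop once the n-th collected score is at least the best score of the next database.

```python
from typing import Dict, List, Tuple

# Hypothetical setup: each database is simulated by the (doc_id, similarity)
# pairs its local search engine would report for the current query.
Database = List[Tuple[str, float]]
Hit = Tuple[str, str, float]  # (database name, doc_id, similarity)


def max_similarity(db: Database) -> float:
    """Similarity of the database's most similar document (0.0 if empty)."""
    return max((sim for _, sim in db), default=0.0)


def rank_databases(dbs: Dict[str, Database]) -> List[str]:
    """Step 1 (sketch): order databases by their best similarity, descending.

    Assumption: the best similarity is known exactly here, whereas the paper
    estimates which databases are useful with a statistical method.
    """
    return sorted(dbs, key=lambda name: max_similarity(dbs[name]), reverse=True)


def top_n_across(dbs: Dict[str, Database], n: int) -> Tuple[List[Hit], int]:
    """Step 2 (sketch): search databases in ranked order and stop early.

    Returns the n best hits and the number of databases actually searched.
    """
    order = rank_databases(dbs)
    best: List[Hit] = []
    searched = 0
    for i, name in enumerate(order):
        searched += 1
        # Only a database's own top n documents can enter the global top n.
        local = sorted(dbs[name], key=lambda d: d[1], reverse=True)[:n]
        best.extend((name, doc, sim) for doc, sim in local)
        best = sorted(best, key=lambda h: h[2], reverse=True)[:n]
        if len(best) == n and i + 1 < len(order):
            # Every later database ranks no higher than order[i + 1], so none
            # of its documents can beat the current n-th result.
            if best[-1][2] >= max_similarity(dbs[order[i + 1]]):
                break
    return best, searched


# Toy example: C is never searched, because its best score (0.20) cannot
# beat the third-best score already collected from A and B (0.80).
example = {
    "A": [("a1", 0.91), ("a2", 0.40)],
    "B": [("b1", 0.85), ("b2", 0.80), ("b3", 0.30)],
    "C": [("c1", 0.20)],
}
hits, searched = top_n_across(example, 3)
print(hits, searched)
```

Ranking by each database's best similarity is what makes the early stop safe in this sketch. With only estimated scores, the guarantee holds only insofar as the estimates place the databases that contain the n most similar documents ahead of the others.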
doi_str_mv 10.1109/ADL.1999.777710
format Conference Proceeding
isbn 9780769502199
0769502199
publisher IEEE
fulltext fulltext_linktorsrc
identifier ISSN: 1092-9959
ispartof Proceedings IEEE Forum on Research and Technology Advances in Digital Libraries, 1999, p.150-162
issn 1092-9959
2378-7104
language eng
recordid cdi_ieee_primary_777710
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Australia
Computer networks
Database systems
Indexing
Information retrieval
Information systems
Internet
ISDN
Machine learning
Transaction databases
title Finding the most similar documents across multiple text databases
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-21T11%3A09%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Finding%20the%20most%20similar%20documents%20across%20multiple%20text%20databases&rft.btitle=Proceedings%20IEEE%20Forum%20on%20Research%20and%20Technology%20Advances%20in%20Digital%20Libraries&rft.au=Clement%20Yu&rft.date=1999&rft.spage=150&rft.epage=162&rft.pages=150-162&rft.issn=1092-9959&rft.eissn=2378-7104&rft.isbn=9780769502199&rft.isbn_list=0769502199&rft_id=info:doi/10.1109/ADL.1999.777710&rft_dat=%3Cieee_6IE%3E777710%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=777710&rfr_iscdi=true