Probabilistic co-relevance for query-sensitive similarity measurement in information retrieval

► We suggest a probabilistic framework that defines query-sensitive similarity. ► The proposed similarity is based on the probability that documents are co-relevant to a given query. ► This work uses language modeling approaches to derive the co-relevance-based similarity. ► Experiment results show...
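The highlights above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the two documents are treated as independently relevant given the query, so that a co-relevance score reduces to the sum of two per-document query log-likelihoods under a Dirichlet-smoothed unigram language model. The function names, the smoothing parameter `mu`, and the independence assumption are hypothetical and are not taken from the article, whose actual derivation is not given in this record.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, coll_tf, coll_len, mu=2000.0):
    """Log P(query | doc) under a Dirichlet-smoothed unigram language model."""
    tf = Counter(doc)
    score = 0.0
    for term in query:
        p_coll = coll_tf.get(term, 0) / coll_len
        p = (tf[term] + mu * p_coll) / (len(doc) + mu)
        if p > 0.0:  # terms unseen in the whole collection are skipped
            score += math.log(p)
    return score


def co_relevance_log_score(query, doc_a, doc_b, coll_tf, coll_len, mu=2000.0):
    """Illustrative co-relevance score: sum of two per-document relevance scores.

    Assumption of this sketch (not taken from the article): the two documents
    are treated as independently relevant given the query.
    """
    return (query_log_likelihood(query, doc_a, coll_tf, coll_len, mu)
            + query_log_likelihood(query, doc_b, coll_tf, coll_len, mu))
```

Under this reading, two documents that both match the query score higher as a pair than a pair in which only one document matches, which is the query-sensitive behaviour the highlights describe. A fuller treatment would also account for how similar the two documents are to each other.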

Full description


Bibliographic details
Published in:	Information processing & management 2013-03, Vol.49 (2), p.558-575
Main author:	Na, Seung-Hoon
Format:	Article
Language:	eng
Subjects:	Cluster hypothesis; Cluster-based retrieval; Clustering; Clusters; Documents; Exact sciences and technology; Information and communication sciences; Information processing and retrieval; Information retrieval; Information retrieval systems. Information and document management system; Information retrieval. Man machine relationship; Information science. Documentation; Information sources; Inter-document similarity; Mathematical models; Methods; Probabilistic co-relevance; Probabilistic methods; Probability; Probability theory; Query-sensitive similarity; Relevance; Research process. Evaluation; Retrieval; Sciences and techniques of general use; Similarity; Studies; Vector space
Online access:	Full text

container_end_page	575
container_issue	2
container_start_page	558
container_title	Information processing & management
container_volume	49
creator	Na, Seung-Hoon
description	► We suggest a probabilistic framework that defines query-sensitive similarity. ► The proposed similarity is based on the probability that documents are co-relevant to a given query. ► This work uses language modeling approaches to derive the co-relevance-based similarity. ► Experimental results show that the proposed co-relevance-based similarity is effective. Interdocument similarities are the fundamental information source required in cluster-based retrieval, which is an advanced retrieval approach that significantly improves performance during information retrieval (IR). An effective similarity metric is query-sensitive similarity, which was introduced by Tombros and Rijsbergen as a method to more directly satisfy the cluster hypothesis that forms the basis of cluster-based retrieval. Although this method is reported to be effective, existing applications of query-specific similarity are still limited to vector space models wherein there is no connection to probabilistic approaches. We suggest a probabilistic framework that defines query-sensitive similarity based on probabilistic co-relevance, where the similarity between two documents is proportional to the probability that they are both co-relevant to a specific given query. We further simplify the proposed co-relevance-based similarity by decomposing it into two separate relevance models. We then formulate all the requisite components for the proposed similarity metric in terms of scoring functions used by language modeling methods. Experimental results obtained using standard TREC test collections consistently showed that the proposed query-sensitive similarity measure performs better than term-based similarity and existing query-sensitive similarity in the context of Voorhees’ nearest neighbor test (NNT).
doi_str_mv	10.1016/j.ipm.2012.10.002
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1671482635</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0306457312001215</els_id><sourcerecordid>1550991553</sourcerecordid><originalsourceid>FETCH-LOGICAL-c454t-1cde44b71942beeb7b3b43755906505cfb8b84f23572eaa476418337adcd2b433</originalsourceid><addsrcrecordid>eNqNkU2LFDEQhoMoOO76A7w1iOClx1Q-Ot14kmX9gAX3sHs1JOlqqKG7MyaZgfn3ZpjFgwcVQkLCU28leRh7A3wLHLoPuy3tl63gIOp-y7l4xjbQG9lqaeA523DJu1ZpI1-yVznvOOdKg9iwH_cpeudpplwoNCG2CWc8ujVgM8XU_DxgOrUZ10yFjthkWmh2icqpWdDlQ8IF19LQWkflF1cork3CkqimzNfsxeTmjK-f1iv2-Pn24eZre_f9y7ebT3dtUFqVFsKISnkDgxIe0RsvvZJG64F3musw-d73ahJSG4HOKdMp6KU0bgyjqKS8Yu8vufsU65VzsQvlgPPsVoyHbKEzoHrRSf1vVGvgHehO_g_Kh6HOZ_TtH-guHtJa32xBGD7A0Pd9peBChRRzTjjZfaLFpZMFbs8e7c5Wj_bs8XxUPdaad0_JLgc3T6m6ofy7sIaLAaSq3McLh_Wfj4TJ5kBYPY6UMBQ7RvpLl18BOLJE</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1270919888</pqid></control><display><type>article</type><title>Probabilistic co-relevance for query-sensitive similarity measurement in information retrieval</title><source>Access via ScienceDirect (Elsevier)</source><creator>Na, Seung-Hoon</creator><creatorcontrib>Na, Seung-Hoon</creatorcontrib><description>► We suggest a probabilistic framework that defines query-sensitive similarity. ► The proposed similarity is based on the probability that documents are co-relevant to a given query. ► This work uses language modeling approaches to derive the co-relevance-based similarity. ► Experiment results show that the proposed co-relevance-based similarity is effective. Interdocument similarities are the fundamental information source required in cluster-based retrieval, which is an advanced retrieval approach that significantly improves performance during information retrieval (IR). 
An effective similarity metric is query-sensitive similarity, which was introduced by Tombros and Rijsbergen as method to more directly satisfy the cluster hypothesis that forms the basis of cluster-based retrieval. Although this method is reported to be effective, existing applications of query-specific similarity are still limited to vector space models wherein there is no connection to probabilistic approaches. We suggest a probabilistic framework that defines query-sensitive similarity based on probabilistic co-relevance, where the similarity between two documents is proportional to the probability that they are both co-relevant to a specific given query. We further simplify the proposed co-relevance-based similarity by decomposing it into two separate relevance models. We then formulate all the requisite components for the proposed similarity metric in terms of scoring functions used by language modeling methods. Experimental results obtained using standard TREC test collections consistently showed that the proposed query-sensitive similarity measure performs better than term-based similarity and existing query-sensitive similarity in the context of Voorhees’ nearest neighbor test (NNT).</description><identifier>ISSN: 0306-4573</identifier><identifier>EISSN: 1873-5371</identifier><identifier>DOI: 10.1016/j.ipm.2012.10.002</identifier><identifier>CODEN: IPMADK</identifier><language>eng</language><publisher>Kidlington: Elsevier Ltd</publisher><subject>Cluster hypothesis ; Cluster-based retrieval ; Clustering ; Clusters ; Documents ; Exact sciences and technology ; Information and communication sciences ; Information processing and retrieval ; Information retrieval ; Information retrieval systems. Information and document management system ; Information retrieval. Man machine relationship ; Information science. 
Documentation ; Information sources ; Inter-document similarity ; Mathematical models ; Methods ; Probabilistic co-relevance ; Probabilistic methods ; Probability ; Probability theory ; Query-sensitive similarity ; Relevance ; Research process. Evaluation ; Retrieval ; Sciences and techniques of general use ; Similarity ; Studies ; Vector space</subject><ispartof>Information processing & management, 2013-03, Vol.49 (2), p.558-575</ispartof><rights>2012 Elsevier Ltd</rights><rights>2015 INIST-CNRS</rights><rights>Copyright Pergamon Press Inc. Mar 2013</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c454t-1cde44b71942beeb7b3b43755906505cfb8b84f23572eaa476418337adcd2b433</citedby><cites>FETCH-LOGICAL-c454t-1cde44b71942beeb7b3b43755906505cfb8b84f23572eaa476418337adcd2b433</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.ipm.2012.10.002$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,780,784,3550,27924,27925,45995</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=27029134$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Na, Seung-Hoon</creatorcontrib><title>Probabilistic co-relevance for query-sensitive similarity measurement in information retrieval</title><title>Information processing & management</title><description>► We suggest a probabilistic framework that defines query-sensitive similarity. ► The proposed similarity is based on the probability that documents are co-relevant to a given query. ► This work uses language modeling approaches to derive the co-relevance-based similarity. ► Experiment results show that the proposed co-relevance-based similarity is effective. 
Interdocument similarities are the fundamental information source required in cluster-based retrieval, which is an advanced retrieval approach that significantly improves performance during information retrieval (IR). An effective similarity metric is query-sensitive similarity, which was introduced by Tombros and Rijsbergen as method to more directly satisfy the cluster hypothesis that forms the basis of cluster-based retrieval. Although this method is reported to be effective, existing applications of query-specific similarity are still limited to vector space models wherein there is no connection to probabilistic approaches. We suggest a probabilistic framework that defines query-sensitive similarity based on probabilistic co-relevance, where the similarity between two documents is proportional to the probability that they are both co-relevant to a specific given query. We further simplify the proposed co-relevance-based similarity by decomposing it into two separate relevance models. We then formulate all the requisite components for the proposed similarity metric in terms of scoring functions used by language modeling methods. Experimental results obtained using standard TREC test collections consistently showed that the proposed query-sensitive similarity measure performs better than term-based similarity and existing query-sensitive similarity in the context of Voorhees’ nearest neighbor test (NNT).</description><subject>Cluster hypothesis</subject><subject>Cluster-based retrieval</subject><subject>Clustering</subject><subject>Clusters</subject><subject>Documents</subject><subject>Exact sciences and technology</subject><subject>Information and communication sciences</subject><subject>Information processing and retrieval</subject><subject>Information retrieval</subject><subject>Information retrieval systems. Information and document management system</subject><subject>Information retrieval. Man machine relationship</subject><subject>Information science. 
Documentation</subject><subject>Information sources</subject><subject>Inter-document similarity</subject><subject>Mathematical models</subject><subject>Methods</subject><subject>Probabilistic co-relevance</subject><subject>Probabilistic methods</subject><subject>Probability</subject><subject>Probability theory</subject><subject>Query-sensitive similarity</subject><subject>Relevance</subject><subject>Research process. Evaluation</subject><subject>Retrieval</subject><subject>Sciences and techniques of general use</subject><subject>Similarity</subject><subject>Studies</subject><subject>Vector space</subject><issn>0306-4573</issn><issn>1873-5371</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2013</creationdate><recordtype>article</recordtype><recordid>eNqNkU2LFDEQhoMoOO76A7w1iOClx1Q-Ot14kmX9gAX3sHs1JOlqqKG7MyaZgfn3ZpjFgwcVQkLCU28leRh7A3wLHLoPuy3tl63gIOp-y7l4xjbQG9lqaeA523DJu1ZpI1-yVznvOOdKg9iwH_cpeudpplwoNCG2CWc8ujVgM8XU_DxgOrUZ10yFjthkWmh2icqpWdDlQ8IF19LQWkflF1cork3CkqimzNfsxeTmjK-f1iv2-Pn24eZre_f9y7ebT3dtUFqVFsKISnkDgxIe0RsvvZJG64F3musw-d73ahJSG4HOKdMp6KU0bgyjqKS8Yu8vufsU65VzsQvlgPPsVoyHbKEzoHrRSf1vVGvgHehO_g_Kh6HOZ_TtH-guHtJa32xBGD7A0Pd9peBChRRzTjjZfaLFpZMFbs8e7c5Wj_bs8XxUPdaad0_JLgc3T6m6ofy7sIaLAaSq3McLh_Wfj4TJ5kBYPY6UMBQ7RvpLl18BOLJE</recordid><startdate>20130301</startdate><enddate>20130301</enddate><creator>Na, Seung-Hoon</creator><general>Elsevier Ltd</general><general>Elsevier</general><general>Elsevier Science Ltd</general><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>E3H</scope><scope>F2A</scope><scope>8BP</scope><scope>7SC</scope><scope>7TA</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20130301</creationdate><title>Probabilistic co-relevance for query-sensitive similarity measurement in information retrieval</title><author>Na, 
Seung-Hoon</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c454t-1cde44b71942beeb7b3b43755906505cfb8b84f23572eaa476418337adcd2b433</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Cluster hypothesis</topic><topic>Cluster-based retrieval</topic><topic>Clustering</topic><topic>Clusters</topic><topic>Documents</topic><topic>Exact sciences and technology</topic><topic>Information and communication sciences</topic><topic>Information processing and retrieval</topic><topic>Information retrieval</topic><topic>Information retrieval systems. Information and document management system</topic><topic>Information retrieval. Man machine relationship</topic><topic>Information science. Documentation</topic><topic>Information sources</topic><topic>Inter-document similarity</topic><topic>Mathematical models</topic><topic>Methods</topic><topic>Probabilistic co-relevance</topic><topic>Probabilistic methods</topic><topic>Probability</topic><topic>Probability theory</topic><topic>Query-sensitive similarity</topic><topic>Relevance</topic><topic>Research process. 
Evaluation</topic><topic>Retrieval</topic><topic>Sciences and techniques of general use</topic><topic>Similarity</topic><topic>Studies</topic><topic>Vector space</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Na, Seung-Hoon</creatorcontrib><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Library & Information Sciences Abstracts (LISA)</collection><collection>Library & Information Science Abstracts (LISA)</collection><collection>Library & Information Sciences Abstracts (LISA) - CILIP Edition</collection><collection>Computer and Information Systems Abstracts</collection><collection>Materials Business File</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Information processing & management</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Na, Seung-Hoon</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Probabilistic co-relevance for query-sensitive similarity measurement in information retrieval</atitle><jtitle>Information processing & management</jtitle><date>2013-03-01</date><risdate>2013</risdate><volume>49</volume><issue>2</issue><spage>558</spage><epage>575</epage><pages>558-575</pages><issn>0306-4573</issn><eissn>1873-5371</eissn><coden>IPMADK</coden><abstract>► We suggest a probabilistic framework that defines query-sensitive similarity. ► The proposed similarity is based on the probability that documents are co-relevant to a given query. 
► This work uses language modeling approaches to derive the co-relevance-based similarity. ► Experiment results show that the proposed co-relevance-based similarity is effective. Interdocument similarities are the fundamental information source required in cluster-based retrieval, which is an advanced retrieval approach that significantly improves performance during information retrieval (IR). An effective similarity metric is query-sensitive similarity, which was introduced by Tombros and Rijsbergen as method to more directly satisfy the cluster hypothesis that forms the basis of cluster-based retrieval. Although this method is reported to be effective, existing applications of query-specific similarity are still limited to vector space models wherein there is no connection to probabilistic approaches. We suggest a probabilistic framework that defines query-sensitive similarity based on probabilistic co-relevance, where the similarity between two documents is proportional to the probability that they are both co-relevant to a specific given query. We further simplify the proposed co-relevance-based similarity by decomposing it into two separate relevance models. We then formulate all the requisite components for the proposed similarity metric in terms of scoring functions used by language modeling methods. Experimental results obtained using standard TREC test collections consistently showed that the proposed query-sensitive similarity measure performs better than term-based similarity and existing query-sensitive similarity in the context of Voorhees’ nearest neighbor test (NNT).</abstract><cop>Kidlington</cop><pub>Elsevier Ltd</pub><doi>10.1016/j.ipm.2012.10.002</doi><tpages>18</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 0306-4573
ispartof	Information processing & management, 2013-03, Vol.49 (2), p.558-575
issn	0306-4573 1873-5371
language	eng
recordid	cdi_proquest_miscellaneous_1671482635
source	Access via ScienceDirect (Elsevier)
subjects	Cluster hypothesis; Cluster-based retrieval; Clustering; Clusters; Documents; Exact sciences and technology; Information and communication sciences; Information processing and retrieval; Information retrieval; Information retrieval systems. Information and document management system; Information retrieval. Man machine relationship; Information science. Documentation; Information sources; Inter-document similarity; Mathematical models; Methods; Probabilistic co-relevance; Probabilistic methods; Probability; Probability theory; Query-sensitive similarity; Relevance; Research process. Evaluation; Retrieval; Sciences and techniques of general use; Similarity; Studies; Vector space
title	Probabilistic co-relevance for query-sensitive similarity measurement in information retrieval
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T23%3A41%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Probabilistic%20co-relevance%20for%20query-sensitive%20similarity%20measurement%20in%20information%20retrieval&rft.jtitle=Information%20processing%20&%20management&rft.au=Na,%20Seung-Hoon&rft.date=2013-03-01&rft.volume=49&rft.issue=2&rft.spage=558&rft.epage=575&rft.pages=558-575&rft.issn=0306-4573&rft.eissn=1873-5371&rft.coden=IPMADK&rft_id=info:doi/10.1016/j.ipm.2012.10.002&rft_dat=%3Cproquest_cross%3E1550991553%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1270919888&rft_id=info:pmid/&rft_els_id=S0306457312001215&rfr_iscdi=true
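
The nearest neighbor test (NNT) named in the description checks the cluster hypothesis by asking, for each relevant document, how many of its most similar documents are also relevant. The following is a minimal sketch of that counting step, not the article's evaluation code. The function name, its signature, and the document identifiers are hypothetical; the neighborhood size `k` is left as a parameter, with 5 as the value usually associated with the test.

```python
def nearest_neighbor_test(similarity, relevant_ids, all_ids, k=5):
    """For each relevant document, count the relevant documents among its
    k most similar neighbors (the document itself is excluded)."""
    counts = {}
    for doc in relevant_ids:
        others = [o for o in all_ids if o != doc]
        # Rank the remaining documents by decreasing similarity to `doc`.
        neighbors = sorted(others, key=lambda o: similarity(doc, o), reverse=True)[:k]
        counts[doc] = sum(1 for o in neighbors if o in relevant_ids)
    return counts
```

A similarity measure that better satisfies the cluster hypothesis yields higher counts, so competing measures can be compared by passing each one as the `similarity` argument over the same set of documents and relevance judgments.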