KDX: an indexer for support vector machines

Support vector machines (SVMs) have been adopted by many data mining and information-retrieval applications for learning a mining or query concept, and then retrieving the "top-k" best matches to the concept. However, when the data set is large, naively scanning the entire data set to find...

Bibliographic Details
Published in:	IEEE transactions on knowledge and data engineering 2006-06, Vol.18 (6), p.748-763
Main authors:	Navneet Panda ; Chang, E.Y.
Format:	Article
Language:	eng
Subjects:	Algorithm design and analysis ; Algorithms ; Applied sciences ; Artificial intelligence ; Computer science; control theory; systems ; Connectionism. Neural networks ; Cost analysis ; Costs ; Data mining ; Exact sciences and technology ; Face recognition ; Indexing ; Information retrieval ; Information systems. Data bases ; Kernel ; Kernels ; Machine learning ; Memory organisation. Data processing ; Performance analysis ; Query processing ; Searching ; Software ; Studies ; Support vector machine ; Support vector machines ; top-k retrieval

container_end_page	763
container_issue	6
container_start_page	748
container_title	IEEE transactions on knowledge and data engineering
container_volume	18
creator	Navneet Panda ; Chang, E.Y.
description	Support vector machines (SVMs) have been adopted by many data mining and information-retrieval applications for learning a mining or query concept, and then retrieving the "top-k" best matches to the concept. However, when the data set is large, naively scanning the entire data set to find the top matches is not scalable. In this work, we propose a kernel indexing strategy to substantially prune the search space and, thus, improve the performance of top-k queries. Our kernel indexer (KDX) takes advantage of the underlying geometric properties and quickly converges on an approximate set of top-k instances of interest. More importantly, once the kernel (e.g., Gaussian kernel) has been selected and the indexer has been constructed, the indexer can work with different kernel-parameter settings (e.g., gamma and sigma) without performance compromise. Through theoretical analysis and empirical studies on a wide variety of data sets, we demonstrate KDX to be very effective. An earlier version of this paper appeared in the 2005 SIAM International Conference on Data Mining. This version differs from the previous submission in providing a detailed cost analysis under different scenarios, specifically designed to meet the varying needs of accuracy, speed, and space requirements, developing an approach for insertion and deletion of instances, presenting the specific computations as well as the geometric properties used in performing the same, and providing detailed algorithms for each of the operations necessary to create and use the index structure
doi_str_mv	10.1109/TKDE.2006.101
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_ieee_primary_1626230</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>1626230</ieee_id><sourcerecordid>29294860</sourcerecordid><sourcetype>Aggregation Database</sourcetype><recordtype>article</recordtype><pqid>912212321</pqid></control><display><type>article</type><source>IEEE Electronic Library (IEL)</source><identifier>CODEN: ITKEEH</identifier><publisher>New York, NY: IEEE</publisher><rights>2006 INIST-CNRS</rights><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2006</rights><lds50>peer_reviewed</lds50></display><links><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/1626230$$EHTML$$P50$$Gieee$$H</linktohtml></links><addata><au>Navneet Panda</au><au>Chang, E.Y.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>KDX: an indexer for support vector machines</atitle><jtitle>IEEE transactions on knowledge and data engineering</jtitle><stitle>TKDE</stitle><date>2006-06-01</date><risdate>2006</risdate><volume>18</volume><issue>6</issue><spage>748</spage><epage>763</epage><pages>748-763</pages><issn>1041-4347</issn><eissn>1558-2191</eissn><coden>ITKEEH</coden><cop>New York, NY</cop><pub>IEEE</pub><doi>10.1109/TKDE.2006.101</doi><tpages>16</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1041-4347
ispartof	IEEE transactions on knowledge and data engineering, 2006-06, Vol.18 (6), p.748-763
issn	1041-4347 1558-2191
language	eng
recordid	cdi_ieee_primary_1626230
source	IEEE Electronic Library (IEL)
subjects	Algorithm design and analysis ; Algorithms ; Applied sciences ; Artificial intelligence ; Computer science; control theory; systems ; Connectionism. Neural networks ; Cost analysis ; Costs ; Data mining ; Exact sciences and technology ; Face recognition ; Indexing ; Information retrieval ; Information systems. Data bases ; Kernel ; Kernels ; Machine learning ; Memory organisation. Data processing ; Performance analysis ; Query processing ; Searching ; Software ; Studies ; Support vector machine ; Support vector machines ; top-k retrieval
title	KDX: an indexer for support vector machines
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T02%3A57%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=KDX:%20an%20indexer%20for%20support%20vector%20machines&rft.jtitle=IEEE%20transactions%20on%20knowledge%20and%20data%20engineering&rft.au=Navneet%20Panda&rft.date=2006-06-01&rft.volume=18&rft.issue=6&rft.spage=748&rft.epage=763&rft.pages=748-763&rft.issn=1041-4347&rft.eissn=1558-2191&rft.coden=ITKEEH&rft_id=info:doi/10.1109/TKDE.2006.101&rft_dat=%3Cproquest_RIE%3E29294860%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=912212321&rft_id=info:pmid/&rft_ieee_id=1626230&rfr_iscdi=true
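
The description in this record contrasts KDX with naively scanning the entire data set to find the top-k matches. For orientation only, that baseline scan might look like the sketch below. It is not the KDX index from the article: the function names, the toy two-support-vector model, and the parameter values are all illustrative assumptions.

```python
import heapq
import math


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def decision_value(x, support_vectors, coeffs, bias, gamma):
    """SVM decision function f(x) = sum_i coeff_i * K(sv_i, x) + bias."""
    total = sum(c * rbf_kernel(sv, x, gamma)
                for sv, c in zip(support_vectors, coeffs))
    return total + bias


def naive_top_k(data, k, support_vectors, coeffs, bias, gamma):
    """Score every instance (a full scan) and return the indices of the k highest."""
    scored = ((decision_value(x, support_vectors, coeffs, bias, gamma), i)
              for i, x in enumerate(data))
    return [i for _, i in heapq.nlargest(k, scored)]


# Hypothetical toy model: one positive support vector at the origin,
# one negative support vector at (4, 4).
support_vectors = [(0.0, 0.0), (4.0, 4.0)]
coeffs = [1.0, -1.0]
data = [(0.1, 0.0), (3.9, 4.0), (1.0, 1.0), (0.0, 0.3)]

top2 = naive_top_k(data, 2, support_vectors, coeffs, bias=0.0, gamma=0.5)
print(top2)  # [0, 3]
```

Because gamma is passed as an argument, the same scan can be rerun under a different kernel-parameter setting. The cost stays at one decision-function evaluation per instance, which is the work an index is meant to prune.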