ProS: Data Series Progressive k-NN Similarity Search and Classification with Probabilistic Quality Guarantees

Existing systems dealing with the increasing volume of data series cannot guarantee interactive response times, even for fundamental tasks such as similarity search. Therefore, it is necessary to develop analytic approaches that support exploration and decision making by providing progressive result...

Detailed description

Bibliographic details
Published in: arXiv.org 2022-12
Main authors: Echihabi, Karima; Tsandilas, Theophanis; Gogolou, Anna; Bezerianos, Anastasia; Palpanas, Themis
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Echihabi, Karima
Tsandilas, Theophanis
Gogolou, Anna
Bezerianos, Anastasia
Palpanas, Themis
description Existing systems dealing with the increasing volume of data series cannot guarantee interactive response times, even for fundamental tasks such as similarity search. Therefore, it is necessary to develop analytic approaches that support exploration and decision making by providing progressive results, before the final and exact ones have been computed. Prior works lack both efficiency and accuracy when applied to large-scale data series collections. We present and experimentally evaluate ProS, a new probabilistic learning-based method that provides quality guarantees for progressive Nearest Neighbor (NN) query answering. We develop our method for k-NN queries and demonstrate how it can be applied with the two most popular distance measures, namely, Euclidean and Dynamic Time Warping (DTW). We provide both initial and progressive estimates of the final answer that improve during the similarity search, as well as suitable stopping criteria for the progressive queries. Moreover, we describe how this method can be used to develop a progressive algorithm for data series classification (based on a k-NN classifier), and we additionally propose a method designed specifically for the classification task. Experiments with several diverse synthetic and real datasets demonstrate that our prediction methods constitute the first practical solutions to the problem, significantly outperforming competing approaches. This paper was published in the VLDB Journal (2022).
format Article
fullrecord <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2759128016</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2759128016</sourcerecordid><originalsourceid>FETCH-proquest_journals_27591280163</originalsourceid><addsrcrecordid>eNqNi80KgkAURocgKMp3uNBaGMfMatvvSgrby83GujVpzR2L3j6FHqDVB-c7pyP6KgwDfzpWqic85quUUk1iFUVhX9x3tkrnsESHkGpLmqEhZ6uZ6aXh5icJpHQng5bcp1HQ5hfA8gQLg41TUI6OqhLe5C5tesQjGWJHOexrNG20qdFi6bTmoegWaFh7vx2I0Xp1WGz9h62etWaXXavals2VqTiaBWoqg0n4n_UFmZNJiw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2759128016</pqid></control><display><type>article</type><title>ProS: Data Series Progressive k-NN Similarity Search and Classification with Probabilistic Quality Guarantees</title><source>Free E- Journals</source><creator>Echihabi, Karima ; Tsandilas, Theophanis ; Gogolou, Anna ; Bezerianos, Anastasia ; Palpanas, Themis</creator><creatorcontrib>Echihabi, Karima ; Tsandilas, Theophanis ; Gogolou, Anna ; Bezerianos, Anastasia ; Palpanas, Themis</creatorcontrib><description>Existing systems dealing with the increasing volume of data series cannot guarantee interactive response times, even for fundamental tasks such as similarity search. Therefore, it is necessary to develop analytic approaches that support exploration and decision making by providing progressive results, before the final and exact ones have been computed. Prior works lack both efficiency and accuracy when applied to large-scale data series collections. We present and experimentally evaluate ProS, a new probabilistic learning-based method that provides quality guarantees for progressive Nearest Neighbor (NN) query answering. We develop our method for k-NN queries and demonstrate how it can be applied with the two most popular distance measures, namely, Euclidean and Dynamic Time Warping (DTW). 
We provide both initial and progressive estimates of the final answer that are getting better during the similarity search, as well suitable stopping criteria for the progressive queries. Moreover, we describe how this method can be used in order to develop a progressive algorithm for data series classification (based on a k-NN classifier), and we additionally propose a method designed specifically for the classification task. Experiments with several and diverse synthetic and real datasets demonstrate that our prediction methods constitute the first practical solutions to the problem, significantly outperforming competing approaches. This paper was published in the VLDB Journal (2022).</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Algorithms ; Classification ; Decision analysis ; Decision making ; Queries ; Searching ; Similarity</subject><ispartof>arXiv.org, 2022-12</ispartof><rights>2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>776,780</link.rule.ids></links><search><creatorcontrib>Echihabi, Karima</creatorcontrib><creatorcontrib>Tsandilas, Theophanis</creatorcontrib><creatorcontrib>Gogolou, Anna</creatorcontrib><creatorcontrib>Bezerianos, Anastasia</creatorcontrib><creatorcontrib>Palpanas, Themis</creatorcontrib><title>ProS: Data Series Progressive k-NN Similarity Search and Classification with Probabilistic Quality Guarantees</title><title>arXiv.org</title><description>Existing systems dealing with the increasing volume of data series cannot guarantee interactive response times, even for fundamental tasks such as similarity search. Therefore, it is necessary to develop analytic approaches that support exploration and decision making by providing progressive results, before the final and exact ones have been computed. Prior works lack both efficiency and accuracy when applied to large-scale data series collections. We present and experimentally evaluate ProS, a new probabilistic learning-based method that provides quality guarantees for progressive Nearest Neighbor (NN) query answering. We develop our method for k-NN queries and demonstrate how it can be applied with the two most popular distance measures, namely, Euclidean and Dynamic Time Warping (DTW). We provide both initial and progressive estimates of the final answer that are getting better during the similarity search, as well suitable stopping criteria for the progressive queries. 
Moreover, we describe how this method can be used in order to develop a progressive algorithm for data series classification (based on a k-NN classifier), and we additionally propose a method designed specifically for the classification task. Experiments with several and diverse synthetic and real datasets demonstrate that our prediction methods constitute the first practical solutions to the problem, significantly outperforming competing approaches. This paper was published in the VLDB Journal (2022).</description><subject>Algorithms</subject><subject>Classification</subject><subject>Decision analysis</subject><subject>Decision making</subject><subject>Queries</subject><subject>Searching</subject><subject>Similarity</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNqNi80KgkAURocgKMp3uNBaGMfMatvvSgrby83GujVpzR2L3j6FHqDVB-c7pyP6KgwDfzpWqic85quUUk1iFUVhX9x3tkrnsESHkGpLmqEhZ6uZ6aXh5icJpHQng5bcp1HQ5hfA8gQLg41TUI6OqhLe5C5tesQjGWJHOexrNG20qdFi6bTmoegWaFh7vx2I0Xp1WGz9h62etWaXXavals2VqTiaBWoqg0n4n_UFmZNJiw</recordid><startdate>20221226</startdate><enddate>20221226</enddate><creator>Echihabi, Karima</creator><creator>Tsandilas, Theophanis</creator><creator>Gogolou, Anna</creator><creator>Bezerianos, Anastasia</creator><creator>Palpanas, Themis</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20221226</creationdate><title>ProS: Data Series Progressive k-NN Similarity Search and Classification with Probabilistic Quality 
Guarantees</title><author>Echihabi, Karima ; Tsandilas, Theophanis ; Gogolou, Anna ; Bezerianos, Anastasia ; Palpanas, Themis</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_27591280163</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Algorithms</topic><topic>Classification</topic><topic>Decision analysis</topic><topic>Decision making</topic><topic>Queries</topic><topic>Searching</topic><topic>Similarity</topic><toplevel>online_resources</toplevel><creatorcontrib>Echihabi, Karima</creatorcontrib><creatorcontrib>Tsandilas, Theophanis</creatorcontrib><creatorcontrib>Gogolou, Anna</creatorcontrib><creatorcontrib>Bezerianos, Anastasia</creatorcontrib><creatorcontrib>Palpanas, Themis</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Echihabi, Karima</au><au>Tsandilas, 
Theophanis</au><au>Gogolou, Anna</au><au>Bezerianos, Anastasia</au><au>Palpanas, Themis</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>ProS: Data Series Progressive k-NN Similarity Search and Classification with Probabilistic Quality Guarantees</atitle><jtitle>arXiv.org</jtitle><date>2022-12-26</date><risdate>2022</risdate><eissn>2331-8422</eissn><abstract>Existing systems dealing with the increasing volume of data series cannot guarantee interactive response times, even for fundamental tasks such as similarity search. Therefore, it is necessary to develop analytic approaches that support exploration and decision making by providing progressive results, before the final and exact ones have been computed. Prior works lack both efficiency and accuracy when applied to large-scale data series collections. We present and experimentally evaluate ProS, a new probabilistic learning-based method that provides quality guarantees for progressive Nearest Neighbor (NN) query answering. We develop our method for k-NN queries and demonstrate how it can be applied with the two most popular distance measures, namely, Euclidean and Dynamic Time Warping (DTW). We provide both initial and progressive estimates of the final answer that are getting better during the similarity search, as well suitable stopping criteria for the progressive queries. Moreover, we describe how this method can be used in order to develop a progressive algorithm for data series classification (based on a k-NN classifier), and we additionally propose a method designed specifically for the classification task. Experiments with several and diverse synthetic and real datasets demonstrate that our prediction methods constitute the first practical solutions to the problem, significantly outperforming competing approaches. 
This paper was published in the VLDB Journal (2022).</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2022-12
issn 2331-8422
language eng
recordid cdi_proquest_journals_2759128016
source Free E- Journals
subjects Algorithms
Classification
Decision analysis
Decision making
Queries
Searching
Similarity
title ProS: Data Series Progressive k-NN Similarity Search and Classification with Probabilistic Quality Guarantees
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T00%3A19%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=ProS:%20Data%20Series%20Progressive%20k-NN%20Similarity%20Search%20and%20Classification%20with%20Probabilistic%20Quality%20Guarantees&rft.jtitle=arXiv.org&rft.au=Echihabi,%20Karima&rft.date=2022-12-26&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2759128016%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2759128016&rft_id=info:pmid/&rfr_iscdi=true