Time series joins, motifs, discords and shapelets: a unifying view that exploits the matrix profile

The last decade has seen a flurry of research on all-pairs-similarity-search (or similarity joins) for text, DNA and a handful of other datatypes, and these systems have been applied to many diverse data mining problems. However, there has been surprisingly little progress made on similarity joins...

Detailed description

Bibliographic details
Published in: Data mining and knowledge discovery, 2018, Vol. 32 (1), p. 83-123
Main authors: Yeh, Chin-Chia Michael, Zhu, Yan, Ulanova, Liudmila, Begum, Nurjahan, Ding, Yifei, Dau, Hoang Anh, Zimmerman, Zachary, Silva, Diego Furtado, Mueen, Abdullah, Keogh, Eamonn
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 123
container_issue 1
container_start_page 83
container_title Data mining and knowledge discovery
container_volume 32
creator Yeh, Chin-Chia Michael
Zhu, Yan
Ulanova, Liudmila
Begum, Nurjahan
Ding, Yifei
Dau, Hoang Anh
Zimmerman, Zachary
Silva, Diego Furtado
Mueen, Abdullah
Keogh, Eamonn
description The last decade has seen a flurry of research on all-pairs-similarity-search (or similarity joins) for text, DNA and a handful of other datatypes, and these systems have been applied to many diverse data mining problems. However, there has been surprisingly little progress made on similarity joins for time series subsequences. The lack of progress probably stems from the daunting nature of the problem. For even modestly sized datasets the obvious nested-loop algorithm can take months, and the typical speed-up techniques in this domain (i.e., indexing, lower-bounding, triangular-inequality pruning and early abandoning) at best produce only one or two orders of magnitude speedup. In this work we introduce a novel scalable algorithm for time series subsequence all-pairs-similarity-search. For exceptionally large datasets, the algorithm can be trivially cast as an anytime algorithm and produce high-quality approximate solutions in reasonable time and/or be accelerated by a trivial porting to a GPU framework. The exact similarity join algorithm computes the answer to the time series motif and time series discord problem as a side-effect, and our algorithm incidentally provides the fastest known algorithm for both these extensively-studied problems. We demonstrate the utility of our ideas for many time series data mining problems, including motif discovery, novelty discovery, shapelet discovery, semantic segmentation, density estimation, and contrast set mining. Moreover, we demonstrate the utility of our ideas on domains as diverse as seismology, music processing, bioinformatics, human activity monitoring, electrical power-demand monitoring and medicine.
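As a rough illustration of the "obvious nested-loop algorithm" the abstract mentions, the sketch below computes a brute-force self-join over all subsequences of a series and reads the motif pair and the discord off the result. It is not the scalable algorithm introduced in the article. The function names, the use of z-normalized Euclidean distance, and the exclusion zone of `m // 2` are all assumptions made for this example.

```python
import math


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in window]


def naive_matrix_profile(series, m):
    """Brute-force self-join (hypothetical helper, quadratic in the series length).

    For every length-m subsequence, return the distance to its nearest
    non-trivial neighbor and the index of that neighbor.
    """
    count = len(series) - m + 1
    windows = [znorm(series[i:i + m]) for i in range(count)]
    exclusion = max(1, m // 2)  # assumption: skip near-overlapping (trivial) matches
    profile = [math.inf] * count
    index = [-1] * count
    for i in range(count):
        for j in range(count):
            if abs(i - j) < exclusion:
                continue
            d = math.dist(windows[i], windows[j])
            if d < profile[i]:
                profile[i] = d
                index[i] = j
    return profile, index


def motif_and_discord(profile, index):
    """Motif pair = smallest profile value; discord = largest finite value."""
    finite = [i for i, d in enumerate(profile) if math.isfinite(d)]
    motif = min(finite, key=lambda i: profile[i])
    discord = max(finite, key=lambda i: profile[i])
    return (motif, index[motif]), discord
```

The two nested loops make the cost grow quadratically with the number of subsequences, which is why this baseline becomes impractical on long series. It does show how the motif and the discord both fall out of a single join result.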
doi_str_mv 10.1007/s10618-017-0519-9
format Article
fullrecord publisher: New York: Springer US; abbreviated journal title: Data Min Knowl Disc; rights: The Author(s) 2017; peer reviewed; total pages: 41
fulltext fulltext
identifier ISSN: 1384-5810
ispartof Data mining and knowledge discovery, 2018, Vol.32 (1), p.83-123
issn 1384-5810
1573-756X
language eng
recordid cdi_proquest_journals_1992787451
source SpringerLink Journals - AutoHoldings
subjects Algorithms
Artificial Intelligence
Bioinformatics
Chemistry and Earth Sciences
Computer Science
Data mining
Data Mining and Knowledge Discovery
Datasets
Deoxyribonucleic acid
DNA
Information Storage and Retrieval
Monitoring
Physics
Pruning
Seismology
Similarity
Statistics for Engineering
Time series
title Time series joins, motifs, discords and shapelets: a unifying view that exploits the matrix profile
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T15%3A52%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Time%20series%20joins,%20motifs,%20discords%20and%20shapelets:%20a%20unifying%20view%20that%20exploits%20the%20matrix%20profile&rft.jtitle=Data%20mining%20and%20knowledge%20discovery&rft.au=Yeh,%20Chin-Chia%20Michael&rft.date=2018&rft.volume=32&rft.issue=1&rft.spage=83&rft.epage=123&rft.pages=83-123&rft.issn=1384-5810&rft.eissn=1573-756X&rft_id=info:doi/10.1007/s10618-017-0519-9&rft_dat=%3Cproquest_cross%3E1992787451%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1992787451&rft_id=info:pmid/&rfr_iscdi=true