Time series joins, motifs, discords and shapelets: a unifying view that exploits the matrix profile
The last decade has seen a flurry of research on all-pairs-similarity-search (or similarity joins) for text, DNA and a handful of other datatypes, and these systems have been applied to many diverse data mining problems. However, there has been surprisingly little progress made on similarity joins...
Published in: | Data mining and knowledge discovery, 2018, Vol.32 (1), p.83-123 |
---|---|
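Main authors: | Yeh, Chin-Chia Michael; Zhu, Yan; Ulanova, Liudmila; Begum, Nurjahan; Ding, Yifei; Dau, Hoang Anh; Zimmerman, Zachary; Silva, Diego Furtado; Mueen, Abdullah; Keogh, Eamonn |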
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 123 |
---|---|
container_issue | 1 |
container_start_page | 83 |
container_title | Data mining and knowledge discovery |
container_volume | 32 |
creator | Yeh, Chin-Chia Michael; Zhu, Yan; Ulanova, Liudmila; Begum, Nurjahan; Ding, Yifei; Dau, Hoang Anh; Zimmerman, Zachary; Silva, Diego Furtado; Mueen, Abdullah; Keogh, Eamonn |
description | The last decade has seen a flurry of research on all-pairs-similarity-search (or similarity joins) for text, DNA and a handful of other datatypes, and these systems have been applied to many diverse data mining problems. However, there has been surprisingly little progress made on similarity joins for time series subsequences. The lack of progress probably stems from the daunting nature of the problem. For even modest sized datasets the obvious nested-loop algorithm can take months, and the typical speed-up techniques in this domain (i.e., indexing, lower-bounding, triangular-inequality pruning and early abandoning) at best produce only one or two orders of magnitude speedup. In this work we introduce a novel scalable algorithm for time series subsequence all-pairs-similarity-search. For exceptionally large datasets, the algorithm can be trivially cast as an anytime algorithm and produce high-quality approximate solutions in reasonable time and/or be accelerated by a trivial porting to a GPU framework. The exact similarity join algorithm computes the answer to the time series motif and time series discord problem as a side-effect, and our algorithm incidentally provides the fastest known algorithm for both these extensively-studied problems. We demonstrate the utility of our ideas for many time series data mining problems, including motif discovery, novelty discovery, shapelet discovery, semantic segmentation, density estimation, and contrast set mining. Moreover, we demonstrate the utility of our ideas on domains as diverse as seismology, music processing, bioinformatics, human activity monitoring, electrical power-demand monitoring and medicine. |
doi_str_mv | 10.1007/s10618-017-0519-9 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_1992787451</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1992787451</sourcerecordid><originalsourceid>FETCH-LOGICAL-c316t-db24d1a759f7ede3f953a4e856314167d3c3858f7b0e4e7dee51ddd60e6223fc3</originalsourceid><addsrcrecordid>eNp1kMtKAzEUhoMoWKsP4C7g1tGcyWQy406KNxDcVHAX0slJmzI3k1TbtzelLty4-s-B_wIfIZfAboAxeRuAlVBlDGTGBNRZfUQmICTPpCg_jtPNqyITFbBTchbCmjEmcs4mpJm7DmlA7zDQ9eD6cE27ITqb1LjQDN4EqntDw0qP2GIMd1TTTe_szvVL-uXwm8aVjhS3Yzu4GNKHtNPRuy0d_WBdi-fkxOo24MWvTsn748N89py9vj29zO5fs4ZDGTOzyAsDWoraSjTIbS24LrASJYcCSml4wytRWblgWKA0iAKMMSXDMs-5bfiUXB160-7nBkNU62Hj-zSpoK5zWclCQHLBwdX4IQSPVo3eddrvFDC1Z6kOLFViqfYsVZ0y-SETkrdfov_T_G_oB33hd_4</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1992787451</pqid></control><display><type>article</type><title>Time series joins, motifs, discords and shapelets: a unifying view that exploits the matrix profile</title><source>SpringerLink Journals - AutoHoldings</source><creator>Yeh, Chin-Chia Michael ; Zhu, Yan ; Ulanova, Liudmila ; Begum, Nurjahan ; Ding, Yifei ; Dau, Hoang Anh ; Zimmerman, Zachary ; Silva, Diego Furtado ; Mueen, Abdullah ; Keogh, Eamonn</creator><creatorcontrib>Yeh, Chin-Chia Michael ; Zhu, Yan ; Ulanova, Liudmila ; Begum, Nurjahan ; Ding, Yifei ; Dau, Hoang Anh ; Zimmerman, Zachary ; Silva, Diego Furtado ; Mueen, Abdullah ; Keogh, Eamonn</creatorcontrib><description>The last decade has seen a flurry of research on
all-pairs-similarity-search
(or
similarity joins
) for text, DNA and a handful of other datatypes, and these systems have been applied to many diverse data mining problems. However, there has been surprisingly little progress made on similarity joins for
time series subsequences
. The lack of progress probably stems from the daunting nature of the problem. For even modest sized datasets the obvious nested-loop algorithm can take months, and the typical speed-up techniques in this domain (i.e., indexing, lower-bounding, triangular-inequality pruning and early abandoning) at best produce only one or two orders of magnitude speedup. In this work we introduce a novel scalable algorithm for time series subsequence all-pairs-similarity-search. For exceptionally large datasets, the algorithm can be trivially cast as an anytime algorithm and produce high-quality approximate solutions in reasonable time and/or be accelerated by a trivial porting to a GPU framework. The exact similarity join algorithm computes the answer to the
time series motif
and
time series discord
problem as a side-effect, and our algorithm incidentally provides the fastest known algorithm for both these extensively-studied problems. We demonstrate the utility of our ideas for many time series data mining problems, including motif discovery, novelty discovery, shapelet discovery, semantic segmentation, density estimation, and contrast set mining. Moreover, we demonstrate the utility of our ideas on domains as diverse as seismology, music processing, bioinformatics, human activity monitoring, electrical power-demand monitoring and medicine.</description><identifier>ISSN: 1384-5810</identifier><identifier>EISSN: 1573-756X</identifier><identifier>DOI: 10.1007/s10618-017-0519-9</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Algorithms ; Artificial Intelligence ; Bioinformatics ; Chemistry and Earth Sciences ; Computer Science ; Data mining ; Data Mining and Knowledge Discovery ; Datasets ; Deoxyribonucleic acid ; DNA ; Information Storage and Retrieval ; Monitoring ; Physics ; Pruning ; Seismology ; Similarity ; Statistics for Engineering ; Time series</subject><ispartof>Data mining and knowledge discovery, 2018, Vol.32 (1), p.83-123</ispartof><rights>The Author(s) 2017</rights><rights>Data Mining and Knowledge Discovery is a copyright of Springer, (2017). 
All Rights Reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c316t-db24d1a759f7ede3f953a4e856314167d3c3858f7b0e4e7dee51ddd60e6223fc3</citedby><cites>FETCH-LOGICAL-c316t-db24d1a759f7ede3f953a4e856314167d3c3858f7b0e4e7dee51ddd60e6223fc3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10618-017-0519-9$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10618-017-0519-9$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27922,27923,41486,42555,51317</link.rule.ids></links><search><creatorcontrib>Yeh, Chin-Chia Michael</creatorcontrib><creatorcontrib>Zhu, Yan</creatorcontrib><creatorcontrib>Ulanova, Liudmila</creatorcontrib><creatorcontrib>Begum, Nurjahan</creatorcontrib><creatorcontrib>Ding, Yifei</creatorcontrib><creatorcontrib>Dau, Hoang Anh</creatorcontrib><creatorcontrib>Zimmerman, Zachary</creatorcontrib><creatorcontrib>Silva, Diego Furtado</creatorcontrib><creatorcontrib>Mueen, Abdullah</creatorcontrib><creatorcontrib>Keogh, Eamonn</creatorcontrib><title>Time series joins, motifs, discords and shapelets: a unifying view that exploits the matrix profile</title><title>Data mining and knowledge discovery</title><addtitle>Data Min Knowl Disc</addtitle><description>The last decade has seen a flurry of research on
all-pairs-similarity-search
(or
similarity joins
) for text, DNA and a handful of other datatypes, and these systems have been applied to many diverse data mining problems. However, there has been surprisingly little progress made on similarity joins for
time series subsequences
. The lack of progress probably stems from the daunting nature of the problem. For even modest sized datasets the obvious nested-loop algorithm can take months, and the typical speed-up techniques in this domain (i.e., indexing, lower-bounding, triangular-inequality pruning and early abandoning) at best produce only one or two orders of magnitude speedup. In this work we introduce a novel scalable algorithm for time series subsequence all-pairs-similarity-search. For exceptionally large datasets, the algorithm can be trivially cast as an anytime algorithm and produce high-quality approximate solutions in reasonable time and/or be accelerated by a trivial porting to a GPU framework. The exact similarity join algorithm computes the answer to the
time series motif
and
time series discord
problem as a side-effect, and our algorithm incidentally provides the fastest known algorithm for both these extensively-studied problems. We demonstrate the utility of our ideas for many time series data mining problems, including motif discovery, novelty discovery, shapelet discovery, semantic segmentation, density estimation, and contrast set mining. Moreover, we demonstrate the utility of our ideas on domains as diverse as seismology, music processing, bioinformatics, human activity monitoring, electrical power-demand monitoring and medicine.</description><subject>Algorithms</subject><subject>Artificial Intelligence</subject><subject>Bioinformatics</subject><subject>Chemistry and Earth Sciences</subject><subject>Computer Science</subject><subject>Data mining</subject><subject>Data Mining and Knowledge Discovery</subject><subject>Datasets</subject><subject>Deoxyribonucleic acid</subject><subject>DNA</subject><subject>Information Storage and Retrieval</subject><subject>Monitoring</subject><subject>Physics</subject><subject>Pruning</subject><subject>Seismology</subject><subject>Similarity</subject><subject>Statistics for Engineering</subject><subject>Time 
series</subject><issn>1384-5810</issn><issn>1573-756X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>8G5</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>GUQSH</sourceid><sourceid>M2O</sourceid><recordid>eNp1kMtKAzEUhoMoWKsP4C7g1tGcyWQy406KNxDcVHAX0slJmzI3k1TbtzelLty4-s-B_wIfIZfAboAxeRuAlVBlDGTGBNRZfUQmICTPpCg_jtPNqyITFbBTchbCmjEmcs4mpJm7DmlA7zDQ9eD6cE27ITqb1LjQDN4EqntDw0qP2GIMd1TTTe_szvVL-uXwm8aVjhS3Yzu4GNKHtNPRuy0d_WBdi-fkxOo24MWvTsn748N89py9vj29zO5fs4ZDGTOzyAsDWoraSjTIbS24LrASJYcCSml4wytRWblgWKA0iAKMMSXDMs-5bfiUXB160-7nBkNU62Hj-zSpoK5zWclCQHLBwdX4IQSPVo3eddrvFDC1Z6kOLFViqfYsVZ0y-SETkrdfov_T_G_oB33hd_4</recordid><startdate>2018</startdate><enddate>2018</enddate><creator>Yeh, Chin-Chia Michael</creator><creator>Zhu, Yan</creator><creator>Ulanova, Liudmila</creator><creator>Begum, Nurjahan</creator><creator>Ding, Yifei</creator><creator>Dau, Hoang Anh</creator><creator>Zimmerman, Zachary</creator><creator>Silva, Diego Furtado</creator><creator>Mueen, Abdullah</creator><creator>Keogh, Eamonn</creator><general>Springer US</general><general>Springer Nature 
B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8FL</scope><scope>8G5</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FRNLG</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>GUQSH</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K60</scope><scope>K6~</scope><scope>K7-</scope><scope>L.-</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0N</scope><scope>M2O</scope><scope>MBDVC</scope><scope>P5Z</scope><scope>P62</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope></search><sort><creationdate>2018</creationdate><title>Time series joins, motifs, discords and shapelets: a unifying view that exploits the matrix profile</title><author>Yeh, Chin-Chia Michael ; Zhu, Yan ; Ulanova, Liudmila ; Begum, Nurjahan ; Ding, Yifei ; Dau, Hoang Anh ; Zimmerman, Zachary ; Silva, Diego Furtado ; Mueen, Abdullah ; Keogh, Eamonn</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c316t-db24d1a759f7ede3f953a4e856314167d3c3858f7b0e4e7dee51ddd60e6223fc3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Algorithms</topic><topic>Artificial Intelligence</topic><topic>Bioinformatics</topic><topic>Chemistry and Earth Sciences</topic><topic>Computer Science</topic><topic>Data mining</topic><topic>Data Mining and Knowledge Discovery</topic><topic>Datasets</topic><topic>Deoxyribonucleic acid</topic><topic>DNA</topic><topic>Information Storage and 
Retrieval</topic><topic>Monitoring</topic><topic>Physics</topic><topic>Pruning</topic><topic>Seismology</topic><topic>Similarity</topic><topic>Statistics for Engineering</topic><topic>Time series</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Yeh, Chin-Chia Michael</creatorcontrib><creatorcontrib>Zhu, Yan</creatorcontrib><creatorcontrib>Ulanova, Liudmila</creatorcontrib><creatorcontrib>Begum, Nurjahan</creatorcontrib><creatorcontrib>Ding, Yifei</creatorcontrib><creatorcontrib>Dau, Hoang Anh</creatorcontrib><creatorcontrib>Zimmerman, Zachary</creatorcontrib><creatorcontrib>Silva, Diego Furtado</creatorcontrib><creatorcontrib>Mueen, Abdullah</creatorcontrib><creatorcontrib>Keogh, Eamonn</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ABI/INFORM Collection</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Global (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>Research Library (Alumni Edition)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Business Premium Collection</collection><collection>Technology Collection 
(ProQuest)</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>Research Library Prep</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>Computer Science Database</collection><collection>ABI/INFORM Professional Advanced</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM Global</collection><collection>Computing Database</collection><collection>Research Library</collection><collection>Research Library (Corporate)</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Business</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><jtitle>Data mining and knowledge discovery</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Yeh, Chin-Chia Michael</au><au>Zhu, Yan</au><au>Ulanova, Liudmila</au><au>Begum, Nurjahan</au><au>Ding, Yifei</au><au>Dau, Hoang Anh</au><au>Zimmerman, Zachary</au><au>Silva, Diego Furtado</au><au>Mueen, Abdullah</au><au>Keogh, 
Eamonn</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Time series joins, motifs, discords and shapelets: a unifying view that exploits the matrix profile</atitle><jtitle>Data mining and knowledge discovery</jtitle><stitle>Data Min Knowl Disc</stitle><date>2018</date><risdate>2018</risdate><volume>32</volume><issue>1</issue><spage>83</spage><epage>123</epage><pages>83-123</pages><issn>1384-5810</issn><eissn>1573-756X</eissn><abstract>The last decade has seen a flurry of research on
all-pairs-similarity-search
(or
similarity joins
) for text, DNA and a handful of other datatypes, and these systems have been applied to many diverse data mining problems. However, there has been surprisingly little progress made on similarity joins for
time series subsequences
. The lack of progress probably stems from the daunting nature of the problem. For even modest sized datasets the obvious nested-loop algorithm can take months, and the typical speed-up techniques in this domain (i.e., indexing, lower-bounding, triangular-inequality pruning and early abandoning) at best produce only one or two orders of magnitude speedup. In this work we introduce a novel scalable algorithm for time series subsequence all-pairs-similarity-search. For exceptionally large datasets, the algorithm can be trivially cast as an anytime algorithm and produce high-quality approximate solutions in reasonable time and/or be accelerated by a trivial porting to a GPU framework. The exact similarity join algorithm computes the answer to the
time series motif
and
time series discord
problem as a side-effect, and our algorithm incidentally provides the fastest known algorithm for both these extensively-studied problems. We demonstrate the utility of our ideas for many time series data mining problems, including motif discovery, novelty discovery, shapelet discovery, semantic segmentation, density estimation, and contrast set mining. Moreover, we demonstrate the utility of our ideas on domains as diverse as seismology, music processing, bioinformatics, human activity monitoring, electrical power-demand monitoring and medicine.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10618-017-0519-9</doi><tpages>41</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1384-5810 |
ispartof | Data mining and knowledge discovery, 2018, Vol.32 (1), p.83-123 |
issn | 1384-5810 (ISSN); 1573-756X (EISSN) |
language | eng |
recordid | cdi_proquest_journals_1992787451 |
source | SpringerLink Journals - AutoHoldings |
subjects | Algorithms; Artificial Intelligence; Bioinformatics; Chemistry and Earth Sciences; Computer Science; Data mining; Data Mining and Knowledge Discovery; Datasets; Deoxyribonucleic acid; DNA; Information Storage and Retrieval; Monitoring; Physics; Pruning; Seismology; Similarity; Statistics for Engineering; Time series |
title | Time series joins, motifs, discords and shapelets: a unifying view that exploits the matrix profile |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T15%3A52%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Time%20series%20joins,%20motifs,%20discords%20and%20shapelets:%20a%20unifying%20view%20that%20exploits%20the%20matrix%20profile&rft.jtitle=Data%20mining%20and%20knowledge%20discovery&rft.au=Yeh,%20Chin-Chia%20Michael&rft.date=2018&rft.volume=32&rft.issue=1&rft.spage=83&rft.epage=123&rft.pages=83-123&rft.issn=1384-5810&rft.eissn=1573-756X&rft_id=info:doi/10.1007/s10618-017-0519-9&rft_dat=%3Cproquest_cross%3E1992787451%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1992787451&rft_id=info:pmid/&rfr_iscdi=true |