Time series joins, motifs, discords and shapelets: a unifying view that exploits the matrix profile

The last decade has seen a flurry of research on all-pairs-similarity-search (or similarity joins) for text, DNA and a handful of other datatypes, and these systems have been applied to many diverse data mining problems. However, there has been surprisingly little progress made on similarity joins...

Full description

Bibliographic details
Published in:	Data mining and knowledge discovery 2018, Vol.32 (1), p.83-123
Main authors:	Yeh, Chin-Chia Michael; Zhu, Yan; Ulanova, Liudmila; Begum, Nurjahan; Ding, Yifei; Dau, Hoang Anh; Zimmerman, Zachary; Silva, Diego Furtado; Mueen, Abdullah; Keogh, Eamonn
Format:	Article
Language:	eng
Subjects:	Algorithms; Artificial Intelligence; Bioinformatics; Chemistry and Earth Sciences; Computer Science; Data mining; Data Mining and Knowledge Discovery; Datasets; Deoxyribonucleic acid; DNA; Information Storage and Retrieval; Monitoring; Physics; Pruning; Seismology; Similarity; Statistics for Engineering; Time series
Online access:	Full text

container_end_page	123
container_issue	1
container_start_page	83
container_title	Data mining and knowledge discovery
container_volume	32
creator	Yeh, Chin-Chia Michael ; Zhu, Yan ; Ulanova, Liudmila ; Begum, Nurjahan ; Ding, Yifei ; Dau, Hoang Anh ; Zimmerman, Zachary ; Silva, Diego Furtado ; Mueen, Abdullah ; Keogh, Eamonn
description	The last decade has seen a flurry of research on all-pairs-similarity-search (or similarity joins) for text, DNA and a handful of other datatypes, and these systems have been applied to many diverse data mining problems. However, there has been surprisingly little progress made on similarity joins for time series subsequences. The lack of progress probably stems from the daunting nature of the problem. For even modest sized datasets the obvious nested-loop algorithm can take months, and the typical speed-up techniques in this domain (i.e., indexing, lower-bounding, triangular-inequality pruning and early abandoning) at best produce only one or two orders of magnitude speedup. In this work we introduce a novel scalable algorithm for time series subsequence all-pairs-similarity-search. For exceptionally large datasets, the algorithm can be trivially cast as an anytime algorithm and produce high-quality approximate solutions in reasonable time and/or be accelerated by a trivial porting to a GPU framework. The exact similarity join algorithm computes the answer to the time series motif and time series discord problem as a side-effect, and our algorithm incidentally provides the fastest known algorithm for both these extensively-studied problems. We demonstrate the utility of our ideas for many time series data mining problems, including motif discovery, novelty discovery, shapelet discovery, semantic segmentation, density estimation, and contrast set mining. Moreover, we demonstrate the utility of our ideas on domains as diverse as seismology, music processing, bioinformatics, human activity monitoring, electrical power-demand monitoring and medicine.
doi_str_mv	10.1007/s10618-017-0519-9
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_1992787451</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1992787451</sourcerecordid><originalsourceid>FETCH-LOGICAL-c316t-db24d1a759f7ede3f953a4e856314167d3c3858f7b0e4e7dee51ddd60e6223fc3</originalsourceid><addsrcrecordid>eNp1kMtKAzEUhoMoWKsP4C7g1tGcyWQy406KNxDcVHAX0slJmzI3k1TbtzelLty4-s-B_wIfIZfAboAxeRuAlVBlDGTGBNRZfUQmICTPpCg_jtPNqyITFbBTchbCmjEmcs4mpJm7DmlA7zDQ9eD6cE27ITqb1LjQDN4EqntDw0qP2GIMd1TTTe_szvVL-uXwm8aVjhS3Yzu4GNKHtNPRuy0d_WBdi-fkxOo24MWvTsn748N89py9vj29zO5fs4ZDGTOzyAsDWoraSjTIbS24LrASJYcCSml4wytRWblgWKA0iAKMMSXDMs-5bfiUXB160-7nBkNU62Hj-zSpoK5zWclCQHLBwdX4IQSPVo3eddrvFDC1Z6kOLFViqfYsVZ0y-SETkrdfov_T_G_oB33hd_4</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1992787451</pqid></control><display><type>article</type><title>Time series joins, motifs, discords and shapelets: a unifying view that exploits the matrix profile</title><source>SpringerLink Journals - AutoHoldings</source><creator>Yeh, Chin-Chia Michael ; Zhu, Yan ; Ulanova, Liudmila ; Begum, Nurjahan ; Ding, Yifei ; Dau, Hoang Anh ; Zimmerman, Zachary ; Silva, Diego Furtado ; Mueen, Abdullah ; Keogh, Eamonn</creator><creatorcontrib>Yeh, Chin-Chia Michael ; Zhu, Yan ; Ulanova, Liudmila ; Begum, Nurjahan ; Ding, Yifei ; Dau, Hoang Anh ; Zimmerman, Zachary ; Silva, Diego Furtado ; Mueen, Abdullah ; Keogh, Eamonn</creatorcontrib><description>The last decade has seen a flurry of research on all-pairs-similarity-search (or similarity joins ) for text, DNA and a handful of other datatypes, and these systems have been applied to many diverse data mining problems. However, there has been surprisingly little progress made on similarity joins for time series subsequences . The lack of progress probably stems from the daunting nature of the problem. 
For even modest sized datasets the obvious nested-loop algorithm can take months, and the typical speed-up techniques in this domain (i.e., indexing, lower-bounding, triangular-inequality pruning and early abandoning) at best produce only one or two orders of magnitude speedup. In this work we introduce a novel scalable algorithm for time series subsequence all-pairs-similarity-search. For exceptionally large datasets, the algorithm can be trivially cast as an anytime algorithm and produce high-quality approximate solutions in reasonable time and/or be accelerated by a trivial porting to a GPU framework. The exact similarity join algorithm computes the answer to the time series motif and time series discord problem as a side-effect, and our algorithm incidentally provides the fastest known algorithm for both these extensively-studied problems. We demonstrate the utility of our ideas for many time series data mining problems, including motif discovery, novelty discovery, shapelet discovery, semantic segmentation, density estimation, and contrast set mining. 
Moreover, we demonstrate the utility of our ideas on domains as diverse as seismology, music processing, bioinformatics, human activity monitoring, electrical power-demand monitoring and medicine.</description><identifier>ISSN: 1384-5810</identifier><identifier>EISSN: 1573-756X</identifier><identifier>DOI: 10.1007/s10618-017-0519-9</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Algorithms ; Artificial Intelligence ; Bioinformatics ; Chemistry and Earth Sciences ; Computer Science ; Data mining ; Data Mining and Knowledge Discovery ; Datasets ; Deoxyribonucleic acid ; DNA ; Information Storage and Retrieval ; Monitoring ; Physics ; Pruning ; Seismology ; Similarity ; Statistics for Engineering ; Time series</subject><ispartof>Data mining and knowledge discovery, 2018, Vol.32 (1), p.83-123</ispartof><rights>The Author(s) 2017</rights><rights>Data Mining and Knowledge Discovery is a copyright of Springer, (2017). All Rights Reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c316t-db24d1a759f7ede3f953a4e856314167d3c3858f7b0e4e7dee51ddd60e6223fc3</citedby><cites>FETCH-LOGICAL-c316t-db24d1a759f7ede3f953a4e856314167d3c3858f7b0e4e7dee51ddd60e6223fc3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10618-017-0519-9$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10618-017-0519-9$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27922,27923,41486,42555,51317</link.rule.ids></links><search><creatorcontrib>Yeh, Chin-Chia Michael</creatorcontrib><creatorcontrib>Zhu, Yan</creatorcontrib><creatorcontrib>Ulanova, Liudmila</creatorcontrib><creatorcontrib>Begum, Nurjahan</creatorcontrib><creatorcontrib>Ding, 
Yifei</creatorcontrib><creatorcontrib>Dau, Hoang Anh</creatorcontrib><creatorcontrib>Zimmerman, Zachary</creatorcontrib><creatorcontrib>Silva, Diego Furtado</creatorcontrib><creatorcontrib>Mueen, Abdullah</creatorcontrib><creatorcontrib>Keogh, Eamonn</creatorcontrib><title>Time series joins, motifs, discords and shapelets: a unifying view that exploits the matrix profile</title><title>Data mining and knowledge discovery</title><addtitle>Data Min Knowl Disc</addtitle><description>The last decade has seen a flurry of research on all-pairs-similarity-search (or similarity joins ) for text, DNA and a handful of other datatypes, and these systems have been applied to many diverse data mining problems. However, there has been surprisingly little progress made on similarity joins for time series subsequences . The lack of progress probably stems from the daunting nature of the problem. For even modest sized datasets the obvious nested-loop algorithm can take months, and the typical speed-up techniques in this domain (i.e., indexing, lower-bounding, triangular-inequality pruning and early abandoning) at best produce only one or two orders of magnitude speedup. In this work we introduce a novel scalable algorithm for time series subsequence all-pairs-similarity-search. For exceptionally large datasets, the algorithm can be trivially cast as an anytime algorithm and produce high-quality approximate solutions in reasonable time and/or be accelerated by a trivial porting to a GPU framework. The exact similarity join algorithm computes the answer to the time series motif and time series discord problem as a side-effect, and our algorithm incidentally provides the fastest known algorithm for both these extensively-studied problems. We demonstrate the utility of our ideas for many time series data mining problems, including motif discovery, novelty discovery, shapelet discovery, semantic segmentation, density estimation, and contrast set mining. 
Moreover, we demonstrate the utility of our ideas on domains as diverse as seismology, music processing, bioinformatics, human activity monitoring, electrical power-demand monitoring and medicine.</description><subject>Algorithms</subject><subject>Artificial Intelligence</subject><subject>Bioinformatics</subject><subject>Chemistry and Earth Sciences</subject><subject>Computer Science</subject><subject>Data mining</subject><subject>Data Mining and Knowledge Discovery</subject><subject>Datasets</subject><subject>Deoxyribonucleic acid</subject><subject>DNA</subject><subject>Information Storage and Retrieval</subject><subject>Monitoring</subject><subject>Physics</subject><subject>Pruning</subject><subject>Seismology</subject><subject>Similarity</subject><subject>Statistics for Engineering</subject><subject>Time series</subject><issn>1384-5810</issn><issn>1573-756X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>8G5</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>GUQSH</sourceid><sourceid>M2O</sourceid><recordid>eNp1kMtKAzEUhoMoWKsP4C7g1tGcyWQy406KNxDcVHAX0slJmzI3k1TbtzelLty4-s-B_wIfIZfAboAxeRuAlVBlDGTGBNRZfUQmICTPpCg_jtPNqyITFbBTchbCmjEmcs4mpJm7DmlA7zDQ9eD6cE27ITqb1LjQDN4EqntDw0qP2GIMd1TTTe_szvVL-uXwm8aVjhS3Yzu4GNKHtNPRuy0d_WBdi-fkxOo24MWvTsn748N89py9vj29zO5fs4ZDGTOzyAsDWoraSjTIbS24LrASJYcCSml4wytRWblgWKA0iAKMMSXDMs-5bfiUXB160-7nBkNU62Hj-zSpoK5zWclCQHLBwdX4IQSPVo3eddrvFDC1Z6kOLFViqfYsVZ0y-SETkrdfov_T_G_oB33hd_4</recordid><startdate>2018</startdate><enddate>2018</enddate><creator>Yeh, Chin-Chia Michael</creator><creator>Zhu, Yan</creator><creator>Ulanova, Liudmila</creator><creator>Begum, Nurjahan</creator><creator>Ding, Yifei</creator><creator>Dau, Hoang Anh</creator><creator>Zimmerman, Zachary</creator><creator>Silva, Diego 
Furtado</creator><creator>Mueen, Abdullah</creator><creator>Keogh, Eamonn</creator><general>Springer US</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8FL</scope><scope>8G5</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FRNLG</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>GUQSH</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K60</scope><scope>K6~</scope><scope>K7-</scope><scope>L.-</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0N</scope><scope>M2O</scope><scope>MBDVC</scope><scope>P5Z</scope><scope>P62</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope></search><sort><creationdate>2018</creationdate><title>Time series joins, motifs, discords and shapelets: a unifying view that exploits the matrix profile</title><author>Yeh, Chin-Chia Michael ; Zhu, Yan ; Ulanova, Liudmila ; Begum, Nurjahan ; Ding, Yifei ; Dau, Hoang Anh ; Zimmerman, Zachary ; Silva, Diego Furtado ; Mueen, Abdullah ; Keogh, Eamonn</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c316t-db24d1a759f7ede3f953a4e856314167d3c3858f7b0e4e7dee51ddd60e6223fc3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Algorithms</topic><topic>Artificial Intelligence</topic><topic>Bioinformatics</topic><topic>Chemistry and Earth Sciences</topic><topic>Computer Science</topic><topic>Data mining</topic><topic>Data Mining and Knowledge 
Discovery</topic><topic>Datasets</topic><topic>Deoxyribonucleic acid</topic><topic>DNA</topic><topic>Information Storage and Retrieval</topic><topic>Monitoring</topic><topic>Physics</topic><topic>Pruning</topic><topic>Seismology</topic><topic>Similarity</topic><topic>Statistics for Engineering</topic><topic>Time series</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Yeh, Chin-Chia Michael</creatorcontrib><creatorcontrib>Zhu, Yan</creatorcontrib><creatorcontrib>Ulanova, Liudmila</creatorcontrib><creatorcontrib>Begum, Nurjahan</creatorcontrib><creatorcontrib>Ding, Yifei</creatorcontrib><creatorcontrib>Dau, Hoang Anh</creatorcontrib><creatorcontrib>Zimmerman, Zachary</creatorcontrib><creatorcontrib>Silva, Diego Furtado</creatorcontrib><creatorcontrib>Mueen, Abdullah</creatorcontrib><creatorcontrib>Keogh, Eamonn</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ABI/INFORM Collection</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Global (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>Research Library (Alumni Edition)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest 
Central</collection><collection>Business Premium Collection</collection><collection>Technology Collection (ProQuest)</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>Research Library Prep</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>Computer Science Database</collection><collection>ABI/INFORM Professional Advanced</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM Global</collection><collection>Computing Database</collection><collection>Research Library</collection><collection>Research Library (Corporate)</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Business</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><jtitle>Data mining and knowledge discovery</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Yeh, Chin-Chia Michael</au><au>Zhu, Yan</au><au>Ulanova, Liudmila</au><au>Begum, Nurjahan</au><au>Ding, Yifei</au><au>Dau, Hoang Anh</au><au>Zimmerman, 
Zachary</au><au>Silva, Diego Furtado</au><au>Mueen, Abdullah</au><au>Keogh, Eamonn</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Time series joins, motifs, discords and shapelets: a unifying view that exploits the matrix profile</atitle><jtitle>Data mining and knowledge discovery</jtitle><stitle>Data Min Knowl Disc</stitle><date>2018</date><risdate>2018</risdate><volume>32</volume><issue>1</issue><spage>83</spage><epage>123</epage><pages>83-123</pages><issn>1384-5810</issn><eissn>1573-756X</eissn><abstract>The last decade has seen a flurry of research on all-pairs-similarity-search (or similarity joins ) for text, DNA and a handful of other datatypes, and these systems have been applied to many diverse data mining problems. However, there has been surprisingly little progress made on similarity joins for time series subsequences . The lack of progress probably stems from the daunting nature of the problem. For even modest sized datasets the obvious nested-loop algorithm can take months, and the typical speed-up techniques in this domain (i.e., indexing, lower-bounding, triangular-inequality pruning and early abandoning) at best produce only one or two orders of magnitude speedup. In this work we introduce a novel scalable algorithm for time series subsequence all-pairs-similarity-search. For exceptionally large datasets, the algorithm can be trivially cast as an anytime algorithm and produce high-quality approximate solutions in reasonable time and/or be accelerated by a trivial porting to a GPU framework. The exact similarity join algorithm computes the answer to the time series motif and time series discord problem as a side-effect, and our algorithm incidentally provides the fastest known algorithm for both these extensively-studied problems. 
We demonstrate the utility of our ideas for many time series data mining problems, including motif discovery, novelty discovery, shapelet discovery, semantic segmentation, density estimation, and contrast set mining. Moreover, we demonstrate the utility of our ideas on domains as diverse as seismology, music processing, bioinformatics, human activity monitoring, electrical power-demand monitoring and medicine.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10618-017-0519-9</doi><tpages>41</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 1384-5810
ispartof	Data mining and knowledge discovery, 2018, Vol.32 (1), p.83-123
issn	1384-5810 1573-756X
language	eng
recordid	cdi_proquest_journals_1992787451
source	SpringerLink Journals - AutoHoldings
subjects	Algorithms ; Artificial Intelligence ; Bioinformatics ; Chemistry and Earth Sciences ; Computer Science ; Data mining ; Data Mining and Knowledge Discovery ; Datasets ; Deoxyribonucleic acid ; DNA ; Information Storage and Retrieval ; Monitoring ; Physics ; Pruning ; Seismology ; Similarity ; Statistics for Engineering ; Time series
title	Time series joins, motifs, discords and shapelets: a unifying view that exploits the matrix profile
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T15%3A52%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Time%20series%20joins,%20motifs,%20discords%20and%20shapelets:%20a%20unifying%20view%20that%20exploits%20the%20matrix%20profile&rft.jtitle=Data%20mining%20and%20knowledge%20discovery&rft.au=Yeh,%20Chin-Chia%20Michael&rft.date=2018&rft.volume=32&rft.issue=1&rft.spage=83&rft.epage=123&rft.pages=83-123&rft.issn=1384-5810&rft.eissn=1573-756X&rft_id=info:doi/10.1007/s10618-017-0519-9&rft_dat=%3Cproquest_cross%3E1992787451%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1992787451&rft_id=info:pmid/&rfr_iscdi=true
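
The abstract in the description field refers to "the obvious nested-loop algorithm" for a subsequence similarity join and notes that motifs and discords fall out of the join as a side effect. The following is a minimal sketch of that naive baseline only, not the scalable algorithm the article introduces. It assumes a self-join under z-normalized Euclidean distance; the function names, the exclusion-zone width of m // 2, and the synthetic test series are illustrative assumptions that do not come from the record.

```python
import numpy as np

def znorm(x):
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)

def naive_matrix_profile(ts, m):
    """Brute-force self-join (assumed baseline, quadratic in series length).

    For every length-m subsequence, record the distance to its nearest
    non-trivial neighbor and that neighbor's start position.
    """
    ts = np.asarray(ts, dtype=float)
    n = len(ts) - m + 1
    subs = np.array([znorm(ts[i:i + m]) for i in range(n)])
    excl = m // 2  # assumed exclusion zone: skip overlapping trivial matches
    profile = np.full(n, np.inf)
    index = np.full(n, -1)
    for i in range(n):
        for j in range(n):
            if abs(i - j) <= excl:
                continue
            d = np.linalg.norm(subs[i] - subs[j])
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index

# Synthetic example: noise with one pattern planted at positions 30 and 130.
rng = np.random.default_rng(0)
ts = rng.normal(size=200)
pattern = np.sin(np.linspace(0, 2 * np.pi, 20))
ts[30:50] = pattern
ts[130:150] = pattern

profile, index = naive_matrix_profile(ts, 20)
motif = int(np.argmin(profile))    # start of the best-matching subsequence
discord = int(np.argmax(profile))  # subsequence farthest from any neighbor
print(motif, int(index[motif]), discord)
```

In this sketch the lowest value of the resulting vector marks a motif pair and the highest value marks a discord, which is the sense in which both answers come for free once the join has been computed. The double loop also makes the cost visible: every subsequence is compared against every other one, which is why the abstract describes this approach as impractical even for modestly sized data.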