PETSC: pattern-based embedding for time series classification

Bibliographic Details
Published in:	Data mining and knowledge discovery 2022-05, Vol.36 (3), p.1015-1061
Main Authors:	Feremans, Len; Cule, Boris; Goethals, Bart
Format:	Article
Language:	English
Subjects:	Artificial Intelligence; Bioinformatics; Chemistry and Earth Sciences; Classification; Computer Science; Data mining; Data Mining and Knowledge Discovery; Datasets; Embedding; Information Storage and Retrieval; Missing data; Multivariate analysis; Pattern analysis; Physics; Statistics for Engineering; Time series
Online Access:	Full text

Description
Summary:	Efficient and interpretable classification of time series is an essential data mining task with many real-world applications. Recently, several dictionary- and shapelet-based time series classification methods have been proposed that employ contiguous subsequences of fixed length. We extend pattern mining to efficiently enumerate long variable-length sequential patterns with gaps. Additionally, we discover patterns at multiple resolutions, thereby combining cohesive sequential patterns that vary in length, duration and resolution. For time series classification we construct an embedding based on sequential pattern occurrences and learn a linear model. The discovered patterns form the basis for interpretable insight into each class of time series. The pattern-based embedding for time series classification (PETSC) supports both univariate and multivariate time series datasets of varying length subject to noise or missing data. We experimentally validate that MR-PETSC performs significantly better than baseline interpretable methods such as DTW, BOP and SAX-VSM on univariate and multivariate time series. On univariate time series, our method performs comparably to many recent methods, including BOSS, cBOSS, S-BOSS, ProximityForest and ResNET, and is only narrowly outperformed by state-of-the-art methods such as HIVE-COTE, ROCKET, TS-CHIEF and InceptionTime. Moreover, on multivariate datasets PETSC performs comparably to the current state-of-the-art such as HIVE-COTE, ROCKET, CIF and ResNET, none of which are interpretable. PETSC scales to large datasets, and the total time for training and making predictions on all 85 ‘bake off’ datasets in the UCR archive is under 3 h, making it one of the fastest methods available.
PETSC is particularly useful as it learns a linear model where each feature represents a sequential pattern in the time domain. This supports human oversight to ensure predictions are trustworthy and fair, which is essential in financial, medical or bioinformatics applications.
ISSN:	1384-5810 1573-756X
DOI:	10.1007/s10618-022-00822-7
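
The embedding step described in the summary (sequential patterns with gaps, counted per time series and used as features for a linear model) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the four-symbol alphabet, the segment and window sizes, and the hand-picked patterns are all assumptions made for illustration, and the pattern mining step itself is omitted.

```python
from bisect import bisect_right
from statistics import mean, pstdev

# Assumption: a 4-symbol alphabet with breakpoints that split a standard
# normal distribution into four equiprobable bins (SAX-style discretisation).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def discretise(series, segment=2):
    """Z-normalise, average over fixed-size segments, map each mean to a symbol."""
    mu = mean(series)
    sigma = pstdev(series) or 1.0  # avoid division by zero on constant series
    z = [(x - mu) / sigma for x in series]
    means = [mean(z[i:i + segment]) for i in range(0, len(z) - segment + 1, segment)]
    return "".join(ALPHABET[bisect_right(BREAKPOINTS, v)] for v in means)


def occurs_in(pattern, window):
    """True if pattern is an ordered subsequence of window (gaps allowed)."""
    it = iter(window)
    return all(symbol in it for symbol in pattern)


def count_occurrences(pattern, symbols, window):
    """Count the sliding windows of the given length that contain the pattern."""
    n_windows = max(len(symbols) - window + 1, 1)
    return sum(occurs_in(pattern, symbols[i:i + window]) for i in range(n_windows))


def embed(series_list, patterns, window=4, segment=2):
    """One row per series, one column per pattern: the occurrence counts."""
    rows = []
    for series in series_list:
        symbols = discretise(series, segment)
        rows.append([count_occurrences(p, symbols, window) for p in patterns])
    return rows


if __name__ == "__main__":
    rising = [0, 0, 1, 1, 2, 2, 3, 3]
    falling = [3, 3, 2, 2, 1, 1, 0, 0]
    # Hypothetical patterns; in the paper these would be mined from the data.
    patterns = ["ad", "da"]
    print(embed([rising, falling], patterns))  # [[1, 0], [0, 1]]
```

Each column of the resulting matrix corresponds to one readable pattern, so a linear classifier trained on these rows assigns a weight per pattern, which is the property the summary relies on for interpretability.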