Variable-Length Multivariate Time Series Classification Using ROCKET: a Case Study of Incident Detection

Bibliographic Details
Published in:	IEEE access 2022, Vol.10, p.1-1
Main authors:	Bier, Agnieszka; Jastrzebska, Agnieszka; Olszewski, Pawel
Format:	Article
Language:	eng
Subjects:	Algorithms; Automation; Classification; Classification algorithms; Data analysis; Data processing; Domains; Feature extraction; Forecasting; Fraud; incident detection; Machine learning; Multivariate analysis; multivariate time series; Pipelines; Pipelining (computers); ROCKET; Rockets; Task analysis; Time series; Time series analysis; varying-length time series
Online access:	Full text

Description
Summary:	Multivariate time series classification is a machine learning problem that can be applied to automate a wide range of real-world data analysis tasks. ROCKET has proved to be an outstanding algorithm capable of classifying time series accurately and quickly. The textbook variant of the multivariate time series classification problem assumes that the time series to be classified are all of the same length, while in real-world applications this assumption does not necessarily hold. The literature in this domain does not pay enough attention to data processing pipelines for variable-length time series. Thus, in this paper, we present a thorough analysis of three preprocessing pipelines that handle variable-length time series that need to be classified with a method that requires the data to be of equal length. These three methods are truncation, padding, and forecasting of missing values. Experiments conducted on benchmark datasets showed that the recommended procedure involves padding. Forecasting ensures similar classification accuracy, but comes at a much higher computational cost. Truncation is not a viable option. Furthermore, in the paper, we present a novel application domain for multivariate time series classification algorithms, namely incident detection in cash transactions. This area poses substantive challenges for automated model training procedures since the data is not only variable-length, but also heavily imbalanced. In the study, we list various incident types and present trained classifiers capable of aiding human auditors in their daily work.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2022.3203523
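
The summary names truncation and padding as two ways of bringing variable-length series to a common length before a fixed-length classifier is applied. The following is a minimal illustrative sketch of those two ideas using NumPy only. It is not the authors' pipeline: the function names `truncate_to_shortest` and `pad_to_longest`, the zero fill value, the choice to cut or extend at the end of each series, and the `(channels, length)` array layout are all assumptions made for this example. The forecasting variant is omitted because the record does not say which forecasting model the paper uses.

```python
import numpy as np


def truncate_to_shortest(series):
    """Cut every (channels, length) array down to the shortest length in the set.

    Hypothetical helper: trimming from the end is an assumption of this sketch.
    """
    min_len = min(s.shape[1] for s in series)
    return np.stack([s[:, :min_len] for s in series])


def pad_to_longest(series, fill_value=0.0):
    """Extend every (channels, length) array at its end up to the longest length.

    Hypothetical helper: the constant fill value is an assumption of this sketch.
    """
    max_len = max(s.shape[1] for s in series)
    return np.stack([
        np.pad(s, ((0, 0), (0, max_len - s.shape[1])), constant_values=fill_value)
        for s in series
    ])


# Three toy series with 2 channels each and lengths 3, 5, and 4.
toy = [np.arange(2 * n, dtype=float).reshape(2, n) for n in (3, 5, 4)]

truncated = truncate_to_shortest(toy)  # shape (3, 2, 3): information beyond step 3 is lost
padded = pad_to_longest(toy)           # shape (3, 2, 5): shorter series are filled with zeros
```

Both helpers return a single array of shape `(n_series, n_channels, length)`, which is the kind of equal-length input a fixed-length classifier needs. The toy data also shows why truncation can be costly: every series is reduced to the length of the shortest one, so the longer series lose their trailing observations.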