Instance reduction for time series classification using MDL principle

Bibliographic Details
Published in:	Intelligent data analysis 2017-01, Vol.21 (3), p.491-514
Main authors:	Vinh, Vo Thanh; Anh, Duong Tuan
Format:	Article
Language:	eng
Subjects:	Classification; Reduction; Time compression; Time series; Training

Description
Summary:	Research in time series classification has shown that the one-nearest-neighbor classifier with the Dynamic Time Warping measure outperforms more advanced classification algorithms in most cases. Instance reduction is one approach to improving the time and space efficiency of the nearest neighbor classifier for time series data. This approach reduces the size of the training set by selecting the best representative instances and using only them when classifying new instances. In this work, we propose a novel approach for instance reduction in time series classification. Our method consists of two steps. First, we remove the unrepresentative instances from the training set using data editing. Second, we compress the training set using the Minimum Description Length principle. The main idea behind our method is that if two time series can be compressed by the Minimum Description Length principle, we combine them into one time series. In this way, the number of instances in the training set is reduced step by step, and we stop removing instances when the training set reaches a required percentage of its original size or when no pair of instances can be compressed. We empirically compare our proposed method with two previous methods, INSIGHT and Naïve Rank Reduction, over a large number of time series datasets. The experimental results show that our method can outperform INSIGHT and Naïve Rank Reduction on many datasets when the percentage of selected instances in the training set is not too small, that is, greater than about 30%.
ISSN:	1088-467X 1571-4128
DOI:	10.3233/IDA-150475
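
The compression step described in the summary (merge two series whenever doing so saves bits, and stop at a target size or when no pair compresses) can be illustrated with a rough sketch. This is not the authors' implementation: the record does not state how description length is computed, so the entropy-based bit cost, the shared-range discretization, the averaging merge, the restriction to same-class pairs, and the equal-length requirement are all assumptions made here for illustration. The function names (`discretize`, `description_length`, `bits_saved`, `reduce_by_merging`) are hypothetical, and the data editing step and Dynamic Time Warping are left out.

```python
import math
from collections import Counter
from itertools import combinations


def discretize(series, lo, hi, cardinality=16):
    """Map real values onto integers 0..cardinality-1 using a shared range."""
    if hi == lo:
        return [0] * len(series)
    scale = cardinality / (hi - lo)
    return [min(cardinality - 1, max(0, int((x - lo) * scale))) for x in series]


def description_length(symbols):
    """Bits needed to encode the symbols with an ideal entropy coder
    (an assumed cost model, not taken from the paper)."""
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in Counter(symbols).values())


def bits_saved(a, b, lo, hi):
    """Bits saved by storing a and b as one merged series plus two residuals."""
    merged = [(x + y) / 2 for x, y in zip(a, b)]  # assumed merge: pointwise mean
    da, db, dm = (discretize(s, lo, hi) for s in (a, b, merged))
    separate = description_length(da) + description_length(db)
    residual_a = [x - m for x, m in zip(da, dm)]
    residual_b = [y - m for y, m in zip(db, dm)]
    together = (description_length(dm)
                + description_length(residual_a)
                + description_length(residual_b))
    return separate - together, merged


def reduce_by_merging(train, keep_fraction):
    """Greedily merge the best same-class pair until the target size is reached
    or no pair yields a positive saving. train is a list of (series, label)."""
    train = list(train)
    target = max(1, math.ceil(keep_fraction * len(train)))
    values = [x for series, _ in train for x in series]
    lo, hi = min(values), max(values)
    while len(train) > target:
        best = None
        for i, j in combinations(range(len(train)), 2):
            if train[i][1] != train[j][1]:
                continue  # assumption: only merge instances of the same class
            saved, merged = bits_saved(train[i][0], train[j][0], lo, hi)
            if saved > 0 and (best is None or saved > best[0]):
                best = (saved, i, j, merged)
        if best is None:
            break  # no pair can be compressed
        _, i, j, merged = best
        label = train[i][1]
        train = [t for k, t in enumerate(train) if k not in (i, j)]
        train.append((merged, label))
    return train
```

Calling `reduce_by_merging(train, 0.5)` on a list of `(series, label)` pairs returns a training set of at most roughly half the original size, or a larger one if the loop runs out of compressible pairs first. The pairwise search is quadratic in the number of instances per pass, which is acceptable for a sketch but would need pruning on large training sets.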