Partitional clustering of tick data to reduce storage space
Main authors: | , |
---|---|
Format: | Conference proceeding |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | Tick data is one of the most prominent types of temporal data, as it can be used to represent data in various domains such as geophysics or finance. Storage of tick data is a challenging problem because two criteria have to be fulfilled simultaneously: the storage structure should allow fast execution of queries, and the data should not occupy too much space on the hard disk or in main memory. In this paper, we present a clustering-based solution, and we introduce a new clustering algorithm, SOPAC, that is designed to support the storage of tick data. Our approach is based on the search for a partitional clustering that optimizes storage space. We evaluate our algorithm both on publicly available real-world datasets and on real-world tick data from the financial domain. We also investigate, on task-specific benchmarks, how well our approach estimates the optimum. Our experiments show that, for the tick data storage problem, our algorithm substantially outperforms state-of-the-art clustering algorithms, both in terms of statistical significance and practical relevance. |
---|---|
ISSN: | 1543-9259 2767-9462 |
DOI: | 10.1109/INES.2012.6249896 |