Dataset retrieval system based on automation of data preparation with dataset description model
Published in: | Concurrency and computation 2021-01, Vol.33 (2), p.n/a |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Data preparation is the most effortful task in the process of statistical learning. Many data mining studies skip data preparation by assuming that qualified datasets are already available. This assumption can hide useful patterns in the data, which can result in poor performance and incorrect learning. Automating data preparation can solve these problems. Automation raises a few issues that should be considered, such as flexible expression of requirements according to the purpose of the learning model, accessibility of data sources, and performance degradation due to automation. In this paper, we propose a dataset description model that can express the requirements for data processing, and a dataset retrieval system based on automated data preparation. The proposed system makes it possible to provide good-quality datasets for statistical learning applications using data preparation methods such as data acquisition, refinement, and organization. In the experiment, we demonstrate that the proposed system incurs no performance loss compared to existing manual systems. Moreover, the quality of the datasets is also improved by using the proposed system. |
---|---|
ISSN: | 1532-0626 1532-0634 |
DOI: | 10.1002/cpe.5288 |