Recommending Machine Learning Pipelines Based on Cumulative Metadata
Published in: | Proceedings of the XXth Conference of Open Innovations Association FRUCT 2023-05, Vol.33 (2), p.331-334 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Automated design of machine learning pipelines for a given supervised learning task is usually solved with optimization methods, which entail high time complexity. Meta-learning offers an alternative: a model is trained on metadata describing the results of solving similar problems. This approach has its own limitation, however: the model needs a large amount of knowledge to be effective. Based on the literature analyzed by the authors, this problem remains relevant. In particular, auto-sklearn, one of the state-of-the-art solutions, uses a predetermined set of metadata that does not change in response to new run results. The ontological data model proposed by the authors, together with a mechanism for automated knowledge enrichment, is designed to reduce the impact of this limitation. Currently, the pipeline recommendation process covers two scenarios: in the first, a hash representation of the original data set is already in storage; in the second, no such representation exists, and the pipeline is recommended by Bayesian optimization over the global space of machine learning algorithms and their associated hyperparameters. In the experiment, the pipeline inference time was measured for both scenarios. The results confirmed the superiority of metadata-driven recommendation in inference time and showed that this advantage grows as the dimension of the input data increases. |
---|---|
ISSN: | 2305-7254 2343-0737 |
DOI: | 10.5281/zenodo.8004565 |
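
The two recommendation scenarios described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `dataset_fingerprint`, `fallback_search`, `recommend`, and `CANDIDATES` are hypothetical, the SHA-256 fingerprint is an assumed stand-in for the paper's hash representation, and the fallback uses a seeded random choice in place of Bayesian optimization. Writing the search result back to the store is one assumed reading of the knowledge-enrichment mechanism.

```python
import hashlib
import random
from typing import Callable, Dict, List, Sequence, Tuple

Row = Sequence[float]
Pipeline = Dict[str, object]

# Hypothetical candidate pipelines; a real search space would be far larger.
CANDIDATES: List[Pipeline] = [
    {"algorithm": "decision_tree", "hyperparameters": {"max_depth": 5}},
    {"algorithm": "knn", "hyperparameters": {"n_neighbors": 3}},
]


def dataset_fingerprint(rows: List[Row]) -> str:
    """Hash the dataset contents into a fixed-length key (assumed scheme)."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(",".join(repr(float(v)) for v in row).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def fallback_search(rows: List[Row], seed: int = 0) -> Pipeline:
    """Stand-in for the expensive search: a seeded random pick,
    NOT Bayesian optimization."""
    rng = random.Random(seed)
    return dict(rng.choice(CANDIDATES))


def recommend(
    rows: List[Row],
    store: Dict[str, Pipeline],
    search: Callable[[List[Row]], Pipeline] = fallback_search,
) -> Tuple[Pipeline, str]:
    """Return a pipeline and the scenario that produced it."""
    key = dataset_fingerprint(rows)
    if key in store:
        # Scenario 1: the fingerprint is already in storage.
        return store[key], "metadata"
    # Scenario 2: no stored fingerprint, so run the search.
    pipeline = search(rows)
    # Assumed enrichment step: remember the result for future requests.
    store[key] = pipeline
    return pipeline, "search"
```

Calling `recommend` twice on the same data with a shared `store` takes the search path the first time and the lookup path the second, which is the case where the abstract reports the shorter inference time.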