Recommending Machine Learning Pipelines Based on Cumulative Metadata

Bibliographic Details
Published in:	Proceedings of the XXth Conference of Open Innovations Association FRUCT 2023-05, Vol.33 (2), p.331-334
Main Authors:	Maxim Aliev, Sergey B Muravyov
Format:	Article
Language:	English
Subjects:	automl; data model design; meta-learning; ontology; combined algorithm selection and hyperparameter optimization
Online Access:	Full text

Description
Abstract:	The problem of automated machine learning pipeline design for a given supervised learning task is usually solved with optimization methods, which entail high time complexity. An alternative is meta-learning, which consists in training a model on metadata from the results of solving similar problems. This approach has its own limitation: the model needs a large amount of knowledge to be effective. Based on the literature analyzed by the authors, this problem remains relevant. In particular, auto-sklearn, one of the state-of-the-art solutions, uses a predetermined set of metadata that does not change based on new run results. The ontological data model proposed by the authors, together with a mechanism of automated knowledge enrichment, is designed to reduce the impact of this limitation. Currently, the pipeline recommendation process covers two scenarios: one in which a hash representation of the original data set is already in storage, and the reverse scenario, in which the pipeline is recommended based on Bayesian optimization over the global space of machine learning algorithms and their associated hyperparameters. In the experiment, the pipeline inference time was measured for both scenarios. The results confirmed the superiority of the metadata-driven recommendation and showed that this advantage grows as the dimension of the input data increases.
ISSN:	2305-7254, 2343-0737
DOI:	10.5281/zenodo.8004565
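
The two recommendation scenarios named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the SHA-256 hashing of rows, the `MetadataStore` class, the `SEARCH_SPACE` dictionary, and the caller-supplied `score` function are all hypothetical. The fallback uses plain random search as a stand-in for the Bayesian optimization mentioned in the abstract, and storing the search result for later lookups is only one possible reading of "automated knowledge enrichment".

```python
import hashlib
import random
from typing import Callable, Dict, Sequence, Tuple

Row = Sequence[float]
Pipeline = Dict[str, object]

# Hypothetical search space: algorithm name -> hyperparameter candidates.
SEARCH_SPACE = {
    "decision_tree": {"max_depth": [2, 4, 8]},
    "knn": {"n_neighbors": [1, 5, 15]},
}


def dataset_hash(rows: Sequence[Row]) -> str:
    """Hash the dataset contents into a hex digest used as the storage key."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(tuple(row)).encode("utf-8"))
    return digest.hexdigest()


def search_pipeline(score: Callable[[Pipeline], float],
                    n_trials: int = 20, seed: int = 0) -> Pipeline:
    """Fallback search. Random search stands in for Bayesian optimization."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_trials):
        algorithm = rng.choice(sorted(SEARCH_SPACE))
        params = {name: rng.choice(values)
                  for name, values in SEARCH_SPACE[algorithm].items()}
        candidate = {"algorithm": algorithm, "params": params}
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best


class MetadataStore:
    """In-memory stand-in for the metadata storage (an assumption)."""

    def __init__(self) -> None:
        self._pipelines: Dict[str, Pipeline] = {}
        self.search_calls = 0

    def recommend(self, rows: Sequence[Row],
                  score: Callable[[Pipeline], float]) -> Tuple[Pipeline, str]:
        key = dataset_hash(rows)
        # Scenario 1: the dataset hash is already in storage.
        if key in self._pipelines:
            return self._pipelines[key], "metadata"
        # Scenario 2: no stored hash, so search the global space.
        self.search_calls += 1
        pipeline = search_pipeline(score)
        # Assumed enrichment step: remember the result for future requests.
        self._pipelines[key] = pipeline
        return pipeline, "search"


if __name__ == "__main__":
    store = MetadataStore()
    data = [(1.0, 2.0), (3.0, 4.0)]
    toy_score = lambda p: 1.0 if p["algorithm"] == "knn" else 0.0
    print(store.recommend(data, toy_score)[1])  # search
    print(store.recommend(data, toy_score)[1])  # metadata
```

In this sketch the second request for the same data skips the search entirely, which is the kind of inference-time saving the abstract reports for the metadata-driven scenario. A real system would likely hash a more robust representation of the dataset and evaluate candidates by actually training pipelines.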