Constrained Multi-Label Dataset Partitioning for Automated Machine Learning


Bibliographic details
Main authors: Farre Guiu, Miquel Angel, Martin, Marc Junyent
Format: Patent
Language: English
Description
Summary: A system includes a computing platform having processing hardware and a memory storing software code. The processing hardware executes the software code to receive a dataset in which at least some data samples have multiple metadata labels, and to identify a partitioning constraint and a partitioning of the dataset into data subsets. When executed, the software code also obtains, for each metadata label, a desired distribution ratio based on the number of data subsets and the total number of times that metadata label has been applied to the data samples; aggregates, using the partitioning constraint, the data samples into data sample groups; assigns, using the partitioning constraint and the desired distribution ratio for each metadata label, each data sample group to one of the data subsets, wherein each of the data subsets is unique; and trains a machine learning model using one of the data subsets.
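
The steps in the summary can be illustrated with a minimal Python sketch. It is not the patented method, only one plausible reading of it under stated assumptions: the partitioning constraint is modeled as a shared key (samples with the same key must stay together), the desired distribution is given as one fraction per subset, and assignment uses a simple greedy heuristic. The function name `partition_dataset`, the parameters `ratios` and `group_key`, and the sample layout are all hypothetical.

```python
from collections import Counter, defaultdict


def partition_dataset(samples, ratios, group_key):
    """Greedy sketch: split multi-label samples into len(ratios) subsets.

    samples   -- list of dicts, each with a "labels" set and a group_key field
    ratios    -- assumed target fraction of each label per subset, e.g. [0.8, 0.2]
    group_key -- assumed form of the constraint: samples sharing this value
                 must land in the same subset
    """
    # Count how many times each label is applied across the whole dataset.
    label_totals = Counter(label for s in samples for label in s["labels"])

    # Desired number of instances of each label in each subset.
    remaining = [{label: n * r for label, n in label_totals.items()} for r in ratios]

    # Aggregate samples that the constraint says must stay together.
    groups = defaultdict(list)
    for s in samples:
        groups[s[group_key]].append(s)

    subsets = [[] for _ in ratios]
    # Place large groups first, since they are the hardest to fit.
    for group in sorted(groups.values(), key=len, reverse=True):
        counts = Counter(label for s in group for label in s["labels"])
        # Choose the subset with the largest unmet need for this group's labels.
        best = max(range(len(ratios)),
                   key=lambda i: sum(remaining[i][label] for label in counts))
        subsets[best].extend(group)
        for label, c in counts.items():
            remaining[best][label] -= c
    return subsets
```

Because whole groups are assigned at once, no group is ever split across subsets and every sample lands in exactly one subset. The per-label ratios are only approximated, and a different scoring rule could be substituted in the `max` call without changing the rest of the sketch.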