System and methods for machine learning training data selection

Bibliographic details
Main authors: Khirwadkar, Tanmay; Mangla, Sanjay; Bansod, Sourabh Prakash; Bhole, Chetan Pitambar; Sivaramapuram Chandrasekaran, Deepak Ramamurthi
Format: Patent
Language: English
Description
Summary: A system and method are disclosed in which a plurality of simulation tests are run on a first machine learning model to obtain a plurality of results, each produced during a respective simulation test. The first machine learning model is gradually trained using first training data historically collected over a period of time; the first training data comprises a plurality of first training data sets, each including a subset of first training inputs and first target outputs associated with one of a plurality of points in time during the period of time. A simulation test of the plurality of simulation tests is determined at which the corresponding results of the first machine learning model satisfy a threshold condition, the threshold condition being based on historical data at a first point in time of the plurality of points in time. The first training data set on which the first machine learning model used during the determined simulation test was trained is then identified; this first training data set is associated with one or more second points in time that precede the first point in time. Finally, a subset of first target outputs is determined from the identified first training data set, and the determined subset defines an amount of second training data sufficient to train a second machine learning model.
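The flow described in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `TrainingDataSet`, `find_sufficient_training_amount`, `train_on`, `run_simulation`, and `keep_target` are hypothetical, the threshold condition is assumed to be a simple "result is at least a given value" check, and the rule for choosing the subset of target outputs (here a caller-supplied predicate) is an assumption, since the abstract does not specify it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

Model = TypeVar("Model")


@dataclass
class TrainingDataSet:
    """One first training data set, tied to a point in time (hypothetical layout)."""
    point_in_time: int
    inputs: List[float]
    target_outputs: List[int]


def find_sufficient_training_amount(
    data_sets: Sequence[TrainingDataSet],
    initial_model: Model,
    train_on: Callable[[Model, TrainingDataSet], Model],
    run_simulation: Callable[[Model], float],
    threshold: float,
    first_point_in_time: int,
    keep_target: Callable[[int], bool] = lambda target: True,
) -> Optional[Tuple[TrainingDataSet, List[int]]]:
    """Return the data set at which the threshold is first met, plus the
    selected subset of its target outputs, or None if it is never met."""
    model = initial_model
    # Train gradually, in chronological order of the data sets.
    for data_set in sorted(data_sets, key=lambda d: d.point_in_time):
        # Only data sets that precede the first point in time are eligible.
        if data_set.point_in_time >= first_point_in_time:
            break
        model = train_on(model, data_set)
        # One simulation test per training stage.
        result = run_simulation(model)
        # Assumed threshold condition: result reaches the given value.
        if result >= threshold:
            subset = [t for t in data_set.target_outputs if keep_target(t)]
            return data_set, subset
    return None


# Toy usage: the "model" is just a count of examples seen so far, and the
# simulated result grows linearly with that count (both are assumptions).
toy_sets = [
    TrainingDataSet(1, [0.1, 0.2], [0, 1]),
    TrainingDataSet(2, [0.3, 0.4, 0.5], [1, 1, 0]),
    TrainingDataSet(3, [0.6], [1]),
]
hit = find_sufficient_training_amount(
    toy_sets,
    initial_model=0,
    train_on=lambda seen, ds: seen + len(ds.inputs),
    run_simulation=lambda seen: seen / 10.0,
    threshold=0.5,
    first_point_in_time=4,
    keep_target=lambda target: target == 1,
)
if hit is not None:
    identified_set, target_subset = hit
    # The size of the subset stands in for the "amount of second training data".
    amount_of_second_training_data = len(target_subset)
```

In this sketch the size of the returned subset serves as the amount of second training data; a real system would presumably use richer models, simulation tests, and threshold conditions than the toy callables shown here.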