DATA GAP MITIGATION

Bibliographic details
Main authors: Yashchin, Emmanuel; Iyengar, Arun Kwangil; Patel, Dhavalkumar C; Bhamidipaty, Anuradha; Zhou, Nianjun; Shrivastava, Shrey
Format: Patent
Language: English
Description
Abstract: Disclosed embodiments provide techniques for estimating the performance of imputation algorithms. Multiple imputer algorithms are selected, and each is evaluated on how well it can estimate the missing data. To do so, disclosed embodiments obtain an imputer candidate dataset (ICD), compare it to the incomplete data range, and determine a similarity metric between the data range and the ICD. When the similarity metric exceeds a predetermined threshold, an imputer evaluation dataset (IED) is created from the ICD by removing one or more data points from the ICD. Each imputer algorithm is evaluated by applying it to the IED and computing an imputer evaluation metric based on its performance. The imputer algorithms are ranked by this metric, and the best-ranked algorithm can then be selected for use on the incomplete data range within the measurement dataset.
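
The workflow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not specify the similarity metric, the evaluation metric, or the imputer algorithms, so everything concrete here is an assumption made for illustration. The similarity function (a comparison of mean and standard deviation), the RMSE evaluation metric, the three toy imputers, the threshold of 0.5, and all function names are hypothetical.

```python
import math
import random
import statistics

def similarity(observed, candidate):
    """Hypothetical similarity in (0, 1]: 1.0 means identical mean and spread."""
    gap = abs(statistics.fmean(observed) - statistics.fmean(candidate))
    gap += abs(statistics.pstdev(observed) - statistics.pstdev(candidate))
    return 1.0 / (1.0 + gap)

def make_ied(icd, n_remove, seed=0):
    """Build the IED: copy the ICD and blank out n_remove interior points."""
    rng = random.Random(seed)
    removed = sorted(rng.sample(range(1, len(icd) - 1), n_remove))
    ied = list(icd)
    for i in removed:
        ied[i] = None  # None marks a missing value
    return ied, removed

def mean_imputer(series):
    """Fill every gap with the mean of the known values."""
    fill = statistics.fmean(v for v in series if v is not None)
    return [fill if v is None else v for v in series]

def forward_fill_imputer(series):
    """Fill every gap with the last known value (first value must be known)."""
    out, last = [], None
    for v in series:
        last = last if v is None else v
        out.append(last)
    return out

def linear_imputer(series):
    """Fill gaps by linear interpolation (both end points must be known)."""
    known = [i for i, v in enumerate(series) if v is not None]
    out = list(series)
    for i, v in enumerate(series):
        if v is None:
            left = max(k for k in known if k < i)
            right = min(k for k in known if k > i)
            t = (i - left) / (right - left)
            out[i] = series[left] + t * (series[right] - series[left])
    return out

def rmse(imputed, truth, removed):
    """Assumed evaluation metric: RMSE over the removed positions only."""
    return math.sqrt(sum((imputed[i] - truth[i]) ** 2 for i in removed) / len(removed))

def rank_imputers(observed, icd, imputers, threshold=0.5, n_remove=3):
    """Return (name, score) pairs, best first, or [] if the ICD is too dissimilar."""
    if similarity(observed, icd) <= threshold:
        return []
    ied, removed = make_ied(icd, n_remove)
    scores = {name: rmse(fn(ied), icd, removed) for name, fn in imputers.items()}
    return sorted(scores.items(), key=lambda item: item[1])  # lower RMSE is better

IMPUTERS = {
    "mean": mean_imputer,
    "forward_fill": forward_fill_imputer,
    "linear": linear_imputer,
}

# Measurement dataset with an incomplete data range (two missing points).
measurement = [1.0, 2.0, 3.0, None, None, 6.0, 7.0, 8.0]
observed = [v for v in measurement if v is not None]
# Complete candidate dataset with a similar distribution (made-up values).
icd = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5]

ranking = rank_imputers(observed, icd, IMPUTERS)
best_name = ranking[0][0]
filled = IMPUTERS[best_name](measurement)
```

Because the true values of the removed points are known from the ICD, each imputer can be scored objectively before it is trusted with the real gap. The choice of RMSE and of a moment-based similarity is only one possibility; other metrics would slot into `rmse` and `similarity` without changing the overall flow.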