IMPUTING MISSING VALUES IN A DATASET IN THE PRESENCE OF DATA QUALITY DISPARITY


Bibliographic Details
Main Authors: Hans, Sandeep; Saha, Diptikalyan; Arya, Vijay
Format: Patent
Language: English
Description
Summary: A computer-implemented method, system, and computer program product for imputing missing data in the presence of data quality disparity. The problem of imputing the missing values in a dataset that exhibits data quality disparity is formulated as a black-box optimization problem whose objective is to jointly maximize both a fairness metric and the accuracy of the machine learning model trained to identify the missing values to be imputed in the dataset for the sensitive group. The missing values to be imputed in the dataset may then be identified by maximizing the fairness metric and the accuracy of the model. In this manner, the disparity in data quality among sensitive groups in machine learning datasets with missing data is effectively handled.
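
The joint objective described in the summary can be illustrated with a minimal Python sketch. This is not the patented method: the toy dataset, the one-threshold model, the use of the demographic parity gap as the fairness metric, the equal weighting of accuracy and fairness, and plain random search as the black-box optimizer are all assumptions chosen for illustration, and every function name below is hypothetical.

```python
import random

# Toy rows: (feature, group, label); None marks a missing feature.
# Group 1 stands in for the sensitive group with lower data quality.
ROWS = [
    (0.9, 0, 1), (0.8, 0, 1), (0.2, 0, 0), (0.1, 0, 0),
    (0.7, 1, 1), (None, 1, 1), (0.3, 1, 0), (0.2, 1, 0),
]

def impute(rows, fill):
    """Replace every missing feature with the candidate value `fill`."""
    return [(fill if x is None else x, g, y) for x, g, y in rows]

def fit_threshold(rows):
    """Train a one-parameter model: predict 1 when feature >= threshold."""
    candidates = sorted({x for x, _, _ in rows})
    def accuracy_at(t):
        return sum((x >= t) == bool(y) for x, _, y in rows) / len(rows)
    return max(candidates, key=accuracy_at)

def evaluate(rows, threshold):
    """Return (accuracy, fairness); fairness = 1 - demographic parity gap."""
    preds = [(int(x >= threshold), g, y) for x, g, y in rows]
    accuracy = sum(p == y for p, _, y in preds) / len(preds)
    rate = {}
    for grp in (0, 1):
        members = [p for p, g, _ in preds if g == grp]
        rate[grp] = sum(members) / len(members)
    return accuracy, 1.0 - abs(rate[0] - rate[1])

def objective(fill, rows=ROWS):
    """Black-box score: the optimizer sees only this scalar.
    Equal weighting of accuracy and fairness is an assumption."""
    imputed = impute(rows, fill)
    accuracy, fairness = evaluate(imputed, fit_threshold(imputed))
    return accuracy + fairness

def random_search(n_trials=200, seed=0):
    """Random search as a stand-in for any black-box optimizer."""
    rng = random.Random(seed)
    best_fill, best_score = None, float("-inf")
    for _ in range(n_trials):
        fill = rng.uniform(0.0, 1.0)
        score = objective(fill)
        if score > best_score:
            best_fill, best_score = fill, score
    return best_fill, best_score

if __name__ == "__main__":
    print(random_search())
```

On this toy data, a low fill value such as 0.0 leaves the sensitive group's positive row misclassified and lowers both accuracy and parity, while a fill value above 0.3 lets the threshold model separate the classes with equal positive rates in both groups. The optimizer never inspects the model or the metric internals, which is what makes the formulation black-box.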