IMPUTING MISSING VALUES IN A DATASET IN THE PRESENCE OF DATA QUALITY DISPARITY
Main Authors: | , , |
---|---|
Format: | Patent |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
Summary: | A computer-implemented method, system, and computer program product for imputing missing data in the presence of data quality disparity. The problem of imputing the missing values in a dataset that exhibits data quality disparity is formulated as a black-box optimization problem whose objective is to jointly maximize both a fairness metric and the accuracy of the machine learning model trained to identify the missing values to be imputed in the dataset for the sensitive group. The missing values to be imputed in the dataset may then be identified by maximizing the fairness metric and the accuracy of the model. In this manner, disparity in data quality among sensitive groups in machine learning datasets with missing data is effectively handled. |
---|
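The formulation in the summary can be illustrated with a minimal sketch. It is not the patented method: the toy rows, the fixed threshold classifier standing in for a trained model, the accuracy-gap fairness term, and the random search are all assumptions chosen for brevity, and every name (ROWS, impute, accuracy, objective, random_search) is hypothetical.

```python
import random

# Toy rows: (group, feature or None if missing, label). All values are made up.
ROWS = [
    ("a", 0.9, 1), ("a", 0.8, 1), ("a", 0.2, 0), ("a", None, 0),
    ("b", 0.7, 1), ("b", None, 1), ("b", 0.3, 0), ("b", 0.1, 0),
]
GROUPS = ("a", "b")


def impute(rows, fill):
    """Replace each missing feature with the candidate fill value for its group."""
    return [(g, fill[g] if x is None else x, y) for g, x, y in rows]


def accuracy(rows, threshold=0.5):
    """Accuracy of a fixed threshold classifier (assumed stand-in for a trained model)."""
    if not rows:
        return 0.0
    hits = sum((x >= threshold) == bool(y) for _, x, y in rows)
    return hits / len(rows)


def objective(fill, rows=ROWS):
    """Black-box score: overall accuracy plus fairness (1 - accuracy gap between groups)."""
    filled = impute(rows, fill)
    per_group = {g: accuracy([r for r in filled if r[0] == g]) for g in GROUPS}
    fairness = 1.0 - abs(per_group["a"] - per_group["b"])
    return accuracy(filled) + fairness


def random_search(n_trials=200, seed=0):
    """Sample candidate per-group fill values and keep the best-scoring one."""
    rng = random.Random(seed)
    best_fill, best_score = None, float("-inf")
    for _ in range(n_trials):
        fill = {g: rng.random() for g in GROUPS}
        score = objective(fill)
        if score > best_score:
            best_fill, best_score = fill, score
    return best_fill, best_score


if __name__ == "__main__":
    print(random_search())
```

The objective is treated as a black box: the search only calls objective and never inspects the model. In this toy data the two groups need different fill values, so a single shared imputation value could not reach the best joint score.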