gcimpute : A Package for Missing Data Imputation

This article introduces the Python package gcimpute for missing data imputation. Package gcimpute can impute missing data with many different variable types, including continuous, binary, ordinal, count, and truncated values, by modeling data as samples from a Gaussian copula model. This semiparamet...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Journal of statistical software 2024-02, Vol.108 (4)
Hauptverfasser: Zhao, Yuxuan, Udell, Madeleine
Format: Artikel
Sprache:eng
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:This article introduces the Python package gcimpute for missing data imputation. Package gcimpute can impute missing data with many different variable types, including continuous, binary, ordinal, count, and truncated values, by modeling data as samples from a Gaussian copula model. This semiparametric model learns the marginal distribution of each variable to match the empirical distribution, yet describes the interactions between variables with a joint Gaussian that enables fast inference, imputation with confidence intervals, and multiple imputation. The package also provides specialized extensions to handle large datasets (with complexity linear in the number of observations) and streaming datasets (with online imputation). This article describes the underlying methodology and demonstrates how to use the software package.
ISSN:1548-7660
1548-7660
DOI:10.18637/jss.v108.i04