The Effects of Data Imputation on Covariance and Inverse Covariance Matrix Estimation


Bibliographic details
Published in: IEEE Access 2024, Vol. 12, p. 134688-134701
Main authors: Vo, Tuan L., Do, Quan Huu, Nguyen, Thu, Halvorsen, Pal, Riegler, Michael A., Nguyen, Binh T.
Format: Article
Language: English
Online access: Full text
Description
Abstract: Various data analysis techniques and procedures (correlation heatmaps, linear discriminant analysis, quadratic discriminant analysis) rely on the estimation of the covariance matrix or its inverse (the precision matrix). However, missing data can pose significant challenges to this parameter estimation problem. When data are missing, imputation is a common way to circumvent the issue because it renders the data complete. Nevertheless, it is imperative to scrutinize the potential trade-offs of choosing imputation over task-specific methods for handling missing data, especially in the context of subsequent data analysis and inference. In this study, we undertake both empirical and theoretical investigations to assess the impact of imputation in contrast to direct parameter estimation approaches. We focus on the task of estimating the covariance matrix and the precision matrix, and present an analysis of the error induced by estimating the precision matrix as the inverse of an estimated covariance matrix. Additionally, we propose a sufficient condition that ensures improved performance guarantees for precision matrix estimation based on covariance matrix estimation. The experimental results show that when the number of features is small, direct parameter estimation can be recommended, with the precision matrix obtained by inverting the corresponding estimated covariance matrix. However, when the number of features is not small, inverting the covariance matrix of the imputed data gives better results.
ISSN:2169-3536
DOI:10.1109/ACCESS.2024.3427404
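
The comparison described in the abstract can be sketched in a few lines of Python. This is a minimal illustration and not the authors' method: mean imputation stands in for imputation in general, a pairwise-complete covariance estimate stands in for a direct estimation approach, and the synthetic data, the function names (make_data, cov_mean_imputed, cov_pairwise, rel_error), the sample size, and the missing rate are all assumptions chosen for the example.

```python
import numpy as np


def make_data(n=500, p=4, missing_rate=0.2, seed=0):
    """Draw Gaussian samples with a known covariance and hide entries at random."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(p, p))
    sigma = a @ a.T + p * np.eye(p)  # symmetric positive definite by construction
    x = rng.multivariate_normal(np.zeros(p), sigma, size=n)
    mask = rng.random((n, p)) < missing_rate  # True marks a missing entry
    return sigma, np.where(mask, np.nan, x)


def cov_mean_imputed(x):
    """Impute each column with its observed mean, then take the sample covariance."""
    col_means = np.nanmean(x, axis=0)
    filled = np.where(np.isnan(x), col_means, x)
    return np.cov(filled, rowvar=False)


def cov_pairwise(x):
    """Estimate each entry from the rows where both features are observed.

    Assumption: used here only as a simple stand-in for direct estimation.
    The result is symmetric but not guaranteed to be positive definite.
    """
    p = x.shape[1]
    centered = x - np.nanmean(x, axis=0)
    cov = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            both = ~np.isnan(centered[:, i]) & ~np.isnan(centered[:, j])
            cov[i, j] = np.sum(centered[both, i] * centered[both, j]) / (both.sum() - 1)
    return cov


def rel_error(estimate, truth):
    """Relative Frobenius-norm error of an estimate against the true matrix."""
    return np.linalg.norm(estimate - truth) / np.linalg.norm(truth)


sigma, x_missing = make_data()
precision_true = np.linalg.inv(sigma)

for name, estimator in [("mean imputation", cov_mean_imputed),
                        ("pairwise complete", cov_pairwise)]:
    cov_hat = estimator(x_missing)
    precision_hat = np.linalg.inv(cov_hat)  # precision as inverse of estimated covariance
    print(f"{name}: covariance error {rel_error(cov_hat, sigma):.3f}, "
          f"precision error {rel_error(precision_hat, precision_true):.3f}")
```

Varying p and missing_rate in make_data gives a rough feel for how the two routes behave as the number of features changes. The outcome on this toy setup should not be read as reproducing the results reported in the article.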