A Robust and Flexible EM Algorithm for Mixtures of Elliptical Distributions with Missing Data

Bibliographic details
Published in:	IEEE Transactions on Signal Processing 2023-01, Vol.71, p.1669-1682
Main authors:	Mouret, Florian; Hippert-Ferrer, Alexandre; Pascal, Frederic; Tourneret, Jean-Yves
Format:	Article
Language:	English
Subjects:	Algorithms; Angular Gaussian distributions; Classification algorithms; Clustering algorithms; Computer Science; elliptical distributions; EM algorithm; Estimation; Finite element analysis; Generators; imputation; Mathematics; Missing data; mixture models; Outliers (statistics); Probabilistic models; Robustness; Signal processing algorithms; Synthetic data; Task analysis

Description
Abstract:	This article tackles the problem of missing data imputation for noisy and non-Gaussian data. A classical imputation method, the Expectation Maximization (EM) algorithm for Gaussian mixture models, has shown interesting properties when compared to other popular approaches such as those based on k-nearest neighbors or on multiple imputations by chained equations. However, Gaussian mixture models are known to be non-robust to heterogeneous data, which can lead to poor estimation performance when the data are contaminated by outliers or follow non-Gaussian distributions. To overcome this issue, a new EM algorithm that can handle missing data is investigated for mixtures of elliptical distributions. This paper shows that this problem reduces to the estimation of a mixture of angular Gaussian distributions under generic assumptions (i.e., each sample is drawn from a mixture of elliptical distributions, which is possibly different from one sample to another). In that case, the complete-data likelihood associated with mixtures of elliptical distributions is well adapted to the EM framework with missing data thanks to its conditional distribution, which is shown to be a multivariate t-distribution. Experimental results on synthetic data demonstrate that the proposed algorithm is robust to outliers and can be used with non-Gaussian data. Furthermore, experiments conducted on real-world datasets show that this algorithm is very competitive when compared to other classical imputation methods.
ISSN:	1053-587X 1941-0476
DOI:	10.1109/TSP.2023.3267994
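
Illustration:	The abstract refers to imputation based on a conditional distribution. As a rough sketch of that idea only, the Python snippet below fills the missing entries of one sample with their conditional mean under a single Gaussian with known parameters, i.e. the classical Gaussian baseline and not the elliptical-mixture algorithm proposed in the article. The function name `conditional_mean_impute`, the NaN convention for missing values, and the example parameters are all assumptions made for illustration.

```python
import numpy as np


def conditional_mean_impute(x, mean, cov):
    """Fill NaN entries of x with E[x_missing | x_observed] under N(mean, cov).

    Hypothetical helper: single Gaussian with known parameters, not the
    article's mixture-of-elliptical-distributions EM algorithm.
    """
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)

    miss = np.isnan(x)          # mask of missing coordinates
    obs = ~miss                 # mask of observed coordinates
    if not miss.any():
        return x.copy()         # nothing to impute
    if not obs.any():
        return mean.copy()      # nothing observed: fall back to the mean

    cov_oo = cov[np.ix_(obs, obs)]    # covariance of observed block
    cov_mo = cov[np.ix_(miss, obs)]   # cross-covariance missing/observed

    out = x.copy()
    # mu_m + Sigma_mo Sigma_oo^{-1} (x_o - mu_o), via a linear solve
    out[miss] = mean[miss] + cov_mo @ np.linalg.solve(cov_oo, x[obs] - mean[obs])
    return out


# Example with assumed parameters: two correlated coordinates, second missing.
mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
x = np.array([1.0, np.nan])
filled = conditional_mean_impute(x, mean, cov)
```

In an EM setting, a step of this kind would typically be applied per mixture component and weighted by the component responsibilities; how the article does this for elliptical distributions is described in the full text, not in this record.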