Reviewing Autoencoders for Missing Data Imputation: Technical Trends, Applications and Outcomes

Bibliographic details
Published in:	The Journal of Artificial Intelligence Research, December 2020, Vol. 69, pp. 1255-1285
Authors:	Cardoso Pereira, Ricardo; Seoane Santos, Miriam; Pereira Rodrigues, Pedro; Henriques Abreu, Pedro
Format:	Article
Language:	English
Keywords:	Artificial intelligence; Deep learning; Machine learning; Missing data; Noise reduction; Performance degradation; Statistical methods; Tables (data)

Description
Abstract:	Missing data is a common problem in real-world datasets and can degrade the performance of most machine learning models. Several deep learning techniques have been used to address it, among them the Autoencoder and its Denoising and Variational variants. These models learn a representation of the data with missing values and generate plausible values to replace the missing ones. This study surveys the use of Autoencoders for the imputation of tabular data and considers 26 works published between 2014 and 2020. The analysis focuses mainly on patterns and recommendations for the architecture, hyperparameters and training settings of the network. It also provides a detailed discussion of the results obtained by Autoencoders compared with other state-of-the-art methods, and of the data contexts in which they have been applied. The conclusions include a set of recommendations for the technical settings of the network and show that Denoising Autoencoders outperform their competitors, particularly the commonly used statistical methods.
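The abstract describes the general mechanism: learn a representation of incomplete data, then use the reconstruction to fill the gaps. The following is a minimal NumPy sketch of a Denoising Autoencoder imputer for tabular data. It is not the method of any particular surveyed work. Several choices are illustrative assumptions:

* the function name `dae_impute`;
* the single tanh hidden layer with a linear decoder;
* mean pre-filling of missing cells;
* masking noise applied as inverted dropout;
* a loss counted only on observed entries;
* all hyperparameter defaults.

```python
import numpy as np


def dae_impute(X, hidden=8, drop_rate=0.2, lr=0.1, epochs=500, seed=0):
    """Hypothetical sketch: impute NaNs in a 2-D array with a denoising autoencoder.

    Returns (X_imputed, loss_history). Observed entries are returned unchanged.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)                       # True where a value is present
    n, d = X.shape

    # Standardize each column using only its observed values.
    mu = np.nanmean(X, axis=0)
    sd = np.nanstd(X, axis=0)
    sd[sd == 0] = 1.0
    Z = (X - mu) / sd
    Z_filled = np.where(observed, Z, 0.0)         # mean pre-fill (0 after standardizing)

    # Encoder and decoder parameters (assumed small random initialization).
    W1 = rng.normal(0.0, 0.1, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d))
    b2 = np.zeros(d)
    n_obs = observed.sum()
    losses = []

    for _ in range(epochs):
        # Corruption: drop a random fraction of inputs, rescale the rest (inverted dropout).
        keep = rng.random((n, d)) >= drop_rate
        Z_in = Z_filled * keep / (1.0 - drop_rate)
        H = np.tanh(Z_in @ W1 + b1)               # encoder
        Y = H @ W2 + b2                           # linear decoder
        # Reconstruction error is counted only on originally observed entries.
        R = np.where(observed, Y - Z_filled, 0.0)
        losses.append(float((R ** 2).sum() / n_obs))

        # Backpropagate the masked mean squared error; full-batch gradient descent.
        dY = 2.0 * R / n_obs
        dW2 = H.T @ dY
        db2 = dY.sum(axis=0)
        dA = (dY @ W2.T) * (1.0 - H ** 2)         # chain rule through tanh
        dW1 = Z_in.T @ dA
        db1 = dA.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    # Impute from the uncorrupted, mean-filled input; keep observed values exactly.
    Y = np.tanh(Z_filled @ W1 + b1) @ W2 + b2
    X_imputed = np.where(observed, X, Y * sd + mu)
    return X_imputed, losses
```

Usage: `X_imp, losses = dae_impute(X_with_nans)`. Training on corrupted inputs while scoring only observed cells is what makes the network "denoising". It has to predict each dropped value from the remaining features, which is the same task it faces for the genuinely missing cells at imputation time.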
ISSN:	1076-9757; eISSN: 1943-5037
DOI:	10.1613/jair.1.12312