Masking schemes for universal marginalisers
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Abstract: | We consider the effect of structure-agnostic and structure-dependent masking
schemes when training a universal marginaliser (arXiv:1711.00695) in order to
learn conditional distributions of the form $P(x_i |\mathbf x_{\mathbf b})$,
where $x_i$ is a given random variable and $\mathbf x_{\mathbf b}$ is some
arbitrary subset of all random variables of the generative model of interest.
In other words, we mimic the self-supervised training of a denoising
autoencoder, where a dataset of unlabelled data is used as partially observed
input and the neural approximator is optimised to minimise reconstruction loss.
We focus on the process underlying the partially observed
data: how well does the neural approximator learn all conditional
distributions when the observation process at prediction time differs from the
masking process during training? We compare networks trained with different
masking schemes in terms of their predictive performance and generalisation
properties. |
---|---|
DOI: | 10.48550/arxiv.2001.05895 |