A Geometric Explanation of the Likelihood OOD Detection Paradox
Format: Article
Language: English
Abstract: Likelihood-based deep generative models (DGMs) commonly exhibit a puzzling
behaviour: when trained on a relatively complex dataset, they assign higher
likelihood values to out-of-distribution (OOD) data from simpler sources.
Adding to the mystery, OOD samples are never generated by these DGMs despite
having higher likelihoods. This two-pronged paradox has yet to be conclusively
explained, making likelihood-based OOD detection unreliable. Our primary
observation is that high-likelihood regions will not be generated if they
contain minimal probability mass. We demonstrate how this seeming contradiction
of large densities yet low probability mass can occur around data confined to
low-dimensional manifolds. We also show that this scenario can be identified
through local intrinsic dimension (LID) estimation, and propose a method for
OOD detection which pairs the likelihoods and LID estimates obtained from a
pre-trained DGM. Our method can be applied to normalizing flows and score-based
diffusion models, and obtains results which match or surpass state-of-the-art
OOD detection benchmarks using the same DGM backbones. Our code is available at
https://github.com/layer6ai-labs/dgm_ood_detection.
DOI: 10.48550/arxiv.2403.18910
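
The abstract makes two claims that a small numerical sketch can make concrete: a region can have very high density yet negligible probability mass when the distribution concentrates near a low-dimensional set, and a low local intrinsic dimension (LID) estimate identifies exactly such regions. The Python sketch below is illustrative only and is not taken from the linked repository; the toy mixture, the scale `r`, and the thresholds in `is_in_distribution` are hypothetical choices, the latter being one plausible reading of "pairs the likelihoods and LID estimates".

```python
# Toy illustration (not the paper's code): a 2-D mixture whose second
# component is a near-degenerate "spike" of width 1e-4. The spike centre
# has far HIGHER density than the bulk mode, yet it holds almost no
# probability mass, so samples essentially never come from it.
import numpy as np
from scipy.stats import multivariate_normal

w_bulk, w_spike = 1.0 - 1e-4, 1e-4
bulk = multivariate_normal(mean=[5.0, 5.0], cov=np.eye(2))
spike = multivariate_normal(mean=[0.0, 0.0], cov=(1e-4) ** 2 * np.eye(2))

def log_density(x):
    """Log-density of the mixture, combined stably with logaddexp."""
    return np.logaddexp(np.log(w_bulk) + bulk.logpdf(x),
                        np.log(w_spike) + spike.logpdf(x))

print(log_density([0.0, 0.0]))  # ~ +7.4: spike centre, very high density
print(log_density([5.0, 5.0]))  # ~ -1.8: bulk mode, much lower density

def gauss_mass_in_ball(radius, sigma):
    """Mass of a 2-D isotropic Gaussian within `radius` of its mean:
    P(||x - mu|| < r) = 1 - exp(-r^2 / (2 sigma^2))."""
    return 1.0 - np.exp(-radius ** 2 / (2.0 * sigma ** 2))

# Mass within a hypothetical radius r of each point. Cross-component
# contributions are negligible: the means are ~7 bulk standard
# deviations apart.
r = 0.1
print(w_spike * gauss_mass_in_ball(r, 1e-4))  # ~ 1e-4: almost never sampled
print(w_bulk * gauss_mass_in_ball(r, 1.0))    # ~ 5e-3: 50x more mass

def is_in_distribution(log_px, lid_x, log_px_threshold=-5.0,
                       lid_threshold=1.5):
    """Hypothetical pairing rule: accept x as in-distribution only when
    BOTH the model log-likelihood and the LID estimate are high enough."""
    return log_px >= log_px_threshold and lid_x >= lid_threshold
```

At any scale well above the spike's width, a neighbourhood-based LID estimate around the spike centre would read close to 0 while the bulk reads close to the ambient dimension 2; this low-LID signature is what the abstract says can be detected with a pre-trained DGM, so the high-density but low-mass spike is correctly flagged as OOD despite its high likelihood.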