Learning A Disentangling Representation For PU Learning
Format: Article
Language: English
Abstract: In this paper, we address the problem of learning a binary (positive vs. negative) classifier given Positive and Unlabeled data, commonly referred to as PU learning. Although rudimentary techniques like clustering, out-of-distribution detection, or positive density estimation can be used to solve the problem in low-dimensional settings, their efficacy progressively deteriorates in higher dimensions due to the increasing complexity of the data distribution. We propose to learn a neural network-based data representation using a loss function that projects the unlabeled data into two (positive and negative) clusters that can be easily identified with simple clustering techniques, effectively emulating the phenomenon observed in low-dimensional settings. We adopt a vector quantization technique for the learned representations to amplify the separation between the learned unlabeled-data clusters. We conduct experiments on simulated PU data that demonstrate the improved performance of our proposed method compared to current state-of-the-art approaches. We also provide theoretical justification for our two-cluster-based approach and our algorithmic choices.
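The abstract notes that in low-dimensional settings simple clustering can already separate the unlabeled data into a positive and a negative cluster, with the labeled positives identifying which cluster is which. A minimal stdlib sketch of that low-dimensional phenomenon (a toy 1-D setup with hand-picked Gaussians, not the paper's method or experimental protocol):

```python
import random

random.seed(0)

# Toy 1-D PU data: positives ~ N(2, 0.5), negatives ~ N(-2, 0.5).
# (Illustrative assumption; the paper works with learned neural representations.)
positives_labeled = [random.gauss(2.0, 0.5) for _ in range(50)]
unlabeled = ([random.gauss(2.0, 0.5) for _ in range(100)] +
             [random.gauss(-2.0, 0.5) for _ in range(100)])
truth = [1] * 100 + [0] * 100  # held-out ground truth, for evaluation only

def two_means(xs, iters=20):
    """Plain 1-D 2-means clustering; returns the two centroids."""
    c0, c1 = min(xs), max(xs)
    for _ in range(iters):
        a = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        b = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = sum(a) / len(a), sum(b) / len(b)
    return c0, c1

c0, c1 = two_means(unlabeled)

# The cluster whose centroid lies nearer the labeled positives is "positive".
pos_mean = sum(positives_labeled) / len(positives_labeled)
pos_centroid = c0 if abs(c0 - pos_mean) < abs(c1 - pos_mean) else c1
neg_centroid = c1 if pos_centroid == c0 else c0

pred = [1 if abs(x - pos_centroid) < abs(x - neg_centroid) else 0
        for x in unlabeled]
accuracy = sum(p == t for p, t in zip(pred, truth)) / len(truth)
print(f"unlabeled-set accuracy: {accuracy:.2f}")
```

With well-separated clusters this recovers the positive/negative split almost perfectly; the paper's contribution is a representation learning loss (plus vector quantization) that restores this separability when the raw data is high-dimensional.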
DOI: 10.48550/arxiv.2310.03833