Robust Positive-Unlabeled Learning via Noise Negative Sample Self-correction
Main authors: , , , , , , , , ,
Format: Article
Language: English
Subjects:
Online access: Order full text
Summary: Learning from positive and unlabeled data is known as positive-unlabeled (PU) learning in the literature and has attracted much attention in recent years. One common approach in PU learning is to sample a set of pseudo-negatives from the unlabeled data using ad-hoc thresholds so that conventional supervised methods can be applied with both positive and negative samples. Owing to the label uncertainty among the unlabeled data, errors from misclassifying unlabeled positive samples as negatives inevitably appear and may even accumulate during training. These errors often lead to performance degradation and model instability. To mitigate the impact of label uncertainty and improve the robustness of learning with positive and unlabeled data, we propose a new robust PU learning method with a training strategy motivated by the nature of human learning: easy cases should be learned first. Similar intuition has been utilized in curriculum learning, which uses only easier cases in the early stage of training before introducing more complex ones. Specifically, we utilize a novel "hardness" measure to distinguish unlabeled samples with a high chance of being negative from unlabeled samples with large label noise. An iterative training strategy is then implemented to fine-tune the selection of negative samples over the course of training, including more "easy" samples in the early stages. Extensive experimental validation over a wide range of learning tasks shows that this approach can effectively improve the accuracy and stability of learning with positive and unlabeled data. Our code is available at https://github.com/woriazzc/Robust-PU
DOI: 10.48550/arxiv.2308.00279
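The abstract describes a curriculum-style pipeline: score unlabeled samples with a "hardness" measure, keep only the easiest as pseudo-negatives early on, and gradually admit harder ones across training rounds. The abstract does not specify the hardness measure or the schedule, so the following Python sketch is purely illustrative: the hardness proxy (the model's predicted positive probability), the linear `start_frac`/`end_frac` schedule, and the scikit-learn-style `fit`/`predict_proba` interface are all assumptions, not the authors' implementation (see the linked repository for that).

```python
# Minimal illustrative sketch of curriculum-style pseudo-negative selection
# for PU learning. NOT the paper's method: the hardness proxy, schedule,
# and model interface below are assumptions for demonstration only.
import numpy as np

def hardness(model, x_unl):
    """Hypothetical hardness proxy: the current model's predicted
    P(y = 1 | x). Unlabeled samples scored near 1 are 'hard' (likely
    mislabeled positives); samples scored near 0 are 'easy' negatives."""
    return model.predict_proba(x_unl)[:, 1]

def select_pseudo_negatives(model, x_unl, keep_frac):
    """Keep the easiest `keep_frac` fraction of the unlabeled pool as
    pseudo-negatives for the next training round."""
    k = max(1, int(keep_frac * len(x_unl)))
    order = np.argsort(hardness(model, x_unl))  # easiest first
    return order[:k]

def train_robust_pu(model, x_pos, x_unl, rounds=5,
                    start_frac=0.2, end_frac=0.8):
    # Warm start with the naive PU baseline: treat every unlabeled
    # sample as negative so the model can produce initial scores.
    x0 = np.concatenate([x_pos, x_unl])
    y0 = np.concatenate([np.ones(len(x_pos)), np.zeros(len(x_unl))])
    model.fit(x0, y0)
    for r in range(rounds):
        # Linearly relax the selection: start with only the easiest
        # pseudo-negatives and admit harder ones in later rounds.
        frac = start_frac + (end_frac - start_frac) * r / max(1, rounds - 1)
        neg_idx = select_pseudo_negatives(model, x_unl, frac)
        x = np.concatenate([x_pos, x_unl[neg_idx]])
        y = np.concatenate([np.ones(len(x_pos)), np.zeros(len(neg_idx))])
        model.fit(x, y)  # re-fit on the refreshed pseudo-negative set
    return model
```

With any scikit-learn-style classifier, e.g. `train_robust_pu(LogisticRegression(max_iter=1000), x_pos, x_unl)`, each round re-scores the entire unlabeled pool, so a sample admitted as a pseudo-negative in one round can be dropped in the next if the updated model scores it as likely positive; this re-scoring is one way to realize the self-correction behavior the title refers to.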