Towards Overcoming False Positives in Visual Relationship Detection
Main authors: | , , , , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | In this paper, we investigate the cause of the high false positive rate in
Visual Relationship Detection (VRD). We observe that during training, the
relationship proposal distribution is highly imbalanced: most of the negative
relationship proposals are easy to identify, e.g., those caused by inaccurate object
detection, which leads to under-fitting of the low-frequency difficult
proposals. This paper presents Spatially-Aware Balanced negative pRoposal
sAmpling (SABRA), a robust VRD framework that alleviates the influence of false
positives. To effectively optimize the model under an imbalanced distribution,
SABRA adopts a Balanced Negative Proposal Sampling (BNPS) strategy for mini-batch
sampling. BNPS divides proposals into five well-defined sub-classes and generates
a balanced training distribution according to their inverse frequency. BNPS gives
an easier optimization landscape and significantly reduces the number of false
positives. To further resolve the low-frequency, challenging false positive
proposals with high spatial ambiguity, we improve the spatial modeling ability
of SABRA in two respects: a simple and efficient multi-head heterogeneous graph
attention network (MH-GAT) that models the global spatial interactions of
objects, and a spatial mask decoder that learns the local spatial
configuration. SABRA outperforms SOTA methods by a large margin on two
human-object interaction (HOI) datasets and one general VRD dataset. |
---|---|
DOI: | 10.48550/arxiv.2012.12510 |
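
The summary says that BNPS builds a balanced training distribution from the inverse frequency of proposal sub-classes. The following is a minimal sketch of that general idea, not the authors' implementation: the function names, the sub-class labels, and the use of sampling with replacement are all assumptions made for illustration, since the record does not describe how the five sub-classes are defined or how batches are drawn.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one sampling weight per proposal: 1 / (size of its sub-class).

    With these weights every sub-class carries the same total weight,
    so rare sub-classes are drawn as often as common ones on average.
    """
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def sample_balanced_batch(proposals, labels, batch_size, rng=None):
    """Draw a mini-batch with replacement using inverse-frequency weights.

    `proposals` and `labels` are parallel lists; the label values are
    hypothetical placeholders for the sub-classes named in the summary.
    """
    rng = rng or random.Random()
    weights = inverse_frequency_weights(labels)
    return rng.choices(proposals, weights=weights, k=batch_size)


# Hypothetical data: 8 proposals from a frequent sub-class, 2 from a rare one.
labels = ["sub_class_0"] * 8 + ["sub_class_1"] * 2
proposals = list(range(len(labels)))
batch = sample_balanced_batch(proposals, labels, batch_size=4, rng=random.Random(0))
```

Under this weighting each sub-class has a total weight of 1, so a proposal from the rare sub-class is four times as likely to be drawn as one from the frequent sub-class in the example data.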