A unifying mutual information view of metric learning: cross-entropy vs. pairwise losses
Format: Article
Language: English
Abstract: Recently, substantial research efforts in Deep Metric Learning (DML) have focused on designing complex pairwise-distance losses, which require convoluted schemes to ease optimization, such as sample mining or pair weighting. The standard cross-entropy loss for classification has been largely overlooked in DML. On the surface, the cross-entropy may seem unrelated and irrelevant to metric learning, as it does not explicitly involve pairwise distances. However, we provide a theoretical analysis that links the cross-entropy to several well-known and recent pairwise losses. Our connections are drawn from two different perspectives: one based on an explicit optimization insight; the other on discriminative and generative views of the mutual information between the labels and the learned features. First, we explicitly demonstrate that the cross-entropy is an upper bound on a new pairwise loss, which has a structure similar to various pairwise losses: it minimizes intra-class distances while maximizing inter-class distances. As a result, minimizing the cross-entropy can be seen as an approximate bound-optimization (or Majorize-Minimize) algorithm for minimizing this pairwise loss. Second, we show that, more generally, minimizing the cross-entropy is actually equivalent to maximizing the mutual information, to which we connect several well-known pairwise losses. Furthermore, we show that various standard pairwise losses can be explicitly related to one another via bound relationships. Our findings indicate that the cross-entropy represents a proxy for maximizing the mutual information -- as pairwise losses do -- without the need for convoluted sample-mining heuristics. Our experiments over four standard DML benchmarks strongly support our findings. We obtain state-of-the-art results, outperforming recent and complex DML methods.
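The mutual-information connection stated in the abstract rests on standard identities. Below is a brief sketch; the notation (z for learned features, y for labels, H for entropy) is assumed here rather than taken from this record:

```latex
% Two views of the mutual information between features z and labels y:
\begin{align}
  \mathcal{I}(z; y) &= \mathcal{H}(y) - \mathcal{H}(y \mid z)  && \text{(discriminative view)}\\
                    &= \mathcal{H}(z) - \mathcal{H}(z \mid y)  && \text{(generative view)}
\end{align}
% H(y) is fixed by the dataset, and the cross-entropy (CE) upper-bounds the
% conditional entropy, since CE = H(y|z) + KL(p(y|z) || q(y|z)) >= H(y|z).
% Hence minimizing CE maximizes a lower bound on the mutual information:
\begin{equation}
  \mathcal{I}(z; y) \;\ge\; \mathcal{H}(y) - \mathrm{CE}.
\end{equation}
```

In this reading, pairwise losses act on the generative view: tightness terms shrink H(z | y) (intra-class distances) while contrastive terms keep H(z) large (inter-class distances), which matches the structure the abstract describes.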
DOI: 10.48550/arxiv.2003.08983
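As a concrete illustration of the loss structure the abstract describes (minimizing intra-class distances while maximizing inter-class distances), here is a minimal sketch. It is not the paper's exact loss; the names `pairwise_loss` and `margin` are illustrative, and the batch is assumed to contain both same-class and different-class pairs:

```python
import torch

def pairwise_loss(z: torch.Tensor, y: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    """z: (n, d) embeddings; y: (n,) integer class labels."""
    d = torch.cdist(z, z)                    # (n, n) pairwise Euclidean distances
    same = y.unsqueeze(0) == y.unsqueeze(1)  # (n, n) mask: True where labels match
    eye = torch.eye(len(y), dtype=torch.bool, device=z.device)
    intra = d[same & ~eye]                   # off-diagonal distances within each class
    inter = d[~same]                         # distances across classes
    # Pull same-class pairs together; push different-class pairs at least `margin` apart.
    return intra.mean() + torch.relu(margin - inter).mean()

# Usage: a scalar loss, differentiable with respect to the embeddings.
z = torch.nn.functional.normalize(torch.randn(8, 16), dim=1)
y = torch.randint(0, 3, (8,))
loss = pairwise_loss(z, y)
```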