Fast Regularized Discrete Optimal Transport with Group-Sparse Regularizers
Saved in:
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: Regularized discrete optimal transport (OT) is a powerful tool to measure the
distance between two discrete distributions constructed from data samples on
two different domains. While it has a wide range of applications in machine
learning, in some settings only one of the domains has class labels for its
sampled data, as in unsupervised domain adaptation. In this kind of problem
setting, a group-sparse regularizer is frequently used as the regularization
term to exploit the class labels. In particular, it can preserve the label
structure of the data samples by assigning the data samples that share a class
label to one group-sparse regularization term. As a result, we can measure the
distance while utilizing label information by solving the regularized
optimization problem with gradient-based algorithms. However, the gradient
computation is expensive when the number of classes or data samples is large,
because the number of regularization terms and their respective sizes also
become large. This paper proposes fast discrete OT with group-sparse
regularizers. Our method is based on two ideas. The first is to safely skip
the computation of gradients that must be zero. The second is to efficiently
extract the gradients that are expected to be nonzero. Our method is
guaranteed to return the same value of the objective function as the original
method. Experiments show that our method is up to 8.6 times faster than the
original method without degrading accuracy.
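The abstract does not spell out the regularizer, so the following is a minimal sketch assuming the common l1/l2 group-lasso form used in OT-based domain adaptation, with one group per (class label, target sample) pair. It computes the regularizer value and a subgradient of the transport plan, and the zero-norm groups that can be skipped illustrate the kind of computation the paper's speed-up avoids. Function names and the toy data are illustrative, not the paper's implementation.

```python
import numpy as np

def group_sparse_reg(T, labels):
    """Group-sparse regularizer Omega(T) = sum_j sum_c ||T[I_c, j]||_2,
    where I_c indexes source samples with class label c (assumed l1/l2 form;
    the exact regularizer in the paper may differ)."""
    val = 0.0
    for c in np.unique(labels):
        rows = labels == c
        # one group per (class, target column): l2 norm of each sub-column
        val += np.linalg.norm(T[rows, :], axis=0).sum()
    return val

def group_sparse_grad(T, labels, eps=1e-12):
    """Subgradient of Omega w.r.t. T. Groups whose sub-column is (near) zero
    contribute a zero gradient, so their computation can be skipped."""
    G = np.zeros_like(T)
    for c in np.unique(labels):
        rows = labels == c
        norms = np.linalg.norm(T[rows, :], axis=0)  # one norm per target column
        nz = norms > eps                            # only nonzero groups need work
        G[np.ix_(rows, nz)] = T[np.ix_(rows, nz)] / norms[nz]
    return G

# toy usage: 4 source samples (2 classes), 3 target samples
T = np.random.rand(4, 3)
labels = np.array([0, 0, 1, 1])
print(group_sparse_reg(T, labels), group_sparse_grad(T, labels).shape)
```

As the transport plan becomes sparse during optimization, many groups have zero norm, which is why skipping provably-zero gradients can give a large speed-up without changing the objective value.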
DOI: 10.48550/arxiv.2303.07597