Multimodal Co-Training for Selecting Good Examples from Webly Labeled Video
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: We tackle the problem of learning concept classifiers from videos on
the web without using manually labeled data. Although metadata attached to
videos (e.g., video titles and descriptions) can help in collecting training
data for the target concept, the collected data is often very noisy. The main
challenge is therefore how to select good examples from the noisy training
data. Previous approaches first learn easy examples that are unlikely to be
noise and then gradually move on to more complex examples. However, hard
examples that differ greatly from the easy ones are never learned. In this
paper, we propose an approach called multimodal co-training (MMCo) for
selecting good examples from noisy training data. MMCo jointly learns
classifiers for multiple modalities that complement each other in selecting
good examples. Since MMCo selects examples by consensus of the multimodal
classifiers, an example that is hard for one modality can still be used as a
training example by exploiting the power of the other modalities. The
algorithm is very simple and easy to implement, yet it yields consistent and
significant boosts in example selection and classification performance on the
FCVID and YouTube8M benchmarks.
DOI: 10.48550/arxiv.1804.06057
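The abstract describes MMCo only at a high level: one classifier is trained per modality, and training examples are kept when the modalities agree. The snippet below is a minimal, illustrative sketch of what such consensus-based selection could look like; the function name `select_by_consensus`, the averaging rule, the threshold, and the number of rounds are all assumptions for illustration, not the procedure from the paper.

```python
# Illustrative sketch of consensus-based example selection across modalities,
# in the spirit of the MMCo idea summarized in the abstract. All design choices
# here (selection rule, threshold, round count) are assumptions, not the
# authors' actual algorithm.

import numpy as np
from sklearn.linear_model import LogisticRegression


def select_by_consensus(feats_by_modality, noisy_labels, n_rounds=3, threshold=0.5):
    """Iteratively keep webly labeled examples that the per-modality
    classifiers, taken together, agree are consistent with their noisy labels.

    feats_by_modality: list of arrays, one (n_examples, n_features) per modality
    noisy_labels: array of shape (n_examples,) with values in {0, 1}
    """
    n = len(noisy_labels)
    selected = np.ones(n, dtype=bool)  # start from all webly labeled examples
    clfs = [LogisticRegression(max_iter=1000) for _ in feats_by_modality]

    for _ in range(n_rounds):
        # 1) Fit one classifier per modality on the currently selected examples
        #    (assumes both classes remain represented in the selected set).
        for clf, X in zip(clfs, feats_by_modality):
            clf.fit(X[selected], noisy_labels[selected])

        # 2) Score every example with every modality classifier.
        probs = np.stack(
            [clf.predict_proba(X)[:, 1] for clf, X in zip(clfs, feats_by_modality)],
            axis=0,
        )

        # 3) Consensus rule (an assumption): keep an example if the average
        #    score across modalities agrees with its noisy web label, so a
        #    confident modality can rescue an example that is hard for another.
        mean_prob = probs.mean(axis=0)
        agrees_pos = (noisy_labels == 1) & (mean_prob >= threshold)
        agrees_neg = (noisy_labels == 0) & (mean_prob < threshold)
        selected = agrees_pos | agrees_neg

    return selected, clfs
```

Under this reading, an example that one modality scores poorly can still be retained when the other modalities score it confidently, which is the effect the abstract attributes to consensus selection.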