Extended conceptual feedback for semantic multimedia indexing

Bibliographic details
Published in: Multimedia Tools and Applications, 2015-02, Vol. 74 (4), p. 1225-1248
Main authors: Hamadi, Abdelkader; Mulhem, Philippe; Quénot, Georges
Format: Article
Language: English
Description
Abstract: In this paper, we consider the problem of automatically detecting a large number of visual concepts in images or video shots. State-of-the-art systems generally involve feature (descriptor) extraction, classification (supervised learning), and fusion when several descriptors and/or classifiers are used. Though direct multi-label approaches are considered in some works, detection scores are often computed independently for each target concept. We propose a method that we call “conceptual feedback”, which implicitly takes into account the relations between concepts to improve the overall concept detection performance. A conceptual descriptor is built from the system’s output scores and fed back by adding it to the pool of already available descriptors. Our proposal can be iterated several times. Moreover, we propose three extensions of our method. Firstly, a weighting of the conceptual dimensions is performed to give more importance to concepts that are more correlated with the target concept. Secondly, an explicit selection of a set of concepts that are semantically or statistically related to the target concept is introduced. For video indexing, we propose a third extension, which integrates the temporal dimension in the feedback process by taking into account the conceptual and temporal dimensions simultaneously to build the high-level descriptor. Our proposals have been evaluated in the context of the TRECVid 2012 semantic indexing task, which involves the detection of 346 visual or multi-modal concepts. Overall, combined with temporal re-scoring, the proposed method increased the global system performance (MAP) from 0.2613 to 0.3082 (+17.9% relative improvement), while temporal re-scoring alone increased it only from 0.2613 to 0.2691 (+3.0%).
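The feedback idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of absolute Pearson correlation as the weighting scheme, and the representation of the descriptor pool as a plain list are all assumptions made for illustration, and the temporal extension is not covered. The last helper simply re-derives the relative MAP improvements quoted in the abstract.

```python
import numpy as np

def correlation_weights(scores, target_labels):
    """Assumed weighting: |Pearson correlation| between each concept's
    score column and the target concept's 0/1 labels."""
    s = scores - scores.mean(axis=0)
    y = target_labels - target_labels.mean()
    denom = np.sqrt((s ** 2).sum(axis=0) * (y ** 2).sum())
    denom = np.where(denom == 0, 1.0, denom)  # constant columns get weight 0
    return np.abs(s.T @ y) / denom

def conceptual_descriptor(scores, weights=None, selected=None):
    """Build a high-level descriptor from first-pass detection scores.

    scores:   (n_samples, n_concepts) array of detection scores
    weights:  optional per-concept weights (first extension)
    selected: optional indices of related concepts (second extension)
    """
    desc = scores if weights is None else scores * weights
    if selected is not None:
        desc = desc[:, selected]
    return desc

def feed_back(descriptor_pool, scores, **kwargs):
    """Return a new pool with the conceptual descriptor appended; the
    classifiers would then be retrained on the enlarged pool."""
    return descriptor_pool + [conceptual_descriptor(scores, **kwargs)]

def relative_improvement(baseline, improved):
    """Relative gain in percent."""
    return 100.0 * (improved - baseline) / baseline

print(f"{relative_improvement(0.2613, 0.3082):.1f}")  # 17.9
print(f"{relative_improvement(0.2613, 0.2691):.1f}")  # 3.0
```

Iterating the method, as the abstract describes, would amount to recomputing the scores from the enlarged pool and calling `feed_back` again.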
ISSN: 1380-7501, 1573-7721
DOI: 10.1007/s11042-014-1937-y