Efficient Heuristic Methods for Multimodal Fusion and Concept Fusion in Video Concept Detection

Bibliographic Details
Published in:	IEEE Transactions on Multimedia, 2015-04, Vol. 17 (4), pp. 498-511
Main authors:	Geng, Jie; Miao, Zhenjiang; Zhang, Xiao-Ping
Format:	Article
Language:	English
Subjects:	Adaptation models; Concept fusion; Correlation; Detectors; Domain adaption; Histograms; Indexing; Multimodal fusion; Semantics; Vectors; Video concept indexing

Description
Summary:	Semantic models are widely used to bridge the semantic gap between low-level features and high-level features in video concept indexing. Multimodal fusion and concept fusion are two commonly used approaches to building semantic models. In previous work, domain adaptation is neglected in multimodal fusion, and many probability-maximization-based and unsupervised concept fusion methods are counterintuitive because they do not incorporate subjective human intuition. In this paper, we present a new two-stage semantic model that combines multimodal fusion and concept fusion and incorporates human heuristics. In the multimodal fusion model, we employ a new generic unsupervised method, namely, domain adaptive linear combination (DALC), to update the linear combination (LC) weights by incorporating the differences of element distributions between training and testing domains. In the concept fusion model, a novel mechanical node equilibrium (NE) model is developed that uses forces to model the concept correlations and to update the scores of concepts represented by nodes. It is intuitive and can incorporate multiple kinds of correlations simultaneously to construct a more sophisticated semantic structure. Compared to other state-of-the-art supervised and unsupervised methods, the new model can use either unsupervised or supervised factors to significantly improve the mean inferred average precision (MAP) performance on all datasets.
ISSN:	1520-9210, 1941-0077
DOI:	10.1109/TMM.2015.2398195