MECOM: A Meta-Completion Network for Fine-Grained Recognition With Incomplete Multi-Modalities

Bibliographic Details
Published in: IEEE Transactions on Image Processing, 2024, Vol. 33, pp. 3456-3469
Authors: Wei, Xiu-Shen; Yu, Hong-Tao; Xu, Anqi; Zhang, Faen; Peng, Yuxin
Format: Article
Language: English
Abstract: Our work focuses on tackling the problem of fine-grained recognition with incomplete multi-modal data, which has been overlooked by previous work in the literature. It is desirable not only to capture fine-grained patterns of objects but also to alleviate the challenges of missing modalities in such a practical setting. In this paper, we propose to leverage a meta-learning strategy to learn both fast modal adaptation and, more importantly, missing modality completion across a variety of incomplete multi-modality learning tasks. Based on that, we develop a meta-completion method, termed MECOM, which performs multi-modal fusion and explicit missing modality completion through our proposed cross-modal attention and decoupling reconstruction. To further improve fine-grained recognition accuracy, we design an additional partial stream (as a counterpart of MECOM's main, i.e., holistic, stream) and a part-level feature selection mechanism (corresponding to the parts of fine-grained objects); both are tailored to the fine-grained nature of the task and capture discriminative but subtle part-level patterns. Comprehensive quantitative and qualitative experiments, as well as various ablation studies, on two fine-grained multi-modal datasets and one generic multi-modal dataset show our superiority over competing methods. Our code is open-source and available at https://github.com/SEU-VIPGroup/MECOM.
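
To make the completion idea more concrete, below is a minimal, hypothetical PyTorch sketch of cross-modal attention with a learnable placeholder for an absent modality and a small reconstruction head. It is not the authors' implementation (their released code is at the GitHub link above); the class name CrossModalCompletion, the dimensions, and all variable names are illustrative assumptions only.

    # Hypothetical sketch: cross-attention fusion plus reconstruction of a
    # missing modality's features. Not the MECOM reference implementation.
    import torch
    import torch.nn as nn

    class CrossModalCompletion(nn.Module):
        def __init__(self, dim: int = 256, num_heads: int = 4):
            super().__init__()
            # Learnable placeholder standing in for the missing modality.
            self.missing_token = nn.Parameter(torch.zeros(1, 1, dim))
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            # Toy reconstruction head: predicts the missing modality's
            # features from the fused representation.
            self.decoder = nn.Sequential(
                nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim)
            )

        def forward(self, observed: torch.Tensor, missing_mask: torch.Tensor):
            # observed: (B, N, D) tokens from the available modality
            # missing_mask: (B,) bool, True where the other modality is absent
            B = observed.size(0)
            query = self.missing_token.expand(B, 1, -1)
            # The placeholder queries the observed modality via cross-attention.
            fused, _ = self.attn(query, observed, observed)
            # Reconstructed features can be substituted wherever missing_mask is True.
            reconstructed = self.decoder(fused)
            return fused.squeeze(1), reconstructed.squeeze(1), missing_mask

    # Usage: fuse image tokens and produce a stand-in for a missing modality.
    model = CrossModalCompletion(dim=256)
    image_tokens = torch.randn(8, 49, 256)        # e.g. a 7x7 feature map, flattened
    is_other_missing = torch.tensor([True] * 8)
    fused, recon, mask = model(image_tokens, is_other_missing)
    print(fused.shape, recon.shape)               # torch.Size([8, 256]) twice
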
ISSN: 1057-7149, 1941-0042
DOI: 10.1109/TIP.2024.3403051