MEEL: Multi-Modal Event Evolution Learning
Main authors: | , , , , , , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Abstract: Multi-modal Event Reasoning (MMER) endeavors to endow machines with the ability to comprehend intricate event relations across diverse data modalities. MMER is fundamental and underlies a broad range of applications. Despite extensive instruction fine-tuning, current multi-modal large language models still fall short in this ability. The disparity stems from the fact that existing models fail to capture the underlying principles governing event evolution in various scenarios. In this paper, we introduce Multi-Modal Event Evolution Learning (MEEL) to enable the model to grasp the event evolution mechanism, yielding advanced MMER ability. Specifically, we commence with the design of event diversification to gather seed events from a rich spectrum of scenarios.
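How seed scenarios are balanced is not detailed in this record; below is a minimal sketch of one plausible diversification scheme, assuming seed events are pooled by scenario category and sampled evenly. The category names and events are illustrative, not the paper's actual data.

```python
# Hypothetical sketch of event diversification: draw seed events evenly from
# several scenario pools so that no single domain dominates the seed set.
import random

SCENARIO_POOLS = {
    "disaster": ["A wildfire breaks out near a town", "A river floods its banks"],
    "sports": ["A striker scores in stoppage time", "A runner pulls a hamstring"],
    "daily life": ["A commuter misses the last train", "A parcel arrives damaged"],
}

def diversify(n_per_scenario: int) -> list[str]:
    """Sample the same number of seed events from every scenario pool."""
    seeds = []
    for pool in SCENARIO_POOLS.values():
        seeds.extend(random.sample(pool, min(n_per_scenario, len(pool))))
    return seeds

print(diversify(1))  # one seed event per scenario category
```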
Subsequently, we employ ChatGPT to generate evolving graphs for these seed events.
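The record does not reproduce the paper's prompts, so the following is a minimal sketch of the graph-generation step, assuming the official `openai` Python client; the prompt wording and the `nodes`/`edges` JSON schema are assumptions.

```python
# Hypothetical sketch: prompt ChatGPT to expand one seed event into a small
# evolving graph. Assumes OPENAI_API_KEY is set and that the model complies
# with the requested JSON format (production code would validate the output).
import json
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Given the seed event '{event}', list a few events it may plausibly "
    "evolve into. Return JSON of the form "
    '{{"nodes": [...], "edges": [["cause", "effect"], ...]}}.'
)

def generate_evolving_graph(seed_event: str) -> dict:
    """Ask the model for plausible successor events of a seed event."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": PROMPT.format(event=seed_event)}],
    )
    return json.loads(response.choices[0].message.content)

graph = generate_evolving_graph("A wildfire breaks out near a town")
print(graph["nodes"], graph["edges"])
```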
We propose an instruction encapsulation process that formulates the evolving graphs into instruction-tuning data, aligning the model's comprehension of event reasoning with that of humans.
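The paper's encapsulation templates are not shown in this record; a minimal sketch, assuming each graph edge becomes one (instruction, output) pair and using a hypothetical template:

```python
# Hypothetical sketch of instruction encapsulation: flatten each edge of an
# evolving graph into one instruction-tuning sample. The template wording is
# an assumption, not the paper's actual design.

def encapsulate(graph: dict) -> list[dict]:
    """Turn an evolving graph into (instruction, output) training pairs."""
    samples = []
    for cause, effect in graph["edges"]:
        samples.append({
            "instruction": f"The event '{cause}' has just occurred. "
                           "How is the situation likely to evolve next?",
            "output": effect,
        })
    return samples

graph = {
    "nodes": ["A wildfire breaks out", "Residents are evacuated"],
    "edges": [["A wildfire breaks out", "Residents are evacuated"]],
}
for sample in encapsulate(graph):
    print(sample)
```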
Finally, we observe that models trained in this way still struggle to fully comprehend event evolution. To address this, we propose a guiding discrimination strategy, in which models are trained to discriminate improper evolution directions. We collect and curate a benchmark, M-EV2, for MMER. Extensive experiments on M-EV2 validate the effectiveness of our approach, showcasing competitive performance among open-source multi-modal LLMs.
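How negative evolution directions are constructed is not specified in this record; a minimal sketch, assuming improper directions are sampled as distractor events from unrelated graphs and posed as a two-way choice:

```python
# Hypothetical sketch of the guiding discrimination strategy: pair the proper
# evolution direction with a sampled distractor and train the model to pick
# the correct continuation. Distractor sampling and wording are assumptions.
import random

def make_discrimination_sample(cause: str, effect: str,
                               distractors: list[str]) -> dict:
    """Build a two-way choice contrasting proper vs. improper evolution."""
    options = [effect, random.choice(distractors)]
    random.shuffle(options)
    return {
        "instruction": (f"Event: '{cause}'. Which continuation is the proper "
                        f"evolution direction?\nA) {options[0]}\nB) {options[1]}"),
        "output": "A" if options[0] == effect else "B",
    }

sample = make_discrimination_sample(
    "A wildfire breaks out near a town",
    "Residents are evacuated",
    distractors=["The stock market rallies", "A new smartphone is released"],
)
print(sample["instruction"], "->", sample["output"])
```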
DOI: 10.48550/arxiv.2404.10429