MEEL: Multi-Modal Event Evolution Learning

Multi-modal Event Reasoning (MMER) endeavors to endow machines with the ability to comprehend intricate event relations across diverse data modalities. MMER is fundamental and underlies a broad range of applications. Despite extensive instruction fine-tuning, current multi-modal large language models still fall short in this ability. The gap arises because existing models fail to capture the underlying principles governing event evolution in diverse scenarios. In this paper, we introduce Multi-Modal Event Evolution Learning (MEEL) to enable the model to grasp the event evolution mechanism, yielding advanced MMER ability. Specifically, we begin with the design of event diversification to gather seed events from a rich spectrum of scenarios. Subsequently, we employ ChatGPT to generate evolving graphs for these seed events. We propose an instruction encapsulation process that formulates the evolving graphs into instruction-tuning data, aligning the comprehension of event reasoning with that of humans. Finally, we observe that models trained in this way still struggle to fully comprehend event evolution. To address this, we propose a guiding discrimination strategy, in which models are trained to discriminate improper evolution directions. We collect and curate a benchmark, M-EV2, for MMER. Extensive experiments on M-EV2 validate the effectiveness of our approach, showing competitive performance among open-source multi-modal LLMs.

Bibliographic Details
Main Authors: Tao, Zhengwei, Jin, Zhi, Huang, Junqiang, Chen, Xiancai, Bai, Xiaoying, Zhao, Haiyan, Zhang, Yifan, Tao, Chongyang
Format: Article
Language: English
Subjects: Computer Science - Artificial Intelligence
Online Access: Order full text
description Multi-modal Event Reasoning (MMER) endeavors to endow machines with the ability to comprehend intricate event relations across diverse data modalities. MMER is fundamental and underlies a broad range of applications. Despite extensive instruction fine-tuning, current multi-modal large language models still fall short in this ability. The gap arises because existing models fail to capture the underlying principles governing event evolution in diverse scenarios. In this paper, we introduce Multi-Modal Event Evolution Learning (MEEL) to enable the model to grasp the event evolution mechanism, yielding advanced MMER ability. Specifically, we begin with the design of event diversification to gather seed events from a rich spectrum of scenarios. Subsequently, we employ ChatGPT to generate evolving graphs for these seed events. We propose an instruction encapsulation process that formulates the evolving graphs into instruction-tuning data, aligning the comprehension of event reasoning with that of humans. Finally, we observe that models trained in this way still struggle to fully comprehend event evolution. To address this, we propose a guiding discrimination strategy, in which models are trained to discriminate improper evolution directions. We collect and curate a benchmark, M-EV2, for MMER. Extensive experiments on M-EV2 validate the effectiveness of our approach, showing competitive performance among open-source multi-modal LLMs.
doi_str_mv 10.48550/arxiv.2404.10429
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2404.10429
language eng
recordid cdi_arxiv_primary_2404_10429
source arXiv.org
subjects Computer Science - Artificial Intelligence
title MEEL: Multi-Modal Event Evolution Learning
url https://arxiv.org/abs/2404.10429