MEEL: Multi-Modal Event Evolution Learning

Multi-modal Event Reasoning (MMER) endeavors to endow machines with the ability to comprehend intricate event relations across diverse data modalities. MMER is fundamental and underlies a broad range of applications. Despite extensive instruction fine-tuning, current multi-modal large language models still fall short in this ability. The gap arises because existing models fail to capture the underlying principles governing event evolution in diverse scenarios. In this paper, we introduce Multi-Modal Event Evolution Learning (MEEL) to enable the model to grasp the event evolution mechanism, yielding advanced MMER ability. Specifically, we begin with the design of event diversification to gather seed events from a rich spectrum of scenarios. Subsequently, we employ ChatGPT to generate evolving graphs for these seed events. We propose an instruction encapsulation process that formulates the evolving graphs into instruction-tuning data, aligning the comprehension of event reasoning with that of humans. Finally, we observe that models trained in this way still struggle to fully comprehend event evolution. To address this, we propose a guiding discrimination strategy, in which models are trained to discriminate improper evolution directions. We collect and curate a benchmark, M-EV2, for MMER. Extensive experiments on M-EV2 validate the effectiveness of our approach, showing competitive performance among open-source multi-modal LLMs.

Bibliographic Details
Main Authors: Tao, Zhengwei, Jin, Zhi, Huang, Junqiang, Chen, Xiancai, Bai, Xiaoying, Zhao, Haiyan, Zhang, Yifan, Tao, Chongyang
Format: Article
Language: English
Subjects: Computer Science - Artificial Intelligence
Online Access: Order full text
description Multi-modal Event Reasoning (MMER) endeavors to endow machines with the ability to comprehend intricate event relations across diverse data modalities. MMER is fundamental and underlies a broad range of applications. Despite extensive instruction fine-tuning, current multi-modal large language models still fall short in this ability. The gap arises because existing models fail to capture the underlying principles governing event evolution in diverse scenarios. In this paper, we introduce Multi-Modal Event Evolution Learning (MEEL) to enable the model to grasp the event evolution mechanism, yielding advanced MMER ability. Specifically, we begin with the design of event diversification to gather seed events from a rich spectrum of scenarios. Subsequently, we employ ChatGPT to generate evolving graphs for these seed events. We propose an instruction encapsulation process that formulates the evolving graphs into instruction-tuning data, aligning the comprehension of event reasoning with that of humans. Finally, we observe that models trained in this way still struggle to fully comprehend event evolution. To address this, we propose a guiding discrimination strategy, in which models are trained to discriminate improper evolution directions. We collect and curate a benchmark, M-EV2, for MMER. Extensive experiments on M-EV2 validate the effectiveness of our approach, showing competitive performance among open-source multi-modal LLMs.
doi_str_mv 10.48550/arxiv.2404.10429
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2404.10429
language eng
recordid cdi_arxiv_primary_2404_10429
source arXiv.org
subjects Computer Science - Artificial Intelligence
title MEEL: Multi-Modal Event Evolution Learning
url https://arxiv.org/abs/2404.10429