Question difficulty estimation via enhanced directional modality association transformer

Estimating the difficulty of a question in video QAs is one of the important reasoning steps to answer the question. However, no previous question difficulty estimators consider the association between multiple modalities though video QA is intrinsically a multi-modal task involving both text and video.

Detailed description

Saved in:
Bibliographic details
Published in: Applied Intelligence (Dordrecht, Netherlands), 2023-12, Vol. 53 (23), p. 28434-28445
Main authors: Kim, Bong-Min; Park, Gyu-Min; Park, Seong-Bae
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page 28445
container_issue 23
container_start_page 28434
container_title Applied intelligence (Dordrecht, Netherlands)
container_volume 53
creator Kim, Bong-Min
Park, Gyu-Min
Park, Seong-Bae
description Estimating the difficulty of a question in video QAs is one of the important reasoning steps to answer the question. However, no previous question difficulty estimators consider the association between multiple modalities though video QA is intrinsically a multi-modal task involving both text and video. To solve this problem, this paper proposes a novel question difficulty estimator using an enhanced directional modality attention transformer (DiMAT++). The proposed estimator adopts a CNN backbone network and a transformer to express a video modality and RoBERTa to express a text modality. However, these modalities are insufficient to classify the difficulty level of a question correctly, since they affect each other during performing video QAs. Therefore, in the proposed estimator, DiMAT++ captures directional associations from text modality to video modality and vice versa. DiMAT, the previous version of DiMAT++, does not represent the sequential information for each modality though it is designed to express the directional associations. Thus, DiMAT++ revises it to accept the sequential representations of each modality as its input. The effectiveness of the proposed estimator is verified with two benchmark video QA data sets. The experimental results indicate that the proposed estimator outperforms three baselines of heterogeneous attention mechanism (HAM), multi-modal fusion transformer (MMFT), and DiMAT, which proves that DiMAT++ is effective in improving the performance of video question difficulty estimation.
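The description above outlines the DiMAT++ architecture only at a high level: sequential text features (from RoBERTa) and sequential video features (from a CNN backbone plus a transformer) are linked by directional associations in both directions before the difficulty level is classified. The following is a minimal, hypothetical PyTorch sketch of such a bidirectional cross-modal attention block; the module names, dimensions, pooling, and classifier head are illustrative assumptions and do not reproduce the authors' implementation.

```python
# Hypothetical sketch of a directional cross-modal attention block in the spirit of
# the DiMAT++ description: text tokens attend to video frames and vice versa, and the
# fused, pooled representation feeds a difficulty-level classifier.
# All names, dimensions, and the fusion strategy are assumptions, not the paper's code.
import torch
import torch.nn as nn


class DirectionalCrossModalBlock(nn.Module):
    def __init__(self, dim: int = 768, num_heads: int = 8, num_levels: int = 3):
        super().__init__()
        # text -> video: text tokens query the video sequence
        self.text_to_video = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # video -> text: video frames query the text sequence
        self.video_to_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_t = nn.LayerNorm(dim)
        self.norm_v = nn.LayerNorm(dim)
        # difficulty classifier over the pooled, fused representation
        self.classifier = nn.Linear(2 * dim, num_levels)

    def forward(self, text_seq: torch.Tensor, video_seq: torch.Tensor) -> torch.Tensor:
        # text_seq:  (batch, n_tokens, dim)  sequential text representation (e.g., RoBERTa)
        # video_seq: (batch, n_frames, dim)  sequential video representation (e.g., CNN + transformer)
        t2v, _ = self.text_to_video(query=text_seq, key=video_seq, value=video_seq)
        v2t, _ = self.video_to_text(query=video_seq, key=text_seq, value=text_seq)
        text_fused = self.norm_t(text_seq + t2v)    # residual connection + layer norm
        video_fused = self.norm_v(video_seq + v2t)
        pooled = torch.cat([text_fused.mean(dim=1), video_fused.mean(dim=1)], dim=-1)
        return self.classifier(pooled)              # logits over difficulty levels


if __name__ == "__main__":
    block = DirectionalCrossModalBlock()
    text = torch.randn(2, 32, 768)   # assumed token embeddings
    video = torch.randn(2, 16, 768)  # assumed projected frame features
    print(block(text, video).shape)  # torch.Size([2, 3])
```

Each direction uses the other modality as keys and values, so text tokens can weight relevant frames and frames can weight relevant words, mirroring the paper's notion of directional associations over sequential modality representations; the actual DiMAT++ fusion and classification head may differ.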
doi_str_mv 10.1007/s10489-023-04988-5
format Article
fulltext fulltext
identifier ISSN: 0924-669X
ispartof Applied intelligence (Dordrecht, Netherlands), 2023-12, Vol.53 (23), p.28434-28445
issn 0924-669X
1573-7497
language eng
recordid cdi_proquest_journals_2895064672
source SpringerLink Journals - AutoHoldings
subjects Artificial Intelligence
Computer networks
Computer Science
Datasets
Estimation
Language
Machines
Manufacturing
Mechanical Engineering
Natural language
Processes
Questions
Special Issue on IEA/AIE2022
title Question difficulty estimation via enhanced directional modality association transformer
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T13%3A11%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Question%20difficulty%20estimation%20via%20enhanced%20directional%20modality%20association%20transformer&rft.jtitle=Applied%20intelligence%20(Dordrecht,%20Netherlands)&rft.au=Kim,%20Bong-Min&rft.date=2023-12-01&rft.volume=53&rft.issue=23&rft.spage=28434&rft.epage=28445&rft.pages=28434-28445&rft.issn=0924-669X&rft.eissn=1573-7497&rft_id=info:doi/10.1007/s10489-023-04988-5&rft_dat=%3Cproquest_cross%3E2895064672%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2895064672&rft_id=info:pmid/&rfr_iscdi=true