Closed-loop reasoning with graph-aware dense interaction for visual dialog

Bibliographic details
Published in:	Multimedia systems 2022, Vol.28 (5), p.1823-1832
Main authors:	Liu, An-An, Zhang, Guokai, Xu, Ning, Guo, Junbo, Jin, Guoqing, Li, Xuanya
Format:	Article
Language:	eng
Subjects:	Ablation Computer Communication Networks Computer Graphics Computer Science Cryptology Data Storage Representation Multimedia Information Systems Operating Systems Reasoning Regular Paper Vision
Online access:	Full text

container_end_page	1832
container_issue	5
container_start_page	1823
container_title	Multimedia systems
container_volume	28
creator	Liu, An-An Zhang, Guokai Xu, Ning Guo, Junbo Jin, Guoqing Li, Xuanya
description	Visual dialog is an attractive vision-language task that requires predicting the correct answer given a question, the dialog history, and an image. Although researchers have offered diverse solutions for connecting text with vision, multi-modal information still receives inadequate interaction for semantic alignment. To solve this problem, we propose closed-loop reasoning with graph-aware dense interaction, which aims to discover cues through the dynamic structure of a graph and leverage them to benefit dialog and image features. Moreover, we analyze the statistics of the linguistic entities hidden in the dialog to verify the reliability of the graph construction. Experiments on two VisDial datasets indicate that our model achieves competitive results against previous methods. An ablation study and parameter analysis further demonstrate the effectiveness of our model.
doi_str_mv	10.1007/s00530-022-00947-1
format	Article
publisher	Springer Berlin Heidelberg, Berlin/Heidelberg
rights	The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2022
orcidid	https://orcid.org/0000-0002-7526-4356
toplevel	peer_reviewed
fulltext	fulltext
identifier	ISSN: 0942-4962
ispartof	Multimedia systems, 2022, Vol.28 (5), p.1823-1832
issn	0942-4962 1432-1882
language	eng
recordid	cdi_proquest_journals_2717709790
source	SpringerLink Journals - AutoHoldings
subjects	Ablation Computer Communication Networks Computer Graphics Computer Science Cryptology Data Storage Representation Multimedia Information Systems Operating Systems Reasoning Regular Paper Vision
title	Closed-loop reasoning with graph-aware dense interaction for visual dialog
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T12%3A37%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Closed-loop%20reasoning%20with%20graph-aware%20dense%20interaction%20for%20visual%20dialog&rft.jtitle=Multimedia%20systems&rft.au=Liu,%20An-An&rft.date=2022&rft.volume=28&rft.issue=5&rft.spage=1823&rft.epage=1832&rft.pages=1823-1832&rft.issn=0942-4962&rft.eissn=1432-1882&rft_id=info:doi/10.1007/s00530-022-00947-1&rft_dat=%3Cproquest_cross%3E2717709790%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2717709790&rft_id=info:pmid/&rfr_iscdi=true