Closed-loop reasoning with graph-aware dense interaction for visual dialog

Visual dialog is an attractive vision-language task that requires predicting the correct answer given a question, the dialog history, and an image. Although researchers have offered diverse solutions for connecting text with vision, multi-modal information still gets inadequate interaction for semantic alignme...

Detailed description

Bibliographic details
Published in: Multimedia systems 2022, Vol.28 (5), p.1823-1832
Main authors: Liu, An-An, Zhang, Guokai, Xu, Ning, Guo, Junbo, Jin, Guoqing, Li, Xuanya
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 1832
container_issue 5
container_start_page 1823
container_title Multimedia systems
container_volume 28
creator Liu, An-An
Zhang, Guokai
Xu, Ning
Guo, Junbo
Jin, Guoqing
Li, Xuanya
description Visual dialog is an attractive vision-language task that requires predicting the correct answer given a question, the dialog history, and an image. Although researchers have offered diverse solutions for connecting text with vision, multi-modal information still gets inadequate interaction for semantic alignment. To solve this problem, we propose closed-loop reasoning with graph-aware dense interaction, which aims to discover cues through the dynamic structure of a graph and leverage them to benefit dialog and image features. Moreover, we analyze the statistics of the linguistic entities hidden in the dialog to verify the reliability of the graph construction. Experiments on two VisDial datasets indicate that our model achieves competitive results against previous methods. An ablation study and a parameter analysis further demonstrate the effectiveness of our model.
doi_str_mv 10.1007/s00530-022-00947-1
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2717709790</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2717709790</sourcerecordid><originalsourceid>FETCH-LOGICAL-c319t-21bf1479e0126341045145aa8f886c4ce7221386815cfa3ac734120747be952b3</originalsourceid><addsrcrecordid>eNp9kMFKxDAURYMoOI7-gKuA6-h7Sdq0SxnUUQbc6DpkMmknQ21q0nHw741WcOfqweXc--AQcolwjQDqJgEUAhhwzgBqqRgekRlKwRlWFT8msxxyJuuSn5KzlHYAqEoBM_K06EJyG9aFMNDoTAq971t68OOWttEMW2YOJjq6cX1y1Peji8aOPvS0CZF--LQ3Hd1404X2nJw0pkvu4vfOyev93ctiyVbPD4-L2xWzAuuRcVw3KFXtAHkpJIIsUBbGVE1VlVZapzhHUZUVFrYxwliVIQ5KqrWrC74Wc3I17Q4xvO9dGvUu7GOfX2quUCmoVQ2Z4hNlY0gpukYP0b-Z-KkR9LczPTnT2Zn-caYxl8RUShnuWxf_pv9pfQEnKW15</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2717709790</pqid></control><display><type>article</type><title>Closed-loop reasoning with graph-aware dense interaction for visual dialog</title><source>SpringerLink Journals - AutoHoldings</source><creator>Liu, An-An ; Zhang, Guokai ; Xu, Ning ; Guo, Junbo ; Jin, Guoqing ; Li, Xuanya</creator><creatorcontrib>Liu, An-An ; Zhang, Guokai ; Xu, Ning ; Guo, Junbo ; Jin, Guoqing ; Li, Xuanya</creatorcontrib><description>Visual dialog is one attractive vision-language task to predict correct answer according to the given question, dialog history and image. Although researchers have offered diversified solutions to contact text with vision, multi-modal information still get inadequate interaction for semantic alignment. To solve the problem, we propose closed-loop reasoning with graph-aware dense interaction, aiming to discover cues through the dynamic structure of graph and leverage it to benefit dialog and image features. Moreover, we analyze the statistics of the linguistic entities hidden in dialog to prove the reliability of graph construction. 
Experiments are set up on two VisDial datasets, which indicate that our model achieves the competitive results against the previous methods. Ablation study and parameter analysis can further demonstrate the effectiveness of our model.</description><identifier>ISSN: 0942-4962</identifier><identifier>EISSN: 1432-1882</identifier><identifier>DOI: 10.1007/s00530-022-00947-1</identifier><language>eng</language><publisher>Berlin/Heidelberg: Springer Berlin Heidelberg</publisher><subject>Ablation ; Computer Communication Networks ; Computer Graphics ; Computer Science ; Cryptology ; Data Storage Representation ; Multimedia Information Systems ; Operating Systems ; Reasoning ; Regular Paper ; Vision</subject><ispartof>Multimedia systems, 2022, Vol.28 (5), p.1823-1832</ispartof><rights>The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2022</rights><rights>The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2022.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c319t-21bf1479e0126341045145aa8f886c4ce7221386815cfa3ac734120747be952b3</citedby><cites>FETCH-LOGICAL-c319t-21bf1479e0126341045145aa8f886c4ce7221386815cfa3ac734120747be952b3</cites><orcidid>0000-0002-7526-4356</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s00530-022-00947-1$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s00530-022-00947-1$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,27903,27904,41467,42536,51297</link.rule.ids></links><search><creatorcontrib>Liu, An-An</creatorcontrib><creatorcontrib>Zhang, Guokai</creatorcontrib><creatorcontrib>Xu, Ning</creatorcontrib><creatorcontrib>Guo, 
Junbo</creatorcontrib><creatorcontrib>Jin, Guoqing</creatorcontrib><creatorcontrib>Li, Xuanya</creatorcontrib><title>Closed-loop reasoning with graph-aware dense interaction for visual dialog</title><title>Multimedia systems</title><addtitle>Multimedia Systems</addtitle><description>Visual dialog is one attractive vision-language task to predict correct answer according to the given question, dialog history and image. Although researchers have offered diversified solutions to contact text with vision, multi-modal information still get inadequate interaction for semantic alignment. To solve the problem, we propose closed-loop reasoning with graph-aware dense interaction, aiming to discover cues through the dynamic structure of graph and leverage it to benefit dialog and image features. Moreover, we analyze the statistics of the linguistic entities hidden in dialog to prove the reliability of graph construction. Experiments are set up on two VisDial datasets, which indicate that our model achieves the competitive results against the previous methods. 
Ablation study and parameter analysis can further demonstrate the effectiveness of our model.</description><subject>Ablation</subject><subject>Computer Communication Networks</subject><subject>Computer Graphics</subject><subject>Computer Science</subject><subject>Cryptology</subject><subject>Data Storage Representation</subject><subject>Multimedia Information Systems</subject><subject>Operating Systems</subject><subject>Reasoning</subject><subject>Regular Paper</subject><subject>Vision</subject><issn>0942-4962</issn><issn>1432-1882</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp9kMFKxDAURYMoOI7-gKuA6-h7Sdq0SxnUUQbc6DpkMmknQ21q0nHw741WcOfqweXc--AQcolwjQDqJgEUAhhwzgBqqRgekRlKwRlWFT8msxxyJuuSn5KzlHYAqEoBM_K06EJyG9aFMNDoTAq971t68OOWttEMW2YOJjq6cX1y1Peji8aOPvS0CZF--LQ3Hd1404X2nJw0pkvu4vfOyev93ctiyVbPD4-L2xWzAuuRcVw3KFXtAHkpJIIsUBbGVE1VlVZapzhHUZUVFrYxwliVIQ5KqrWrC74Wc3I17Q4xvO9dGvUu7GOfX2quUCmoVQ2Z4hNlY0gpukYP0b-Z-KkR9LczPTnT2Zn-caYxl8RUShnuWxf_pv9pfQEnKW15</recordid><startdate>2022</startdate><enddate>2022</enddate><creator>Liu, An-An</creator><creator>Zhang, Guokai</creator><creator>Xu, Ning</creator><creator>Guo, Junbo</creator><creator>Jin, Guoqing</creator><creator>Li, Xuanya</creator><general>Springer Berlin Heidelberg</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0002-7526-4356</orcidid></search><sort><creationdate>2022</creationdate><title>Closed-loop reasoning with graph-aware dense interaction for visual dialog</title><author>Liu, An-An ; Zhang, Guokai ; Xu, Ning ; Guo, Junbo ; Jin, Guoqing ; Li, 
Xuanya</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c319t-21bf1479e0126341045145aa8f886c4ce7221386815cfa3ac734120747be952b3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Ablation</topic><topic>Computer Communication Networks</topic><topic>Computer Graphics</topic><topic>Computer Science</topic><topic>Cryptology</topic><topic>Data Storage Representation</topic><topic>Multimedia Information Systems</topic><topic>Operating Systems</topic><topic>Reasoning</topic><topic>Regular Paper</topic><topic>Vision</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Liu, An-An</creatorcontrib><creatorcontrib>Zhang, Guokai</creatorcontrib><creatorcontrib>Xu, Ning</creatorcontrib><creatorcontrib>Guo, Junbo</creatorcontrib><creatorcontrib>Jin, Guoqing</creatorcontrib><creatorcontrib>Li, Xuanya</creatorcontrib><collection>CrossRef</collection><jtitle>Multimedia systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Liu, An-An</au><au>Zhang, Guokai</au><au>Xu, Ning</au><au>Guo, Junbo</au><au>Jin, Guoqing</au><au>Li, Xuanya</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Closed-loop reasoning with graph-aware dense interaction for visual dialog</atitle><jtitle>Multimedia systems</jtitle><stitle>Multimedia Systems</stitle><date>2022</date><risdate>2022</risdate><volume>28</volume><issue>5</issue><spage>1823</spage><epage>1832</epage><pages>1823-1832</pages><issn>0942-4962</issn><eissn>1432-1882</eissn><abstract>Visual dialog is one attractive vision-language task to predict correct answer according to the given question, dialog history and image. Although researchers have offered diversified solutions to contact text with vision, multi-modal information still get inadequate interaction for semantic alignment. 
To solve the problem, we propose closed-loop reasoning with graph-aware dense interaction, aiming to discover cues through the dynamic structure of graph and leverage it to benefit dialog and image features. Moreover, we analyze the statistics of the linguistic entities hidden in dialog to prove the reliability of graph construction. Experiments are set up on two VisDial datasets, which indicate that our model achieves the competitive results against the previous methods. Ablation study and parameter analysis can further demonstrate the effectiveness of our model.</abstract><cop>Berlin/Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><doi>10.1007/s00530-022-00947-1</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0002-7526-4356</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0942-4962
ispartof Multimedia systems, 2022, Vol.28 (5), p.1823-1832
issn 0942-4962
1432-1882
language eng
recordid cdi_proquest_journals_2717709790
source SpringerLink Journals - AutoHoldings
subjects Ablation
Computer Communication Networks
Computer Graphics
Computer Science
Cryptology
Data Storage Representation
Multimedia Information Systems
Operating Systems
Reasoning
Regular Paper
Vision
title Closed-loop reasoning with graph-aware dense interaction for visual dialog
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T12%3A37%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Closed-loop%20reasoning%20with%20graph-aware%20dense%20interaction%20for%20visual%20dialog&rft.jtitle=Multimedia%20systems&rft.au=Liu,%20An-An&rft.date=2022&rft.volume=28&rft.issue=5&rft.spage=1823&rft.epage=1832&rft.pages=1823-1832&rft.issn=0942-4962&rft.eissn=1432-1882&rft_id=info:doi/10.1007/s00530-022-00947-1&rft_dat=%3Cproquest_cross%3E2717709790%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2717709790&rft_id=info:pmid/&rfr_iscdi=true