Closed-loop reasoning with graph-aware dense interaction for visual dialog
Visual dialog is a vision-language task in which a model predicts the correct answer given a question, the dialog history, and an image. Although researchers have proposed diverse solutions for connecting text with vision, multi-modal information still receives inadequate interaction for semantic alignme...
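Published in: | Multimedia systems 2022, Vol.28 (5), p.1823-1832 |
---|---|
Main authors: | Liu, An-An; Zhang, Guokai; Xu, Ning; Guo, Junbo; Jin, Guoqing; Li, Xuanya |
Format: | Article |
Language: | eng |
Subjects: | Ablation; Computer Communication Networks; Computer Graphics; Computer Science; Cryptology; Data Storage Representation; Multimedia Information Systems; Operating Systems; Reasoning; Regular Paper; Vision |
Online access: | Full text |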
container_end_page | 1832 |
---|---|
container_issue | 5 |
container_start_page | 1823 |
container_title | Multimedia systems |
container_volume | 28 |
creator | Liu, An-An; Zhang, Guokai; Xu, Ning; Guo, Junbo; Jin, Guoqing; Li, Xuanya |
description | Visual dialog is a vision-language task in which a model predicts the correct answer given a question, the dialog history, and an image. Although researchers have proposed diverse solutions for connecting text with vision, multi-modal information still receives inadequate interaction for semantic alignment. To address this problem, we propose closed-loop reasoning with graph-aware dense interaction, which aims to discover cues through the dynamic structure of a graph and leverage them to benefit dialog and image features. Moreover, we analyze the statistics of the linguistic entities hidden in the dialog to verify the reliability of the graph construction. Experiments on two VisDial datasets indicate that our model achieves competitive results against previous methods. An ablation study and parameter analysis further demonstrate the effectiveness of our model. |
doi_str_mv | 10.1007/s00530-022-00947-1 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2717709790</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2717709790</sourcerecordid><originalsourceid>FETCH-LOGICAL-c319t-21bf1479e0126341045145aa8f886c4ce7221386815cfa3ac734120747be952b3</originalsourceid><addsrcrecordid>eNp9kMFKxDAURYMoOI7-gKuA6-h7Sdq0SxnUUQbc6DpkMmknQ21q0nHw741WcOfqweXc--AQcolwjQDqJgEUAhhwzgBqqRgekRlKwRlWFT8msxxyJuuSn5KzlHYAqEoBM_K06EJyG9aFMNDoTAq971t68OOWttEMW2YOJjq6cX1y1Peji8aOPvS0CZF--LQ3Hd1404X2nJw0pkvu4vfOyev93ctiyVbPD4-L2xWzAuuRcVw3KFXtAHkpJIIsUBbGVE1VlVZapzhHUZUVFrYxwliVIQ5KqrWrC74Wc3I17Q4xvO9dGvUu7GOfX2quUCmoVQ2Z4hNlY0gpukYP0b-Z-KkR9LczPTnT2Zn-caYxl8RUShnuWxf_pv9pfQEnKW15</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2717709790</pqid></control><display><type>article</type><title>Closed-loop reasoning with graph-aware dense interaction for visual dialog</title><source>SpringerLink Journals - AutoHoldings</source><creator>Liu, An-An ; Zhang, Guokai ; Xu, Ning ; Guo, Junbo ; Jin, Guoqing ; Li, Xuanya</creator><creatorcontrib>Liu, An-An ; Zhang, Guokai ; Xu, Ning ; Guo, Junbo ; Jin, Guoqing ; Li, Xuanya</creatorcontrib><description>Visual dialog is one attractive vision-language task to predict correct answer according to the given question, dialog history and image. Although researchers have offered diversified solutions to contact text with vision, multi-modal information still get inadequate interaction for semantic alignment. To solve the problem, we propose closed-loop reasoning with graph-aware dense interaction, aiming to discover cues through the dynamic structure of graph and leverage it to benefit dialog and image features. Moreover, we analyze the statistics of the linguistic entities hidden in dialog to prove the reliability of graph construction. 
Experiments are set up on two VisDial datasets, which indicate that our model achieves the competitive results against the previous methods. Ablation study and parameter analysis can further demonstrate the effectiveness of our model.</description><identifier>ISSN: 0942-4962</identifier><identifier>EISSN: 1432-1882</identifier><identifier>DOI: 10.1007/s00530-022-00947-1</identifier><language>eng</language><publisher>Berlin/Heidelberg: Springer Berlin Heidelberg</publisher><subject>Ablation ; Computer Communication Networks ; Computer Graphics ; Computer Science ; Cryptology ; Data Storage Representation ; Multimedia Information Systems ; Operating Systems ; Reasoning ; Regular Paper ; Vision</subject><ispartof>Multimedia systems, 2022, Vol.28 (5), p.1823-1832</ispartof><rights>The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2022</rights><rights>The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2022.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c319t-21bf1479e0126341045145aa8f886c4ce7221386815cfa3ac734120747be952b3</citedby><cites>FETCH-LOGICAL-c319t-21bf1479e0126341045145aa8f886c4ce7221386815cfa3ac734120747be952b3</cites><orcidid>0000-0002-7526-4356</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s00530-022-00947-1$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s00530-022-00947-1$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,27903,27904,41467,42536,51297</link.rule.ids></links><search><creatorcontrib>Liu, An-An</creatorcontrib><creatorcontrib>Zhang, Guokai</creatorcontrib><creatorcontrib>Xu, Ning</creatorcontrib><creatorcontrib>Guo, 
Junbo</creatorcontrib><creatorcontrib>Jin, Guoqing</creatorcontrib><creatorcontrib>Li, Xuanya</creatorcontrib><title>Closed-loop reasoning with graph-aware dense interaction for visual dialog</title><title>Multimedia systems</title><addtitle>Multimedia Systems</addtitle><description>Visual dialog is one attractive vision-language task to predict correct answer according to the given question, dialog history and image. Although researchers have offered diversified solutions to contact text with vision, multi-modal information still get inadequate interaction for semantic alignment. To solve the problem, we propose closed-loop reasoning with graph-aware dense interaction, aiming to discover cues through the dynamic structure of graph and leverage it to benefit dialog and image features. Moreover, we analyze the statistics of the linguistic entities hidden in dialog to prove the reliability of graph construction. Experiments are set up on two VisDial datasets, which indicate that our model achieves the competitive results against the previous methods. 
Ablation study and parameter analysis can further demonstrate the effectiveness of our model.</description><subject>Ablation</subject><subject>Computer Communication Networks</subject><subject>Computer Graphics</subject><subject>Computer Science</subject><subject>Cryptology</subject><subject>Data Storage Representation</subject><subject>Multimedia Information Systems</subject><subject>Operating Systems</subject><subject>Reasoning</subject><subject>Regular Paper</subject><subject>Vision</subject><issn>0942-4962</issn><issn>1432-1882</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp9kMFKxDAURYMoOI7-gKuA6-h7Sdq0SxnUUQbc6DpkMmknQ21q0nHw741WcOfqweXc--AQcolwjQDqJgEUAhhwzgBqqRgekRlKwRlWFT8msxxyJuuSn5KzlHYAqEoBM_K06EJyG9aFMNDoTAq971t68OOWttEMW2YOJjq6cX1y1Peji8aOPvS0CZF--LQ3Hd1404X2nJw0pkvu4vfOyev93ctiyVbPD4-L2xWzAuuRcVw3KFXtAHkpJIIsUBbGVE1VlVZapzhHUZUVFrYxwliVIQ5KqrWrC74Wc3I17Q4xvO9dGvUu7GOfX2quUCmoVQ2Z4hNlY0gpukYP0b-Z-KkR9LczPTnT2Zn-caYxl8RUShnuWxf_pv9pfQEnKW15</recordid><startdate>2022</startdate><enddate>2022</enddate><creator>Liu, An-An</creator><creator>Zhang, Guokai</creator><creator>Xu, Ning</creator><creator>Guo, Junbo</creator><creator>Jin, Guoqing</creator><creator>Li, Xuanya</creator><general>Springer Berlin Heidelberg</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0002-7526-4356</orcidid></search><sort><creationdate>2022</creationdate><title>Closed-loop reasoning with graph-aware dense interaction for visual dialog</title><author>Liu, An-An ; Zhang, Guokai ; Xu, Ning ; Guo, Junbo ; Jin, Guoqing ; Li, 
Xuanya</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c319t-21bf1479e0126341045145aa8f886c4ce7221386815cfa3ac734120747be952b3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Ablation</topic><topic>Computer Communication Networks</topic><topic>Computer Graphics</topic><topic>Computer Science</topic><topic>Cryptology</topic><topic>Data Storage Representation</topic><topic>Multimedia Information Systems</topic><topic>Operating Systems</topic><topic>Reasoning</topic><topic>Regular Paper</topic><topic>Vision</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Liu, An-An</creatorcontrib><creatorcontrib>Zhang, Guokai</creatorcontrib><creatorcontrib>Xu, Ning</creatorcontrib><creatorcontrib>Guo, Junbo</creatorcontrib><creatorcontrib>Jin, Guoqing</creatorcontrib><creatorcontrib>Li, Xuanya</creatorcontrib><collection>CrossRef</collection><jtitle>Multimedia systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Liu, An-An</au><au>Zhang, Guokai</au><au>Xu, Ning</au><au>Guo, Junbo</au><au>Jin, Guoqing</au><au>Li, Xuanya</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Closed-loop reasoning with graph-aware dense interaction for visual dialog</atitle><jtitle>Multimedia systems</jtitle><stitle>Multimedia Systems</stitle><date>2022</date><risdate>2022</risdate><volume>28</volume><issue>5</issue><spage>1823</spage><epage>1832</epage><pages>1823-1832</pages><issn>0942-4962</issn><eissn>1432-1882</eissn><abstract>Visual dialog is one attractive vision-language task to predict correct answer according to the given question, dialog history and image. Although researchers have offered diversified solutions to contact text with vision, multi-modal information still get inadequate interaction for semantic alignment. 
To solve the problem, we propose closed-loop reasoning with graph-aware dense interaction, aiming to discover cues through the dynamic structure of graph and leverage it to benefit dialog and image features. Moreover, we analyze the statistics of the linguistic entities hidden in dialog to prove the reliability of graph construction. Experiments are set up on two VisDial datasets, which indicate that our model achieves the competitive results against the previous methods. Ablation study and parameter analysis can further demonstrate the effectiveness of our model.</abstract><cop>Berlin/Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><doi>10.1007/s00530-022-00947-1</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0002-7526-4356</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0942-4962 |
ispartof | Multimedia systems, 2022, Vol.28 (5), p.1823-1832 |
issn | 0942-4962 (ISSN); 1432-1882 (EISSN) |
language | eng |
recordid | cdi_proquest_journals_2717709790 |
source | SpringerLink Journals - AutoHoldings |
subjects | Ablation; Computer Communication Networks; Computer Graphics; Computer Science; Cryptology; Data Storage Representation; Multimedia Information Systems; Operating Systems; Reasoning; Regular Paper; Vision |
title | Closed-loop reasoning with graph-aware dense interaction for visual dialog |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T12%3A37%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Closed-loop%20reasoning%20with%20graph-aware%20dense%20interaction%20for%20visual%20dialog&rft.jtitle=Multimedia%20systems&rft.au=Liu,%20An-An&rft.date=2022&rft.volume=28&rft.issue=5&rft.spage=1823&rft.epage=1832&rft.pages=1823-1832&rft.issn=0942-4962&rft.eissn=1432-1882&rft_id=info:doi/10.1007/s00530-022-00947-1&rft_dat=%3Cproquest_cross%3E2717709790%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2717709790&rft_id=info:pmid/&rfr_iscdi=true |