Object-difference drived graph convolutional networks for visual question answering
Visual Question Answering (VQA), an important task for evaluating the cross-modal understanding capability of an Artificial Intelligence model, has been a hot research topic in both the computer vision and natural language processing communities. Recently, graph-based models have received growing interest in VQA for their potential to model the relationships between objects as well as their strong interpretability. Nonetheless, those solutions mainly define the similarity between objects as their semantic relationship, largely ignoring the critical point that the difference between objects can provide more information for establishing the relationships between nodes in the graph. To exploit this, we propose an object-difference based graph learner, which learns question-adaptive semantic relations by calculating inter-object differences under the guidance of questions. With the learned relationships, the input image can be represented as an object graph encoded with structural dependencies between objects. In addition, existing graph-based models leverage the object boxes pre-extracted by an object detection model as node features for convenience, but they suffer from a redundancy problem. To reduce redundant objects, we introduce a soft-attention mechanism that magnifies the question-related objects. Moreover, we incorporate our object-difference based graph learner into soft-attention based Graph Convolutional Networks to capture question-specific objects and their interactions for answer prediction. Our experimental results on the VQA 2.0 dataset demonstrate that our model gives significantly better performance than baseline methods.
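The abstract outlines three components: a question-guided graph learner built from pairwise object differences, soft attention that down-weights question-irrelevant objects, and graph convolution over the learned graph. The PyTorch snippet below is a minimal sketch of that pipeline, not the authors' implementation; the module layout, feature dimensions, and the elementwise-product fusion with the question are assumptions made for illustration only.

```python
# Minimal sketch (not the paper's code) of an object-difference based,
# question-guided graph over detected objects, followed by soft attention
# and one graph-convolution step. All names and dimensions are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ObjectDifferenceGraphVQA(nn.Module):
    def __init__(self, obj_dim=2048, q_dim=1024, hid_dim=512):
        super().__init__()
        self.proj_obj = nn.Linear(obj_dim, hid_dim)   # project detector features
        self.proj_q = nn.Linear(q_dim, hid_dim)       # project question encoding
        self.edge_scorer = nn.Linear(hid_dim, 1)      # score question-guided differences
        self.node_attn = nn.Linear(hid_dim, 1)        # soft attention over objects
        self.gcn = nn.Linear(hid_dim, hid_dim)        # one graph-convolution layer

    def forward(self, obj_feats, q_feat):
        # obj_feats: (B, N, obj_dim) pre-extracted object features
        # q_feat:    (B, q_dim) question representation
        v = self.proj_obj(obj_feats)                  # (B, N, H)
        q = self.proj_q(q_feat).unsqueeze(1)          # (B, 1, H)

        # Object-difference graph learner: edge (i, j) is scored from the
        # difference v_i - v_j, modulated by the question.
        diff = v.unsqueeze(2) - v.unsqueeze(1)        # (B, N, N, H)
        guided = diff * q.unsqueeze(1)                # question-guided differences
        adj = torch.softmax(self.edge_scorer(guided).squeeze(-1), dim=-1)  # (B, N, N)

        # Soft attention to down-weight question-irrelevant (redundant) objects.
        attn = torch.softmax(self.node_attn(v * q).squeeze(-1), dim=-1)    # (B, N)
        v = v * attn.unsqueeze(-1)

        # One GCN step over the learned graph, then pool for answer prediction.
        v = F.relu(self.gcn(torch.bmm(adj, v)))       # (B, N, H)
        return v.sum(dim=1)                           # pooled graph representation


if __name__ == "__main__":
    # Example with random tensors: batch of 2, 36 detected objects per image.
    model = ObjectDifferenceGraphVQA()
    pooled = model(torch.randn(2, 36, 2048), torch.randn(2, 1024))
    print(pooled.shape)  # torch.Size([2, 512])
```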
Saved in:
Published in: | Multimedia tools and applications 2021-05, Vol.80 (11), p.16247-16265 |
---|---|
Main authors: | Zhu, Xi, Mao, Zhendong, Chen, Zhineng, Li, Yangyang, Wang, Zhaohui, Wang, Bin |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 16265 |
---|---|
container_issue | 11 |
container_start_page | 16247 |
container_title | Multimedia tools and applications |
container_volume | 80 |
creator | Zhu, Xi Mao, Zhendong Chen, Zhineng Li, Yangyang Wang, Zhaohui Wang, Bin |
description | Visual Question Answering (VQA), an important task for evaluating the cross-modal understanding capability of an Artificial Intelligence model, has been a hot research topic in both the computer vision and natural language processing communities. Recently, graph-based models have received growing interest in VQA for their potential to model the relationships between objects as well as their strong interpretability. Nonetheless, those solutions mainly define the similarity between objects as their semantic relationship, largely ignoring the critical point that the difference between objects can provide more information for establishing the relationships between nodes in the graph. To exploit this, we propose an object-difference based graph learner, which learns question-adaptive semantic relations by calculating inter-object differences under the guidance of questions. With the learned relationships, the input image can be represented as an object graph encoded with structural dependencies between objects. In addition, existing graph-based models leverage the object boxes pre-extracted by an object detection model as node features for convenience, but they suffer from a redundancy problem. To reduce redundant objects, we introduce a soft-attention mechanism that magnifies the question-related objects. Moreover, we incorporate our object-difference based graph learner into soft-attention based Graph Convolutional Networks to capture question-specific objects and their interactions for answer prediction. Our experimental results on the VQA 2.0 dataset demonstrate that our model gives significantly better performance than baseline methods. |
doi_str_mv | 10.1007/s11042-020-08790-0 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 1380-7501 |
ispartof | Multimedia tools and applications, 2021-05, Vol.80 (11), p.16247-16265 |
issn | 1380-7501 1573-7721 |
language | eng |
recordid | cdi_proquest_journals_2529006818 |
source | SpringerLink Journals - AutoHoldings |
subjects | Artificial intelligence Artificial neural networks Computer Communication Networks Computer Science Computer vision Critical point Data Structures and Information Theory Datasets Feature extraction Graphical representations Language Multimedia Multimedia Information Systems Natural language processing Object recognition Questions Redundancy Semantics Special Purpose and Application-Based Systems |
title | Object-difference drived graph convolutional networks for visual question answering |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-18T21%3A21%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Object-difference%20drived%20graph%20convolutional%20networks%20for%20visual%20question%20answering&rft.jtitle=Multimedia%20tools%20and%20applications&rft.au=Zhu,%20Xi&rft.date=2021-05-01&rft.volume=80&rft.issue=11&rft.spage=16247&rft.epage=16265&rft.pages=16247-16265&rft.issn=1380-7501&rft.eissn=1573-7721&rft_id=info:doi/10.1007/s11042-020-08790-0&rft_dat=%3Cproquest_cross%3E2529006818%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2529006818&rft_id=info:pmid/&rfr_iscdi=true |