Understanding image concepts using ISTOP model
This paper focuses on recognizing image concepts by introducing the ISTOP model. The model parses an image from the scene level down to object parts using a context-sensitive grammar. Since there is a gap between the scene and object levels, the grammar introduces the “Visual Term” level to bridge it. A visual term is a concept level above the object level, representing a few co-occurring objects. The grammar used in the model can be embodied in an And–Or graph representation. The hierarchical structure of the graph decomposes an image from the scene level into the visual term, object and part levels through terminal and non-terminal nodes, while the horizontal links in the graph impose context and constraints between the nodes. To learn the grammar constraints and their weights, we propose an algorithm that can operate on weakly annotated datasets. The algorithm searches the dataset for visual terms without supervision and then learns the weights of the constraints using a latent SVM. Experimental results on the Pascal VOC dataset show that our model outperforms state-of-the-art approaches in recognizing image concepts.
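The abstract above describes a four-level parse hierarchy (scene, visual term, object, part) embodied in an And–Or graph. The following is a minimal illustrative sketch of that structure only, not the authors' implementation; the node classes and the "person riding a horse" visual term are assumptions made for the example, and the horizontal context links between sibling nodes are omitted.

```python
# Illustrative sketch of the ISTOP parse hierarchy as an And-Or graph.
# AndNode: decomposition, all children must be present.
# OrNode: alternatives, exactly one child is selected during parsing.
# All concrete labels below are invented for the example.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


class AndNode(Node):
    """Non-terminal that decomposes into all of its children."""


class OrNode(Node):
    """Non-terminal that selects exactly one of its children."""


# Object level decomposed into part-level terminals
person = AndNode("person", [Node("head"), Node("torso"), Node("legs")])
horse = AndNode("horse", [Node("head"), Node("body"), Node("legs")])

# Visual term level: a few co-occurring objects treated as one concept
riding = AndNode("person-riding-horse", [person, horse])

# Scene level: an Or-node choosing among candidate visual terms
scene = OrNode("scene", [riding])


def depth(node: Node) -> int:
    """Number of levels below and including a node."""
    return 1 + max((depth(c) for c in node.children), default=0)


print(depth(scene))  # 4: scene -> visual term -> object -> part
```

In the model itself the terminal nodes would be grounded in image evidence rather than plain labels; the sketch only shows how the four levels nest.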
Saved in:
Published in: | Pattern recognition 2016-05, Vol.53, p.174-183 |
---|---|
Main authors: | Zarchi, M.S.; Tan, R.T.; van Gemeren, C.; Monadjemi, A.; Veltkamp, R.C. |
Format: | Article |
Language: | English |
Subjects: | Algorithms; And–Or graph; Concept recognition; Context sensitive grammar; Grammars; Graphs; Image parsing; Latent SVM; Object recognition; Pattern recognition; Terminals; Visual; Visual term |
Online access: | Full text |
container_end_page | 183 |
---|---|
container_issue | |
container_start_page | 174 |
container_title | Pattern recognition |
container_volume | 53 |
creator | Zarchi, M.S.; Tan, R.T.; van Gemeren, C.; Monadjemi, A.; Veltkamp, R.C. |
description | This paper focuses on recognizing image concepts by introducing the ISTOP model. The model parses an image from the scene level down to object parts using a context-sensitive grammar. Since there is a gap between the scene and object levels, the grammar introduces the “Visual Term” level to bridge it. A visual term is a concept level above the object level, representing a few co-occurring objects. The grammar used in the model can be embodied in an And–Or graph representation. The hierarchical structure of the graph decomposes an image from the scene level into the visual term, object and part levels through terminal and non-terminal nodes, while the horizontal links in the graph impose context and constraints between the nodes. To learn the grammar constraints and their weights, we propose an algorithm that can operate on weakly annotated datasets. The algorithm searches the dataset for visual terms without supervision and then learns the weights of the constraints using a latent SVM. Experimental results on the Pascal VOC dataset show that our model outperforms state-of-the-art approaches in recognizing image concepts.
•In understanding an image there is a significant gap between the scene level and the object level. •The ISTOP model can parse an image from the scene level to the visual term, object and part levels using a context-sensitive grammar. •The visual term is a new concept that bridges the gap between the scene level and the object level. •The context used in the grammar improves object detection as well as visual term detection. (A schematic sketch of the weight-learning step appears after the record fields below.) |
doi_str_mv | 10.1016/j.patcog.2015.11.010 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 0031-3203 |
ispartof | Pattern recognition, 2016-05, Vol.53, p.174-183 |
issn | 0031-3203 1873-5142 |
language | eng |
recordid | cdi_proquest_miscellaneous_1808069898 |
source | Elsevier ScienceDirect Journals Complete |
subjects | Algorithms; And–Or graph; Concept recognition; Context sensitive grammar; Grammars; Graphs; Image parsing; Latent SVM; Object recognition; Pattern recognition; Terminals; Visual; Visual term |
title | Understanding image concepts using ISTOP model |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-20T21%3A17%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Understanding%20image%20concepts%20using%20ISTOP%20model&rft.jtitle=Pattern%20recognition&rft.au=Zarchi,%20M.S.&rft.date=2016-05-01&rft.volume=53&rft.spage=174&rft.epage=183&rft.pages=174-183&rft.issn=0031-3203&rft.eissn=1873-5142&rft_id=info:doi/10.1016/j.patcog.2015.11.010&rft_dat=%3Cproquest_cross%3E1808069898%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1808069898&rft_id=info:pmid/&rft_els_id=S0031320315004306&rfr_iscdi=true |
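The description field mentions an algorithm that searches weakly annotated data for visual terms without supervision and then learns the constraint weights with a latent SVM. The sketch below is a schematic, heavily simplified alternation in that spirit, not the paper's algorithm: the candidate configurations, the feature encoding, and the plain sub-gradient update are all assumptions made for illustration.

```python
# Schematic latent-variable training loop in the spirit of a latent SVM:
# alternate between (1) imputing the best-scoring latent configuration
# (which candidate object grouping forms the visual term) under the current
# weights, and (2) a hinge-loss sub-gradient step on those weights.
import numpy as np

rng = np.random.default_rng(0)


def best_latent(w, candidates):
    """Pick the candidate configuration that scores highest under w."""
    scores = [float(w @ c) for c in candidates]
    return candidates[int(np.argmax(scores))]


def train(images, labels, dim, epochs=20, lr=0.01, C=1.0):
    """images: per-image lists of candidate feature vectors (one per
    possible visual-term configuration); labels: +1/-1 concept presence."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for candidates, y in zip(images, labels):
            x = best_latent(w, candidates)      # step 1: impute latents
            if y * (w @ x) < 1:                 # step 2: hinge sub-gradient
                w -= lr * (w - C * y * x)
            else:
                w -= lr * w                     # only L2 regularization
    return w


# Toy usage: two images, three candidate configurations of four features each.
images = [list(rng.normal(size=(3, 4))) for _ in range(2)]
labels = [+1, -1]
print(train(images, labels, dim=4))
```

A faithful latent SVM would fix the latent variables of positive examples per outer iteration and maximize over latents inside the loss for negatives, solving a convex problem at each step; the loop above collapses that into a single stochastic update for brevity.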