Learning Hierarchical Features for Scene Labeling

Scene labeling consists of labeling each pixel in an image with the category of the object it belongs to. We propose a method that uses a multiscale convolutional network trained from raw pixels to extract dense feature vectors that encode regions of multiple sizes centered on each pixel. The method alleviates the need for engineered features, and produces a powerful representation that captures texture, shape, and contextual information. We report results using multiple postprocessing methods to produce the final labeling. Among those, we propose a technique to automatically retrieve, from a pool of segmentation components, an optimal set of components that best explain the scene; these components are arbitrary, for example, they can be taken from a segmentation tree or from any family of oversegmentations. The system yields record accuracies on the SIFT Flow dataset (33 classes) and the Barcelona dataset (170 classes) and near-record accuracy on the Stanford Background dataset (eight classes), while being an order of magnitude faster than competing approaches, producing a 320×240 image labeling in less than a second, including feature extraction.
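
The core idea in the abstract is a convolutional network applied to a multiscale pyramid of the input, so that every pixel receives a dense feature vector summarizing context windows of several sizes. The following is a minimal sketch of that idea in PyTorch; it is not the authors' implementation (the paper uses a Laplacian pyramid and its own layer dimensions), and the three scales, filter sizes, and channel counts here are illustrative assumptions.

```python
# Minimal sketch of multiscale convolutional feature extraction,
# assuming PyTorch; not the authors' original Torch implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiscaleFeatures(nn.Module):
    def __init__(self, in_channels=3, out_channels=64, scales=(1.0, 0.5, 0.25)):
        super().__init__()
        self.scales = scales
        # A single convolutional stack whose weights are shared across scales.
        self.stack = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=7, padding=3), nn.Tanh(),
            nn.Conv2d(16, 64, kernel_size=7, padding=3), nn.Tanh(),
            nn.Conv2d(64, out_channels, kernel_size=7, padding=3),
        )

    def forward(self, x):
        _, _, h, w = x.shape
        feats = []
        for s in self.scales:
            # Downsample, extract features, then upsample back to full
            # resolution: a pixel's features at scale s summarize a context
            # window whose effective size grows as 1/s.
            xs = x if s == 1.0 else F.interpolate(
                x, scale_factor=s, mode='bilinear', align_corners=False)
            fs = self.stack(xs)
            fs = F.interpolate(fs, size=(h, w), mode='bilinear',
                               align_corners=False)
            feats.append(fs)
        # One dense feature vector per pixel, concatenated across scales.
        return torch.cat(feats, dim=1)

# Usage: dense per-pixel features for a 320x240 RGB image.
model = MultiscaleFeatures()
features = model(torch.randn(1, 3, 240, 320))  # shape (1, 192, 240, 320)
```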

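The abstract also describes retrieving, from a pool of segmentation components, an optimal set that best explains the scene. The sketch below conveys only the flavor of that selection step with a simple greedy, confidence-ordered cover; the function name and scoring heuristic are assumptions, and the paper's actual formulation (an optimal cover over, e.g., a segmentation tree) is more principled.

```python
# Hedged sketch: greedily choose segmentation components (boolean masks)
# in order of classifier confidence until the image is covered. This is an
# illustrative stand-in for the paper's optimal-cover selection.
import numpy as np

def greedy_component_cover(components, scores, shape):
    """components: list of boolean masks; scores: per-component confidence."""
    covered = np.zeros(shape, dtype=bool)
    chosen = []
    for idx in np.argsort(scores)[::-1]:  # most confident first
        if np.any(components[idx] & ~covered):  # adds uncovered pixels?
            chosen.append(int(idx))
            covered |= components[idx]
        if covered.all():
            break
    return chosen

# Toy usage: two components that together cover a 4x4 image.
sky = np.zeros((4, 4), dtype=bool); sky[:2] = True
road = np.zeros((4, 4), dtype=bool); road[2:] = True
print(greedy_component_cover([sky, road], np.array([0.9, 0.8]), (4, 4)))  # [0, 1]
```
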
Bibliographic Details

Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013-08, Vol. 35 (8), pp. 1915-1929
Authors: Farabet, C.; Couprie, C.; Najman, L.; LeCun, Y.
Format: Article
Language: English
DOI: 10.1109/TPAMI.2012.231
ISSN: 0162-8828
EISSN: 1939-3539, 2160-9292
PMID: 23787344
Subjects: Accuracy; Computer Science; Computer Vision and Pattern Recognition; Context; Convolutional networks; deep learning; Feature extraction; image classification; Image edge detection; Image segmentation; Labeling; scene parsing; Vectors
Online access: https://ieeexplore.ieee.org/document/6338939