Real-Time Semantic Segmentation via Auto Depth, Downsampling Joint Decision and Feature Aggregation

To satisfy the stringent requirements for computational resources in the field of real-time semantic segmentation, most approaches focus on the hand-crafted design of light-weight segmentation networks. To enjoy the ability of model auto-design, Neural Architecture Search (NAS) has been introduced t...


Bibliographic details
Published in:	International journal of computer vision 2021-05, Vol.129 (5), p.1506-1525
Main authors:	Sun, Peng; Wu, Jiaxiang; Li, Songyuan; Lin, Peiwen; Huang, Junzhou; Li, Xi
Format:	Article
Language:	eng
Subjects:	Agglomeration; Artificial Intelligence; Computer Imaging; Computer Science; Image Processing and Computer Vision; Image segmentation; Pattern Recognition; Pattern Recognition and Graphics; Real time; Searching; Semantic segmentation; Semantics; Vision; Weight reduction
Online access:	Full text

container_end_page	1525
container_issue	5
container_start_page	1506
container_title	International journal of computer vision
container_volume	129
creator	Sun, Peng; Wu, Jiaxiang; Li, Songyuan; Lin, Peiwen; Huang, Junzhou; Li, Xi
description	To satisfy the stringent requirements for computational resources in the field of real-time semantic segmentation, most approaches focus on the hand-crafted design of light-weight segmentation networks. To enjoy the ability of model auto-design, Neural Architecture Search (NAS) has been introduced to search for the optimal building blocks of networks automatically. However, the network depth, downsampling strategy, and feature aggregation method are still set in advance and nonadjustable during searching. Moreover, these key properties are highly correlated and essential for a remarkable real-time segmentation model. In this paper, we propose a joint search framework, called AutoRTNet, to automate all the aforementioned key properties in semantic segmentation. Specifically, we propose hyper-cells to jointly decide the network depth and the downsampling strategy via a novel cell-level pruning process. Furthermore, we propose an aggregation cell to achieve automatic multi-scale feature aggregation. Extensive experimental results on Cityscapes and CamVid datasets demonstrate that the proposed AutoRTNet achieves the new state-of-the-art trade-off between accuracy and speed. Notably, our AutoRTNet achieves 73.9% mIoU on Cityscapes and 110.0 FPS on an NVIDIA TitanXP GPU card with input images at a resolution of 768 × 1536.
doi_str_mv	10.1007/s11263-021-01433-3
format	Article
fullrecord	<record><control><sourceid>gale_proqu</sourceid><recordid>TN_cdi_proquest_journals_2522240645</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A660897780</galeid><sourcerecordid>A660897780</sourcerecordid><originalsourceid>FETCH-LOGICAL-c392t-8c88d8489ea78b13ff994c141366c2a0076f1f551411c95024372a655bad7e8d3</originalsourceid><addsrcrecordid>eNp9kUtPJCEURsnESaZ1_AOzqsSViTg8CopadtRWJyYmPtYEqVs1mC5ogfLx70XLxLgxLCCXc-6FfAj9oeSQEtL8TZQyyTFhFBNac475D7SgouGY1kRsoQVpGcFCtvQX2k7pnhDCFOMLZK_ArPGNG6G6htH47Gw5DCP4bLILvnp0plpOOVTHsMn_D6rj8OSTGTdr54fqX3A-lxvr0htrfFetwOQpQrUchgjDe4_f6Gdv1gl2P_YddLs6uTk6wxeXp-dHywtsecsyVlapTtWqBdOoO8r7vm1rS2vKpbTMlG_KnvZClAq1rSCs5g0zUog70zWgOr6D9ua-mxgeJkhZ34cp-jJSM8EYq4msRaEOZ2owa9DO9yFHY8vqYHQ2eOhdqS-lJKptGkWKsP9FKEyG5zyYKSV9fn31lWUza2NIKUKvN9GNJr5oSvRbUnpOSpek9HtSmheJz1IqsB8gfr77G-sVMCqTzQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2522240645</pqid></control><display><type>article</type><title>Real-Time Semantic Segmentation via Auto Depth, Downsampling Joint Decision and Feature Aggregation</title><source>Springer Nature - Complete Springer Journals</source><creator>Sun, Peng ; Wu, Jiaxiang ; Li, Songyuan ; Lin, Peiwen ; Huang, Junzhou ; Li, Xi</creator><creatorcontrib>Sun, Peng ; Wu, Jiaxiang ; Li, Songyuan ; Lin, Peiwen ; Huang, Junzhou ; Li, Xi</creatorcontrib><description>To satisfy the stringent requirements for computational resources in the field of real-time semantic segmentation, most approaches focus on the hand-crafted design of light-weight segmentation networks. To enjoy the ability of model auto-design, Neural Architecture Search (NAS) has been introduced to search for the optimal building blocks of networks automatically. However, the network depth, downsampling strategy, and feature aggregation method are still set in advance and nonadjustable during searching. 
Moreover, these key properties are highly correlated and essential for a remarkable real-time segmentation model. In this paper, we propose a joint search framework, called AutoRTNet, to automate all the aforementioned key properties in semantic segmentation. Specifically, we propose hyper-cells to jointly decide the network depth and the downsampling strategy via a novel cell-level pruning process. Furthermore, we propose an aggregation cell to achieve automatic multi-scale feature aggregation. Extensive experimental results on Cityscapes and CamVid datasets demonstrate that the proposed AutoRTNet achieves the new state-of-the-art trade-off between accuracy and speed. Notably, our AutoRTNet achieves 73.9% mIoU on Cityscapes and 110.0 FPS on an NVIDIA TitanXP GPU card with input images at a resolution of 768 × 1536 .</description><identifier>ISSN: 0920-5691</identifier><identifier>EISSN: 1573-1405</identifier><identifier>DOI: 10.1007/s11263-021-01433-3</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Agglomeration ; Artificial Intelligence ; Computer Imaging ; Computer Science ; Image Processing and Computer Vision ; Image segmentation ; Pattern Recognition ; Pattern Recognition and Graphics ; Real time ; Searching ; Semantic segmentation ; Semantics ; Vision ; Weight reduction</subject><ispartof>International journal of computer vision, 2021-05, Vol.129 (5), p.1506-1525</ispartof><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC part of Springer Nature 2021</rights><rights>COPYRIGHT 2021 Springer</rights><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC part of Springer Nature 
2021.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c392t-8c88d8489ea78b13ff994c141366c2a0076f1f551411c95024372a655bad7e8d3</citedby><cites>FETCH-LOGICAL-c392t-8c88d8489ea78b13ff994c141366c2a0076f1f551411c95024372a655bad7e8d3</cites><orcidid>0000-0003-3023-1662</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11263-021-01433-3$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s11263-021-01433-3$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,27901,27902,41464,42533,51294</link.rule.ids></links><search><creatorcontrib>Sun, Peng</creatorcontrib><creatorcontrib>Wu, Jiaxiang</creatorcontrib><creatorcontrib>Li, Songyuan</creatorcontrib><creatorcontrib>Lin, Peiwen</creatorcontrib><creatorcontrib>Huang, Junzhou</creatorcontrib><creatorcontrib>Li, Xi</creatorcontrib><title>Real-Time Semantic Segmentation via Auto Depth, Downsampling Joint Decision and Feature Aggregation</title><title>International journal of computer vision</title><addtitle>Int J Comput Vis</addtitle><description>To satisfy the stringent requirements for computational resources in the field of real-time semantic segmentation, most approaches focus on the hand-crafted design of light-weight segmentation networks. To enjoy the ability of model auto-design, Neural Architecture Search (NAS) has been introduced to search for the optimal building blocks of networks automatically. However, the network depth, downsampling strategy, and feature aggregation method are still set in advance and nonadjustable during searching. Moreover, these key properties are highly correlated and essential for a remarkable real-time segmentation model. 
In this paper, we propose a joint search framework, called AutoRTNet, to automate all the aforementioned key properties in semantic segmentation. Specifically, we propose hyper-cells to jointly decide the network depth and the downsampling strategy via a novel cell-level pruning process. Furthermore, we propose an aggregation cell to achieve automatic multi-scale feature aggregation. Extensive experimental results on Cityscapes and CamVid datasets demonstrate that the proposed AutoRTNet achieves the new state-of-the-art trade-off between accuracy and speed. Notably, our AutoRTNet achieves 73.9% mIoU on Cityscapes and 110.0 FPS on an NVIDIA TitanXP GPU card with input images at a resolution of 768 × 1536 .</description><subject>Agglomeration</subject><subject>Artificial Intelligence</subject><subject>Computer Imaging</subject><subject>Computer Science</subject><subject>Image Processing and Computer Vision</subject><subject>Image segmentation</subject><subject>Pattern Recognition</subject><subject>Pattern Recognition and Graphics</subject><subject>Real time</subject><subject>Searching</subject><subject>Semantic segmentation</subject><subject>Semantics</subject><subject>Vision</subject><subject>Weight reduction</subject><issn>0920-5691</issn><issn>1573-1405</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNp9kUtPJCEURsnESaZ1_AOzqsSViTg8CopadtRWJyYmPtYEqVs1mC5ogfLx70XLxLgxLCCXc-6FfAj9oeSQEtL8TZQyyTFhFBNac475D7SgouGY1kRsoQVpGcFCtvQX2k7pnhDCFOMLZK_ArPGNG6G6htH47Gw5DCP4bLILvnp0plpOOVTHsMn_D6rj8OSTGTdr54fqX3A-lxvr0htrfFetwOQpQrUchgjDe4_f6Gdv1gl2P_YddLs6uTk6wxeXp-dHywtsecsyVlapTtWqBdOoO8r7vm1rS2vKpbTMlG_KnvZClAq1rSCs5g0zUog70zWgOr6D9ua-mxgeJkhZ34cp-jJSM8EYq4msRaEOZ2owa9DO9yFHY8vqYHQ2eOhdqS-lJKptGkWKsP9FKEyG5zyYKSV9fn31lWUza2NIKUKvN9GNJr5oSvRbUnpOSpek9HtSmheJz1IqsB8gfr77G-sVMCqTzQ</recordid><startdate>20210501</startdate><enddate>20210501</enddate><creator>Sun, 
Peng</creator><creator>Wu, Jiaxiang</creator><creator>Li, Songyuan</creator><creator>Lin, Peiwen</creator><creator>Huang, Junzhou</creator><creator>Li, Xi</creator><general>Springer US</general><general>Springer</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>ISR</scope><scope>3V.</scope><scope>7SC</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8FL</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FRNLG</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K60</scope><scope>K6~</scope><scope>K7-</scope><scope>L.-</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0N</scope><scope>P5Z</scope><scope>P62</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PYYUZ</scope><scope>Q9U</scope><orcidid>https://orcid.org/0000-0003-3023-1662</orcidid></search><sort><creationdate>20210501</creationdate><title>Real-Time Semantic Segmentation via Auto Depth, Downsampling Joint Decision and Feature Aggregation</title><author>Sun, Peng ; Wu, Jiaxiang ; Li, Songyuan ; Lin, Peiwen ; Huang, Junzhou ; Li, Xi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c392t-8c88d8489ea78b13ff994c141366c2a0076f1f551411c95024372a655bad7e8d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Agglomeration</topic><topic>Artificial Intelligence</topic><topic>Computer Imaging</topic><topic>Computer Science</topic><topic>Image Processing and Computer Vision</topic><topic>Image segmentation</topic><topic>Pattern 
Recognition</topic><topic>Pattern Recognition and Graphics</topic><topic>Real time</topic><topic>Searching</topic><topic>Semantic segmentation</topic><topic>Semantics</topic><topic>Vision</topic><topic>Weight reduction</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Sun, Peng</creatorcontrib><creatorcontrib>Wu, Jiaxiang</creatorcontrib><creatorcontrib>Li, Songyuan</creatorcontrib><creatorcontrib>Lin, Peiwen</creatorcontrib><creatorcontrib>Huang, Junzhou</creatorcontrib><creatorcontrib>Li, Xi</creatorcontrib><collection>CrossRef</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ABI/INFORM Collection</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Global (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Business Premium Collection</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central 
Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>Computer Science Database</collection><collection>ABI/INFORM Professional Advanced</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM Global</collection><collection>Computing Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>One Business (ProQuest)</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ABI/INFORM Collection China</collection><collection>ProQuest Central Basic</collection><jtitle>International journal of computer vision</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Sun, Peng</au><au>Wu, Jiaxiang</au><au>Li, Songyuan</au><au>Lin, Peiwen</au><au>Huang, Junzhou</au><au>Li, Xi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Real-Time Semantic Segmentation via Auto Depth, Downsampling Joint Decision and Feature Aggregation</atitle><jtitle>International journal of computer vision</jtitle><stitle>Int J Comput 
Vis</stitle><date>2021-05-01</date><risdate>2021</risdate><volume>129</volume><issue>5</issue><spage>1506</spage><epage>1525</epage><pages>1506-1525</pages><issn>0920-5691</issn><eissn>1573-1405</eissn><abstract>To satisfy the stringent requirements for computational resources in the field of real-time semantic segmentation, most approaches focus on the hand-crafted design of light-weight segmentation networks. To enjoy the ability of model auto-design, Neural Architecture Search (NAS) has been introduced to search for the optimal building blocks of networks automatically. However, the network depth, downsampling strategy, and feature aggregation method are still set in advance and nonadjustable during searching. Moreover, these key properties are highly correlated and essential for a remarkable real-time segmentation model. In this paper, we propose a joint search framework, called AutoRTNet, to automate all the aforementioned key properties in semantic segmentation. Specifically, we propose hyper-cells to jointly decide the network depth and the downsampling strategy via a novel cell-level pruning process. Furthermore, we propose an aggregation cell to achieve automatic multi-scale feature aggregation. Extensive experimental results on Cityscapes and CamVid datasets demonstrate that the proposed AutoRTNet achieves the new state-of-the-art trade-off between accuracy and speed. Notably, our AutoRTNet achieves 73.9% mIoU on Cityscapes and 110.0 FPS on an NVIDIA TitanXP GPU card with input images at a resolution of 768 × 1536 .</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s11263-021-01433-3</doi><tpages>20</tpages><orcidid>https://orcid.org/0000-0003-3023-1662</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 0920-5691
ispartof	International journal of computer vision, 2021-05, Vol.129 (5), p.1506-1525
issn	0920-5691; 1573-1405
language	eng
recordid	cdi_proquest_journals_2522240645
source	Springer Nature - Complete Springer Journals
subjects	Agglomeration; Artificial Intelligence; Computer Imaging; Computer Science; Image Processing and Computer Vision; Image segmentation; Pattern Recognition; Pattern Recognition and Graphics; Real time; Searching; Semantic segmentation; Semantics; Vision; Weight reduction
title	Real-Time Semantic Segmentation via Auto Depth, Downsampling Joint Decision and Feature Aggregation
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-12T10%3A49%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Real-Time%20Semantic%20Segmentation%20via%20Auto%20Depth,%20Downsampling%20Joint%20Decision%20and%20Feature%20Aggregation&rft.jtitle=International%20journal%20of%20computer%20vision&rft.au=Sun,%20Peng&rft.date=2021-05-01&rft.volume=129&rft.issue=5&rft.spage=1506&rft.epage=1525&rft.pages=1506-1525&rft.issn=0920-5691&rft.eissn=1573-1405&rft_id=info:doi/10.1007/s11263-021-01433-3&rft_dat=%3Cgale_proqu%3EA660897780%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2522240645&rft_id=info:pmid/&rft_galeid=A660897780&rfr_iscdi=true