Real-time monocular depth estimation with adaptive receptive fields

Monocular depth estimation is a popular research topic in the field of autonomous driving. Many current models lead in accuracy but perform poorly in real-time scenarios. To improve depth estimation efficiency, we propose a novel model that combines a multi-scale pyramid architecture for depth estimation with adaptive receptive fields. The pyramid architecture reduces the number of trainable parameters from tens of millions to fewer than 10 million. Adaptive receptive fields are more sensitive to objects at different depths and distances in an image, leading to better accuracy. We adopt stacked convolution kernels instead of raw kernels to compress the model. As a result, the proposed model performs well in terms of both runtime and estimation accuracy. We provide a set of experiments in which our model outperforms previously known models on the Eigen split. Furthermore, we show that our model achieves better runtime performance for depth estimation than all compared models except Pyd-Net. Finally, our model is a lightweight depth estimation model with state-of-the-art accuracy.
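The compression idea mentioned in the abstract, stacking small convolution kernels in place of a single large ("raw") kernel, can be illustrated with a short sketch. This is not the authors' code; the channel width, kernel sizes, and input resolution below are illustrative assumptions only.

# Illustrative sketch (not from the paper): two stacked 3x3 convolutions cover the
# same 5x5 receptive field as one "raw" 5x5 convolution but use fewer parameters.
import torch
import torch.nn as nn

channels = 64  # hypothetical channel width

raw = nn.Conv2d(channels, channels, kernel_size=5, padding=2)  # single 5x5 kernel
stacked = nn.Sequential(                                       # two stacked 3x3 kernels
    nn.Conv2d(channels, channels, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(channels, channels, kernel_size=3, padding=1),
)

def count_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters() if p.requires_grad)

x = torch.randn(1, channels, 128, 416)            # KITTI-like input size (assumed)
assert raw(x).shape == stacked(x).shape           # identical output resolution
print(count_params(raw), count_params(stacked))   # 102464 vs. 73856 parameters

Run as-is, the assertion passes and the stacked variant needs roughly 28% fewer parameters for this layer; the paper applies the same kind of kernel stacking across its pyramid architecture to stay below 10 million trainable parameters.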

Bibliographic details
Published in: Journal of real-time image processing, 2021-08, Vol. 18 (4), p. 1369-1381
Main authors: Ji, Zhenyan; Song, Xiaojun; Guo, Xiaoxuan; Wang, Fangshi; Armendáriz-Iñigo, José Enrique
Format: Article
Language: English
Online access: Full text
DOI: 10.1007/s11554-020-01036-0
ISSN: 1861-8200
EISSN: 1861-8219
Subjects:
Accuracy
Algorithms
Artificial intelligence
Cameras
Computer Graphics
Computer Science
Deep learning
Dictionaries
Image Processing and Computer Vision
Machine learning
Model accuracy
Multimedia Information Systems
Optimization
Pattern Recognition
Real time
Signal, Image and Speech Processing
Special Issue Paper