Deep monocular depth estimation leveraging a large-scale outdoor stereo dataset

• A novel framework for monocular depth estimation via a student–teacher strategy.
• Introducing a data ensemble and stereo confidence-guided regression loss.
• Constructing a new large-scale outdoor stereo dataset named the DIML/CVL dataset.
• Demonstrating the feature representation of...

Detailed description

Bibliographic details
Published in:	Expert systems with applications 2021-09, Vol.178, p.114877, Article 114877
Main authors:	Cho, Jaehoon; Min, Dongbo; Kim, Youngjung; Sohn, Kwanghoon
Format:	Article
Language:	eng
Subjects:	Convolutional neural network; Datasets; Monocular depth estimation; Occlusion; Outdoor stereo dataset; Stereo confidence maps; Student–teacher strategy; Teachers; Training
Online access:	Full text

container_end_page
container_issue
container_start_page	114877
container_title	Expert systems with applications
container_volume	178
creator	Cho, Jaehoon; Min, Dongbo; Kim, Youngjung; Sohn, Kwanghoon
description	• A novel framework for monocular depth estimation via a student–teacher strategy. • Introducing a data ensemble and stereo confidence-guided regression loss. • Constructing a new large-scale outdoor stereo dataset named the DIML/CVL dataset. • Demonstrating the feature representation of our trained model for high-level tasks. Current self-supervised methods for monocular depth estimation are largely based on deeply nested convolutional networks that leverage stereo image pairs or monocular sequences during the training phase. However, they often exhibit inaccurate results around occluded regions and depth boundaries. In this paper, we present a simple yet effective approach for monocular depth estimation using stereo image pairs. We propose a student–teacher strategy in which a shallow student network is trained with the auxiliary information obtained from a deeper and more accurate teacher network. Specifically, we first train the stereo teacher network by fully utilizing the binocular perception of 3-D geometry, and then use the depth predictions of the teacher network to train the student network for monocular depth inference. This enables us to exploit all available depth data from massive unlabeled stereo pairs. We also propose a data ensemble strategy that merges multiple depth predictions of the teacher network, improving the training samples by collecting non-trivial knowledge beyond a single prediction. To refine the inaccurate depth estimates used when training the student network, we further propose a stereo confidence-guided regression loss that handles unreliable pseudo-depth values in occluded regions, textureless regions, and repetitive patterns. To complement the existing dataset comprising outdoor driving scenes, we built a novel large-scale dataset consisting of one million outdoor stereo images taken using hand-held stereo cameras. Finally, we demonstrate that the monocular depth estimation network provides feature representations that are suitable for high-level vision tasks. The experimental results for various outdoor scenarios demonstrate the effectiveness and flexibility of our approach, which outperforms state-of-the-art approaches.
doi_str_mv	10.1016/j.eswa.2021.114877
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2551249844</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0957417421003183</els_id><sourcerecordid>2551249844</sourcerecordid><originalsourceid>FETCH-LOGICAL-c328t-fdb37615ae5489183afc3a766f59a3a8d992ea12d7a821dd92b12eb27ffff53d3</originalsourceid><addsrcrecordid>eNp9kEtPwzAQhC0EEuXxBzhZ4pzgRxLbEhdUnlKlXuBsufamJErjYDtF_HtclTN72cvM7syH0A0lJSW0uetLiN-mZITRktJKCnGCFlQKXjRC8VO0IKoWRUVFdY4uYuwJoYIQsUDrR4AJ7_zo7TyYgB1M6RNDTN3OpM6PeIA9BLPtxi02OCu2UERrBsB-Ts77gGOCAB47k0yEdIXOWjNEuP7bl-jj-el9-Vqs1i9vy4dVYTmTqWjdhouG1gbqSioquWktN6Jp2loZbqRTioGhzAkjGXVOsQ1lsGGizVNzxy_R7fHuFPzXnPPq3s9hzC81q2vKKiWrKqvYUWWDjzFAq6eQi4UfTYk-gNO9PoDTB3D6CC6b7o8myPn3HQQdbQejBdcFsEk73_1n_wVRQXe6</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2551249844</pqid></control><display><type>article</type><title>Deep monocular depth estimation leveraging a large-scale outdoor stereo dataset</title><source>Elsevier ScienceDirect Journals Complete</source><creator>Cho, Jaehoon ; Min, Dongbo ; Kim, Youngjung ; Sohn, Kwanghoon</creator><creatorcontrib>Cho, Jaehoon ; Min, Dongbo ; Kim, Youngjung ; Sohn, Kwanghoon</creatorcontrib><description>[Display omitted] •A novel framework for monocular depth estimation via a student–teacher strategy.•Introducing a data ensemble and stereo confidence-guided regression loss.•Constructing a new large-scale outdoor stereo dataset named the DIML/CVL dataset.•Demonstrating the feature representation of our trained-model for high-level tasks. Current self-supervised methods for monocular depth estimation are largely based on deeply nested convolutional networks that leverage stereo image pairs or monocular sequences during the training phase. However, they often exhibit inaccurate results around occluded regions and depth boundaries. 
In this paper, we present a simple yet effective approach for monocular depth estimation using stereo image pairs. The study aims to propose a student–teacher strategy in which a shallow student network is trained with the auxiliary information obtained from a deeper and more accurate teacher network. Specifically, we first train the stereo teacher network by fully utilizing the binocular perception of 3-D geometry, and then use the depth predictions of the teacher network to train the student network for monocular depth inference. This enables us to exploit all available depth data from massive unlabeled stereo pairs. We propose a strategy that involves the use of a data ensemble to merge the multiple depth predictions of the teacher network to improve the training samples by collecting non-trivial knowledge beyond a single prediction. To refine the inaccurate depth estimation that is used when training the student network, we further propose stereo confidence guided regression loss that handles the unreliable pseudo depth values in occlusion, texture-less region, and repetitive pattern. To complement the existing dataset comprising outdoor driving scenes, we built a novel large-scale dataset consisting of one million outdoor stereo images taken using hand-held stereo cameras. Finally, we demonstrate that the monocular depth estimation network provides feature representations that are suitable for high-level vision tasks. 
The experimental results for various outdoor scenarios demonstrate the effectiveness and flexibility of our approach, which outperforms state-of-the-art approaches.</description><identifier>ISSN: 0957-4174</identifier><identifier>EISSN: 1873-6793</identifier><identifier>DOI: 10.1016/j.eswa.2021.114877</identifier><language>eng</language><publisher>New York: Elsevier Ltd</publisher><subject>Convolutional neural network ; Datasets ; Monocular depth estimation ; Occlusion ; Outdoor stereo dataset ; Stereo confidence maps ; Student–teacher strategy ; Teachers ; Training</subject><ispartof>Expert systems with applications, 2021-09, Vol.178, p.114877, Article 114877</ispartof><rights>2021 Elsevier Ltd</rights><rights>Copyright Elsevier BV Sep 15, 2021</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c328t-fdb37615ae5489183afc3a766f59a3a8d992ea12d7a821dd92b12eb27ffff53d3</citedby><cites>FETCH-LOGICAL-c328t-fdb37615ae5489183afc3a766f59a3a8d992ea12d7a821dd92b12eb27ffff53d3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.eswa.2021.114877$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,780,784,3541,27915,27916,45986</link.rule.ids></links><search><creatorcontrib>Cho, Jaehoon</creatorcontrib><creatorcontrib>Min, Dongbo</creatorcontrib><creatorcontrib>Kim, Youngjung</creatorcontrib><creatorcontrib>Sohn, Kwanghoon</creatorcontrib><title>Deep monocular depth estimation leveraging a large-scale outdoor stereo dataset</title><title>Expert systems with applications</title><description>[Display omitted] •A novel framework for monocular depth estimation via a student–teacher strategy.•Introducing a data ensemble and stereo confidence-guided regression loss.•Constructing a new large-scale outdoor stereo dataset named the DIML/CVL 
dataset.•Demonstrating the feature representation of our trained-model for high-level tasks. Current self-supervised methods for monocular depth estimation are largely based on deeply nested convolutional networks that leverage stereo image pairs or monocular sequences during the training phase. However, they often exhibit inaccurate results around occluded regions and depth boundaries. In this paper, we present a simple yet effective approach for monocular depth estimation using stereo image pairs. The study aims to propose a student–teacher strategy in which a shallow student network is trained with the auxiliary information obtained from a deeper and more accurate teacher network. Specifically, we first train the stereo teacher network by fully utilizing the binocular perception of 3-D geometry, and then use the depth predictions of the teacher network to train the student network for monocular depth inference. This enables us to exploit all available depth data from massive unlabeled stereo pairs. We propose a strategy that involves the use of a data ensemble to merge the multiple depth predictions of the teacher network to improve the training samples by collecting non-trivial knowledge beyond a single prediction. To refine the inaccurate depth estimation that is used when training the student network, we further propose stereo confidence guided regression loss that handles the unreliable pseudo depth values in occlusion, texture-less region, and repetitive pattern. To complement the existing dataset comprising outdoor driving scenes, we built a novel large-scale dataset consisting of one million outdoor stereo images taken using hand-held stereo cameras. Finally, we demonstrate that the monocular depth estimation network provides feature representations that are suitable for high-level vision tasks. 
The experimental results for various outdoor scenarios demonstrate the effectiveness and flexibility of our approach, which outperforms state-of-the-art approaches.</description><subject>Convolutional neural network</subject><subject>Datasets</subject><subject>Monocular depth estimation</subject><subject>Occlusion</subject><subject>Outdoor stereo dataset</subject><subject>Stereo confidence maps</subject><subject>Student–teacher strategy</subject><subject>Teachers</subject><subject>Training</subject><issn>0957-4174</issn><issn>1873-6793</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNp9kEtPwzAQhC0EEuXxBzhZ4pzgRxLbEhdUnlKlXuBsufamJErjYDtF_HtclTN72cvM7syH0A0lJSW0uetLiN-mZITRktJKCnGCFlQKXjRC8VO0IKoWRUVFdY4uYuwJoYIQsUDrR4AJ7_zo7TyYgB1M6RNDTN3OpM6PeIA9BLPtxi02OCu2UERrBsB-Ts77gGOCAB47k0yEdIXOWjNEuP7bl-jj-el9-Vqs1i9vy4dVYTmTqWjdhouG1gbqSioquWktN6Jp2loZbqRTioGhzAkjGXVOsQ1lsGGizVNzxy_R7fHuFPzXnPPq3s9hzC81q2vKKiWrKqvYUWWDjzFAq6eQi4UfTYk-gNO9PoDTB3D6CC6b7o8myPn3HQQdbQejBdcFsEk73_1n_wVRQXe6</recordid><startdate>20210915</startdate><enddate>20210915</enddate><creator>Cho, Jaehoon</creator><creator>Min, Dongbo</creator><creator>Kim, Youngjung</creator><creator>Sohn, Kwanghoon</creator><general>Elsevier Ltd</general><general>Elsevier BV</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20210915</creationdate><title>Deep monocular depth estimation leveraging a large-scale outdoor stereo dataset</title><author>Cho, Jaehoon ; Min, Dongbo ; Kim, Youngjung ; Sohn, 
Kwanghoon</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c328t-fdb37615ae5489183afc3a766f59a3a8d992ea12d7a821dd92b12eb27ffff53d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Convolutional neural network</topic><topic>Datasets</topic><topic>Monocular depth estimation</topic><topic>Occlusion</topic><topic>Outdoor stereo dataset</topic><topic>Stereo confidence maps</topic><topic>Student–teacher strategy</topic><topic>Teachers</topic><topic>Training</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Cho, Jaehoon</creatorcontrib><creatorcontrib>Min, Dongbo</creatorcontrib><creatorcontrib>Kim, Youngjung</creatorcontrib><creatorcontrib>Sohn, Kwanghoon</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Expert systems with applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Cho, Jaehoon</au><au>Min, Dongbo</au><au>Kim, Youngjung</au><au>Sohn, Kwanghoon</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Deep monocular depth estimation leveraging a large-scale outdoor stereo dataset</atitle><jtitle>Expert systems with applications</jtitle><date>2021-09-15</date><risdate>2021</risdate><volume>178</volume><spage>114877</spage><pages>114877-</pages><artnum>114877</artnum><issn>0957-4174</issn><eissn>1873-6793</eissn><abstract>[Display omitted] •A novel framework for monocular depth estimation via 
a student–teacher strategy.•Introducing a data ensemble and stereo confidence-guided regression loss.•Constructing a new large-scale outdoor stereo dataset named the DIML/CVL dataset.•Demonstrating the feature representation of our trained-model for high-level tasks. Current self-supervised methods for monocular depth estimation are largely based on deeply nested convolutional networks that leverage stereo image pairs or monocular sequences during the training phase. However, they often exhibit inaccurate results around occluded regions and depth boundaries. In this paper, we present a simple yet effective approach for monocular depth estimation using stereo image pairs. The study aims to propose a student–teacher strategy in which a shallow student network is trained with the auxiliary information obtained from a deeper and more accurate teacher network. Specifically, we first train the stereo teacher network by fully utilizing the binocular perception of 3-D geometry, and then use the depth predictions of the teacher network to train the student network for monocular depth inference. This enables us to exploit all available depth data from massive unlabeled stereo pairs. We propose a strategy that involves the use of a data ensemble to merge the multiple depth predictions of the teacher network to improve the training samples by collecting non-trivial knowledge beyond a single prediction. To refine the inaccurate depth estimation that is used when training the student network, we further propose stereo confidence guided regression loss that handles the unreliable pseudo depth values in occlusion, texture-less region, and repetitive pattern. To complement the existing dataset comprising outdoor driving scenes, we built a novel large-scale dataset consisting of one million outdoor stereo images taken using hand-held stereo cameras. 
Finally, we demonstrate that the monocular depth estimation network provides feature representations that are suitable for high-level vision tasks. The experimental results for various outdoor scenarios demonstrate the effectiveness and flexibility of our approach, which outperforms state-of-the-art approaches.</abstract><cop>New York</cop><pub>Elsevier Ltd</pub><doi>10.1016/j.eswa.2021.114877</doi></addata></record>
fulltext	fulltext
identifier	ISSN: 0957-4174
ispartof	Expert systems with applications, 2021-09, Vol.178, p.114877, Article 114877
issn	0957-4174 1873-6793
language	eng
recordid	cdi_proquest_journals_2551249844
source	Elsevier ScienceDirect Journals Complete
subjects	Convolutional neural network; Datasets; Monocular depth estimation; Occlusion; Outdoor stereo dataset; Stereo confidence maps; Student–teacher strategy; Teachers; Training
title	Deep monocular depth estimation leveraging a large-scale outdoor stereo dataset
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T22%3A29%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Deep%20monocular%20depth%20estimation%20leveraging%20a%20large-scale%20outdoor%20stereo%20dataset&rft.jtitle=Expert%20systems%20with%20applications&rft.au=Cho,%20Jaehoon&rft.date=2021-09-15&rft.volume=178&rft.spage=114877&rft.pages=114877-&rft.artnum=114877&rft.issn=0957-4174&rft.eissn=1873-6793&rft_id=info:doi/10.1016/j.eswa.2021.114877&rft_dat=%3Cproquest_cross%3E2551249844%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2551249844&rft_id=info:pmid/&rft_els_id=S0957417421003183&rfr_iscdi=true