Adversarial Learning for Joint Optimization of Depth and Ego-Motion

In recent years, supervised deep learning methods have shown great promise in dense depth estimation. However, massive high-quality training data are expensive and impractical to acquire. Alternatively, self-supervised depth estimators can learn the latent transformation from monocular or binocular video sequences by minimizing the photometric warp error between consecutive frames, but they suffer from the scale ambiguity problem or have difficulty estimating precise pose changes between frames. In this paper, we propose a joint self-supervised deep learning pipeline for depth and ego-motion estimation that exploits the advantages of adversarial learning and joint optimization with spatial-temporal geometric constraints. The stereo reconstruction error provides the spatial geometric constraint needed to estimate depth at absolute scale. Meanwhile, the absolute-scale depth map and a pre-trained pose network serve as a good starting point for direct visual odometry (DVO). DVO optimization based on spatial geometric constraints yields fine-grained ego-motion estimates and provides additional backpropagation signals to the depth estimation network. Finally, the spatially and temporally reconstructed views are concatenated, and an iterative coupling optimization process is combined with adversarial learning for accurate depth and precise ego-motion estimation. Experimental results show superior performance compared with state-of-the-art methods for monocular depth and ego-motion estimation on the KITTI dataset, as well as strong generalization ability of the proposed approach.
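The core self-supervision signal the abstract refers to is the photometric warp error: a predicted depth map and relative camera pose are used to synthesize one view from another, and the pixel-wise reconstruction error trains both the depth and the pose network. The following is a minimal PyTorch sketch of that view-synthesis loss under standard pinhole-camera assumptions; it is not the authors' implementation, and the helper names (`pixel_grid`, `inverse_warp`, `photometric_loss`) are illustrative only.

```python
# Minimal sketch of a photometric warp loss (not the paper's code).
import torch
import torch.nn.functional as F

def pixel_grid(b, h, w, device):
    """Homogeneous pixel coordinates, shape (B, 3, H*W)."""
    ys, xs = torch.meshgrid(
        torch.arange(h, device=device, dtype=torch.float32),
        torch.arange(w, device=device, dtype=torch.float32),
        indexing="ij",
    )
    grid = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).view(3, -1)
    return grid.unsqueeze(0).expand(b, -1, -1)

def inverse_warp(src_img, depth, T_tgt_to_src, K):
    """Synthesize the target view by sampling the source image.

    src_img:      (B, 3, H, W) source frame
    depth:        (B, 1, H, W) predicted target-view depth
    T_tgt_to_src: (B, 4, 4)    predicted relative pose
    K:            (B, 3, 3)    camera intrinsics
    """
    b, _, h, w = src_img.shape
    # Back-project target pixels to 3-D camera points: X = D * K^-1 * p
    cam = torch.inverse(K) @ pixel_grid(b, h, w, src_img.device)
    cam = cam * depth.view(b, 1, -1)
    cam = torch.cat([cam, torch.ones(b, 1, h * w, device=src_img.device)], 1)
    # Transform into the source frame and project back to pixel coordinates
    proj = K @ (T_tgt_to_src @ cam)[:, :3, :]
    # Clamp for numerical safety (ignores points behind the camera)
    uv = proj[:, :2, :] / proj[:, 2:3, :].clamp(min=1e-6)
    # Normalize to [-1, 1] for grid_sample
    u = 2.0 * uv[:, 0] / (w - 1) - 1.0
    v = 2.0 * uv[:, 1] / (h - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).view(b, h, w, 2)
    return F.grid_sample(src_img, grid, padding_mode="border",
                         align_corners=True)

def photometric_loss(tgt_img, src_img, depth, pose, K):
    """L1 photometric warp error between the target frame and the
    source frame warped into the target view."""
    warped = inverse_warp(src_img, depth, pose, K)
    return (tgt_img - warped).abs().mean()

if __name__ == "__main__":
    # Smoke test with random tensors, identity intrinsics and pose.
    b, h, w = 2, 32, 64
    K = torch.eye(3).expand(b, 3, 3)
    pose = torch.eye(4).expand(b, 4, 4)
    loss = photometric_loss(torch.rand(b, 3, h, w), torch.rand(b, 3, h, w),
                            torch.rand(b, 1, h, w) + 0.5, pose, K)
    print(loss.item())
```

The spatial (stereo) constraint from the abstract follows the same pattern: warping the other stereo view with the fixed, known baseline transform in place of the predicted pose ties the predicted depth to a known metric distance, which is what removes the monocular scale ambiguity.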

Bibliographic details
Published in: IEEE Transactions on Image Processing, 2020-01, Vol. 29, pp. 4130-4142
Authors: Wang, Anjie; Fang, Zhijun; Gao, Yongbin; Tan, Songchao; Wang, Shanshe; Ma, Siwei; Hwang, Jenq-Neng
Format: Article
Language: English
Online access: Order full text (see URL below)
DOI: 10.1109/TIP.2020.2968751
PMID: 32011252
ISSN: 1057-7149
EISSN: 1941-0042
Source: IEEE Electronic Library (IEL)
Subjects: adversarial learning; Back propagation; Cameras; Deep learning; Depth estimation; direct visual odometry; ego-motion; Estimation; Generators; Geometric constraints; Image reconstruction; Machine learning; Motion simulation; Odometers; Optimization; Photometry; self-supervised; Training
URL: https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-13T02%3A52%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Adversarial%20Learning%20for%20Joint%20Optimization%20of%20Depth%20and%20Ego-Motion&rft.jtitle=IEEE%20transactions%20on%20image%20processing&rft.au=Wang,%20Anjie&rft.date=2020-01-01&rft.volume=29&rft.spage=4130&rft.epage=4142&rft.pages=4130-4142&rft.issn=1057-7149&rft.eissn=1941-0042&rft.coden=IIPRE4&rft_id=info:doi/10.1109/TIP.2020.2968751&rft_dat=%3Cproquest_RIE%3E2350340085%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2354605926&rft_id=info:pmid/32011252&rft_ieee_id=8972902&rfr_iscdi=true