Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume

Multi-level features are commonly employed in the cascade network, which is currently the dominant framework in multi-view stereo (MVS). However, there is a potential issue that the recent popular multi-level feature extractor network overlooks the significance of fine-grained structure features for...

Detailed description

Bibliographic details
Published in: Applied intelligence (Dordrecht, Netherlands), 2024-09, Vol.54 (17-18), p.7924-7940
Main authors: Han, Ming; Yin, Hui; Chong, Aixin; Du, Qianqian
Format: Article
Language: eng
Online access: Full text
container_end_page 7940
container_issue 17-18
container_start_page 7924
container_title Applied intelligence (Dordrecht, Netherlands)
container_volume 54
creator Han, Ming
Yin, Hui
Chong, Aixin
Du, Qianqian
description Multi-level features are commonly employed in cascade networks, currently the dominant framework in multi-view stereo (MVS). However, recent popular multi-level feature extractors overlook the significance of fine-grained structure features for coarse depth inference in the MVS task. Discriminative structure features play an important part in matching and help boost the performance of depth inference. In this work, we propose an effective cascade-structured MVS model named FANet, in which an enhanced feature pyramid is built to predict reliable initial depth values. Specifically, the features from deep layers are enhanced with the rich spatial structure information of shallow layers through a bottom-up feature enhancement path. For the enhanced topmost features, an attention mechanism is additionally employed to suppress redundant information and select important features for subsequent matching. To keep the entire model lightweight while preserving performance, an efficient module constructs a lightweight and effective cost volume that represents viewpoint correspondence reliably: it uses an average similarity metric to calculate feature correlations between the reference view and the source views, then adaptively aggregates them into a unified correlation cost volume. Extensive quantitative and qualitative comparisons on the DTU and Tanks and Temples benchmarks illustrate that the proposed model exhibits better reconstruction quality than state-of-the-art MVS methods.
doi_str_mv 10.1007/s10489-024-05574-z
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_3090096496</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3090096496</sourcerecordid><originalsourceid>FETCH-LOGICAL-c200t-2b222af0cfc3c17bb0a10ad8ee1ec3dfde7000fd65e795f6ca288046483faf703</originalsourceid><addsrcrecordid>eNp9kE1LAzEQhoMoWD_-gKcFz9FJsrvZPUqpH1DwUsFbSLMTu2W_TLIt7a83uoI3TzPDvO87w0PIDYM7BiDvPYO0KCnwlEKWyZQeT8iMZVJQmZbylMygjKs8L9_PyYX3WwAQAtiMrBbdRncGq8SiDqPDZDg43dZx7l3Sjk2o6a7GfeIDOuyTfR02ia70EOodJqZ3Dhsd6r6LvQ_Jrm_GFq_ImdWNx-vfekneHher-TNdvj69zB-W1HCAQPmac64tGGuEYXK9Bs1AVwUiQyMqW6GMf9oqz1CWmc2N5kUBaZ4WwmorQVyS2yl3cP3niD6obT-6Lp5UAkqAMk_LPKr4pDKu996hVYOrW-0OioH6pqcmeirSUz_01DGaxGTyUdx9oPuL_sf1BUUddPo</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3090096496</pqid></control><display><type>article</type><title>Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume</title><source>SpringerLink Journals - AutoHoldings</source><creator>Han, Ming ; Yin, Hui ; Chong, Aixin ; Du, Qianqian</creator><creatorcontrib>Han, Ming ; Yin, Hui ; Chong, Aixin ; Du, Qianqian</creatorcontrib><description>Multi-level features are commonly employed in the cascade network, which is currently the dominant framework in multi-view stereo (MVS). However, there is a potential issue that the recent popular multi-level feature extractor network overlooks the significance of fine-grained structure features for coarse depth inferences in MVS task. Discriminative structure features play an important part in matching and are helpful to boost the performance of depth inference. In this work, we propose an effective cascade-structured MVS model named FANet, where an enhanced feature pyramid is built with the intention of predicting reliable initial depth values. 
Specifically, the features from deep layers are enhanced with affluent spatial structure information in shallow layers by a bottom-up feature enhancement path. For the enhanced topmost features, an attention mechanism is additionally employed to suppress redundant information and select important features for subsequent matching. To ensure the lightweight and optimal performance of the entire model, an efficient module is built to construct a lightweight and effective cost volume, representing viewpoint correspondence reliably, by utilizing the average similarity metric to calculate feature correlations between reference view and source views and then adaptively aggregating them into a unified correlation cost volume. Extensive quantitative and qualitative comparisons on the DTU and Tanks &amp;Temple benchmarks illustrate that the proposed model exhibits better reconstruction quality than state-of-the-art MVS methods. Graphical abstract</description><identifier>ISSN: 0924-669X</identifier><identifier>EISSN: 1573-7497</identifier><identifier>DOI: 10.1007/s10489-024-05574-z</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Accuracy ; Algorithms ; Artificial Intelligence ; Computer Science ; Correlation ; Feature extraction ; Lightweight ; Machines ; Manufacturing ; Matching ; Mechanical Engineering ; Processes ; Qualitative analysis ; Weight reduction</subject><ispartof>Applied intelligence (Dordrecht, Netherlands), 2024-09, Vol.54 (17-18), p.7924-7940</ispartof><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024. Springer Nature or its licensor (e.g. 
a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c200t-2b222af0cfc3c17bb0a10ad8ee1ec3dfde7000fd65e795f6ca288046483faf703</cites><orcidid>0000-0002-4226-4368</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10489-024-05574-z$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10489-024-05574-z$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27923,27924,41487,42556,51318</link.rule.ids></links><search><creatorcontrib>Han, Ming</creatorcontrib><creatorcontrib>Yin, Hui</creatorcontrib><creatorcontrib>Chong, Aixin</creatorcontrib><creatorcontrib>Du, Qianqian</creatorcontrib><title>Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume</title><title>Applied intelligence (Dordrecht, Netherlands)</title><addtitle>Appl Intell</addtitle><description>Multi-level features are commonly employed in the cascade network, which is currently the dominant framework in multi-view stereo (MVS). However, there is a potential issue that the recent popular multi-level feature extractor network overlooks the significance of fine-grained structure features for coarse depth inferences in MVS task. Discriminative structure features play an important part in matching and are helpful to boost the performance of depth inference. 
In this work, we propose an effective cascade-structured MVS model named FANet, where an enhanced feature pyramid is built with the intention of predicting reliable initial depth values. Specifically, the features from deep layers are enhanced with affluent spatial structure information in shallow layers by a bottom-up feature enhancement path. For the enhanced topmost features, an attention mechanism is additionally employed to suppress redundant information and select important features for subsequent matching. To ensure the lightweight and optimal performance of the entire model, an efficient module is built to construct a lightweight and effective cost volume, representing viewpoint correspondence reliably, by utilizing the average similarity metric to calculate feature correlations between reference view and source views and then adaptively aggregating them into a unified correlation cost volume. Extensive quantitative and qualitative comparisons on the DTU and Tanks &amp;Temple benchmarks illustrate that the proposed model exhibits better reconstruction quality than state-of-the-art MVS methods. 
Graphical abstract</description><subject>Accuracy</subject><subject>Algorithms</subject><subject>Artificial Intelligence</subject><subject>Computer Science</subject><subject>Correlation</subject><subject>Feature extraction</subject><subject>Lightweight</subject><subject>Machines</subject><subject>Manufacturing</subject><subject>Matching</subject><subject>Mechanical Engineering</subject><subject>Processes</subject><subject>Qualitative analysis</subject><subject>Weight reduction</subject><issn>0924-669X</issn><issn>1573-7497</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp9kE1LAzEQhoMoWD_-gKcFz9FJsrvZPUqpH1DwUsFbSLMTu2W_TLIt7a83uoI3TzPDvO87w0PIDYM7BiDvPYO0KCnwlEKWyZQeT8iMZVJQmZbylMygjKs8L9_PyYX3WwAQAtiMrBbdRncGq8SiDqPDZDg43dZx7l3Sjk2o6a7GfeIDOuyTfR02ia70EOodJqZ3Dhsd6r6LvQ_Jrm_GFq_ImdWNx-vfekneHher-TNdvj69zB-W1HCAQPmac64tGGuEYXK9Bs1AVwUiQyMqW6GMf9oqz1CWmc2N5kUBaZ4WwmorQVyS2yl3cP3niD6obT-6Lp5UAkqAMk_LPKr4pDKu996hVYOrW-0OioH6pqcmeirSUz_01DGaxGTyUdx9oPuL_sf1BUUddPo</recordid><startdate>20240901</startdate><enddate>20240901</enddate><creator>Han, Ming</creator><creator>Yin, Hui</creator><creator>Chong, Aixin</creator><creator>Du, Qianqian</creator><general>Springer US</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-4226-4368</orcidid></search><sort><creationdate>20240901</creationdate><title>Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume</title><author>Han, Ming ; Yin, Hui ; Chong, Aixin ; Du, 
Qianqian</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c200t-2b222af0cfc3c17bb0a10ad8ee1ec3dfde7000fd65e795f6ca288046483faf703</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Accuracy</topic><topic>Algorithms</topic><topic>Artificial Intelligence</topic><topic>Computer Science</topic><topic>Correlation</topic><topic>Feature extraction</topic><topic>Lightweight</topic><topic>Machines</topic><topic>Manufacturing</topic><topic>Matching</topic><topic>Mechanical Engineering</topic><topic>Processes</topic><topic>Qualitative analysis</topic><topic>Weight reduction</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Han, Ming</creatorcontrib><creatorcontrib>Yin, Hui</creatorcontrib><creatorcontrib>Chong, Aixin</creatorcontrib><creatorcontrib>Du, Qianqian</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Applied intelligence (Dordrecht, Netherlands)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Han, Ming</au><au>Yin, Hui</au><au>Chong, Aixin</au><au>Du, Qianqian</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume</atitle><jtitle>Applied intelligence (Dordrecht, Netherlands)</jtitle><stitle>Appl 
Intell</stitle><date>2024-09-01</date><risdate>2024</risdate><volume>54</volume><issue>17-18</issue><spage>7924</spage><epage>7940</epage><pages>7924-7940</pages><issn>0924-669X</issn><eissn>1573-7497</eissn><abstract>Multi-level features are commonly employed in the cascade network, which is currently the dominant framework in multi-view stereo (MVS). However, there is a potential issue that the recent popular multi-level feature extractor network overlooks the significance of fine-grained structure features for coarse depth inferences in MVS task. Discriminative structure features play an important part in matching and are helpful to boost the performance of depth inference. In this work, we propose an effective cascade-structured MVS model named FANet, where an enhanced feature pyramid is built with the intention of predicting reliable initial depth values. Specifically, the features from deep layers are enhanced with affluent spatial structure information in shallow layers by a bottom-up feature enhancement path. For the enhanced topmost features, an attention mechanism is additionally employed to suppress redundant information and select important features for subsequent matching. To ensure the lightweight and optimal performance of the entire model, an efficient module is built to construct a lightweight and effective cost volume, representing viewpoint correspondence reliably, by utilizing the average similarity metric to calculate feature correlations between reference view and source views and then adaptively aggregating them into a unified correlation cost volume. Extensive quantitative and qualitative comparisons on the DTU and Tanks &amp;Temple benchmarks illustrate that the proposed model exhibits better reconstruction quality than state-of-the-art MVS methods. 
Graphical abstract</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10489-024-05574-z</doi><tpages>17</tpages><orcidid>https://orcid.org/0000-0002-4226-4368</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0924-669X
ispartof Applied intelligence (Dordrecht, Netherlands), 2024-09, Vol.54 (17-18), p.7924-7940
issn 0924-669X
1573-7497
language eng
recordid cdi_proquest_journals_3090096496
source SpringerLink Journals - AutoHoldings
subjects Accuracy
Algorithms
Artificial Intelligence
Computer Science
Correlation
Feature extraction
Lightweight
Machines
Manufacturing
Matching
Mechanical Engineering
Processes
Qualitative analysis
Weight reduction
title Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T07%3A53%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Enhanced%20feature%20pyramid%20for%20multi-view%20stereo%20with%20adaptive%20correlation%20cost%20volume&rft.jtitle=Applied%20intelligence%20(Dordrecht,%20Netherlands)&rft.au=Han,%20Ming&rft.date=2024-09-01&rft.volume=54&rft.issue=17-18&rft.spage=7924&rft.epage=7940&rft.pages=7924-7940&rft.issn=0924-669X&rft.eissn=1573-7497&rft_id=info:doi/10.1007/s10489-024-05574-z&rft_dat=%3Cproquest_cross%3E3090096496%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3090096496&rft_id=info:pmid/&rfr_iscdi=true
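
The cost-volume step summarized in the description field (average similarity between reference and source features, followed by adaptive aggregation across views) might look roughly like the NumPy sketch below. This is not the FANet implementation: the function names, the tensor layout, and the per-pixel softmax weighting are all assumptions made for illustration. The record does not say how the adaptive weights are obtained, and the source features are assumed to be already warped onto the reference view's depth hypotheses.

```python
import numpy as np

def average_correlation(ref_feat, warped_src_feats):
    """Channel-averaged feature correlation (hypothetical helper).

    ref_feat:         (C, H, W) reference-view features
    warped_src_feats: (N, C, D, H, W) source features, assumed already
                      warped to D depth hypotheses of the reference view
    returns:          (N, D, H, W) one correlation volume per source view
    """
    ref = ref_feat[None, :, None, :, :]           # broadcast to (1, C, 1, H, W)
    return (ref * warped_src_feats).mean(axis=1)  # average over channels

def adaptive_aggregate(corr):
    """Merge per-view volumes into one volume with per-pixel view weights.

    Assumption: weights are a softmax over views of each view's peak
    correlation along depth. This is a stand-in for whatever weighting
    the paper actually uses, which the abstract does not specify.
    """
    score = corr.max(axis=1, keepdims=True)               # (N, 1, H, W)
    score = score - score.max(axis=0, keepdims=True)      # numerical stability
    weights = np.exp(score)
    weights = weights / weights.sum(axis=0, keepdims=True)
    return (weights * corr).sum(axis=0)                   # (D, H, W)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.standard_normal((8, 4, 5))        # C=8, H=4, W=5
    src = rng.standard_normal((3, 8, 6, 4, 5))  # N=3 views, D=6 depths
    volume = adaptive_aggregate(average_correlation(ref, src))
    print(volume.shape)
```

Averaging over channels collapses the feature dimension, so the aggregated volume has a single channel per depth hypothesis regardless of the number of source views, which is one plausible reading of the "lightweight" cost volume the abstract mentions.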