Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume

Multi-level features are commonly employed in the cascade network, which is currently the dominant framework in multi-view stereo (MVS). However, there is a potential issue that the recent popular multi-level feature extractor network overlooks the significance of fine-grained structure features for...

Bibliographic details
Published in:	Applied intelligence (Dordrecht, Netherlands), 2024-09, Vol.54 (17-18), p.7924-7940
Main authors:	Han, Ming; Yin, Hui; Chong, Aixin; Du, Qianqian
Format:	Article
Language:	eng
Subjects:	Accuracy; Algorithms; Artificial Intelligence; Computer Science; Correlation; Feature extraction; Lightweight; Machines; Manufacturing; Matching; Mechanical Engineering; Processes; Qualitative analysis; Weight reduction
Online access:	Full text

container_end_page	7940
container_issue	17-18
container_start_page	7924
container_title	Applied intelligence (Dordrecht, Netherlands)
container_volume	54
creator	Han, Ming; Yin, Hui; Chong, Aixin; Du, Qianqian
description	Multi-level features are commonly employed in cascade networks, currently the dominant framework in multi-view stereo (MVS). However, recent popular multi-level feature extractors overlook the significance of fine-grained structure features for coarse depth inference in the MVS task. Discriminative structure features play an important part in matching and help boost the performance of depth inference. In this work, we propose an effective cascade-structured MVS model named FANet, in which an enhanced feature pyramid is built to predict reliable initial depth values. Specifically, the features from deep layers are enhanced with the rich spatial structure information in shallow layers through a bottom-up feature enhancement path. For the enhanced topmost features, an attention mechanism is additionally employed to suppress redundant information and select important features for subsequent matching. To keep the entire model lightweight while preserving its performance, an efficient module constructs a lightweight and effective cost volume that reliably represents viewpoint correspondence: it uses the average similarity metric to calculate feature correlations between the reference view and the source views and then adaptively aggregates them into a unified correlation cost volume. Extensive quantitative and qualitative comparisons on the DTU and Tanks and Temples benchmarks illustrate that the proposed model exhibits better reconstruction quality than state-of-the-art MVS methods.
doi_str_mv	10.1007/s10489-024-05574-z
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_3090096496</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3090096496</sourcerecordid><originalsourceid>FETCH-LOGICAL-c200t-2b222af0cfc3c17bb0a10ad8ee1ec3dfde7000fd65e795f6ca288046483faf703</originalsourceid><addsrcrecordid>eNp9kE1LAzEQhoMoWD_-gKcFz9FJsrvZPUqpH1DwUsFbSLMTu2W_TLIt7a83uoI3TzPDvO87w0PIDYM7BiDvPYO0KCnwlEKWyZQeT8iMZVJQmZbylMygjKs8L9_PyYX3WwAQAtiMrBbdRncGq8SiDqPDZDg43dZx7l3Sjk2o6a7GfeIDOuyTfR02ia70EOodJqZ3Dhsd6r6LvQ_Jrm_GFq_ImdWNx-vfekneHher-TNdvj69zB-W1HCAQPmac64tGGuEYXK9Bs1AVwUiQyMqW6GMf9oqz1CWmc2N5kUBaZ4WwmorQVyS2yl3cP3niD6obT-6Lp5UAkqAMk_LPKr4pDKu996hVYOrW-0OioH6pqcmeirSUz_01DGaxGTyUdx9oPuL_sf1BUUddPo</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3090096496</pqid></control><display><type>article</type><title>Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume</title><source>SpringerLink Journals - AutoHoldings</source><creator>Han, Ming ; Yin, Hui ; Chong, Aixin ; Du, Qianqian</creator><creatorcontrib>Han, Ming ; Yin, Hui ; Chong, Aixin ; Du, Qianqian</creatorcontrib><description>Multi-level features are commonly employed in the cascade network, which is currently the dominant framework in multi-view stereo (MVS). However, there is a potential issue that the recent popular multi-level feature extractor network overlooks the significance of fine-grained structure features for coarse depth inferences in MVS task. Discriminative structure features play an important part in matching and are helpful to boost the performance of depth inference. In this work, we propose an effective cascade-structured MVS model named FANet, where an enhanced feature pyramid is built with the intention of predicting reliable initial depth values. 
Specifically, the features from deep layers are enhanced with affluent spatial structure information in shallow layers by a bottom-up feature enhancement path. For the enhanced topmost features, an attention mechanism is additionally employed to suppress redundant information and select important features for subsequent matching. To ensure the lightweight and optimal performance of the entire model, an efficient module is built to construct a lightweight and effective cost volume, representing viewpoint correspondence reliably, by utilizing the average similarity metric to calculate feature correlations between reference view and source views and then adaptively aggregating them into a unified correlation cost volume. Extensive quantitative and qualitative comparisons on the DTU and Tanks &Temple benchmarks illustrate that the proposed model exhibits better reconstruction quality than state-of-the-art MVS methods. Graphical abstract</description><identifier>ISSN: 0924-669X</identifier><identifier>EISSN: 1573-7497</identifier><identifier>DOI: 10.1007/s10489-024-05574-z</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Accuracy ; Algorithms ; Artificial Intelligence ; Computer Science ; Correlation ; Feature extraction ; Lightweight ; Machines ; Manufacturing ; Matching ; Mechanical Engineering ; Processes ; Qualitative analysis ; Weight reduction</subject><ispartof>Applied intelligence (Dordrecht, Netherlands), 2024-09, Vol.54 (17-18), p.7924-7940</ispartof><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024. Springer Nature or its licensor (e.g. 
a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c200t-2b222af0cfc3c17bb0a10ad8ee1ec3dfde7000fd65e795f6ca288046483faf703</cites><orcidid>0000-0002-4226-4368</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10489-024-05574-z$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10489-024-05574-z$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27923,27924,41487,42556,51318</link.rule.ids></links><search><creatorcontrib>Han, Ming</creatorcontrib><creatorcontrib>Yin, Hui</creatorcontrib><creatorcontrib>Chong, Aixin</creatorcontrib><creatorcontrib>Du, Qianqian</creatorcontrib><title>Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume</title><title>Applied intelligence (Dordrecht, Netherlands)</title><addtitle>Appl Intell</addtitle><description>Multi-level features are commonly employed in the cascade network, which is currently the dominant framework in multi-view stereo (MVS). However, there is a potential issue that the recent popular multi-level feature extractor network overlooks the significance of fine-grained structure features for coarse depth inferences in MVS task. Discriminative structure features play an important part in matching and are helpful to boost the performance of depth inference. 
In this work, we propose an effective cascade-structured MVS model named FANet, where an enhanced feature pyramid is built with the intention of predicting reliable initial depth values. Specifically, the features from deep layers are enhanced with affluent spatial structure information in shallow layers by a bottom-up feature enhancement path. For the enhanced topmost features, an attention mechanism is additionally employed to suppress redundant information and select important features for subsequent matching. To ensure the lightweight and optimal performance of the entire model, an efficient module is built to construct a lightweight and effective cost volume, representing viewpoint correspondence reliably, by utilizing the average similarity metric to calculate feature correlations between reference view and source views and then adaptively aggregating them into a unified correlation cost volume. Extensive quantitative and qualitative comparisons on the DTU and Tanks &Temple benchmarks illustrate that the proposed model exhibits better reconstruction quality than state-of-the-art MVS methods. 
Graphical abstract</description><subject>Accuracy</subject><subject>Algorithms</subject><subject>Artificial Intelligence</subject><subject>Computer Science</subject><subject>Correlation</subject><subject>Feature extraction</subject><subject>Lightweight</subject><subject>Machines</subject><subject>Manufacturing</subject><subject>Matching</subject><subject>Mechanical Engineering</subject><subject>Processes</subject><subject>Qualitative analysis</subject><subject>Weight reduction</subject><issn>0924-669X</issn><issn>1573-7497</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp9kE1LAzEQhoMoWD_-gKcFz9FJsrvZPUqpH1DwUsFbSLMTu2W_TLIt7a83uoI3TzPDvO87w0PIDYM7BiDvPYO0KCnwlEKWyZQeT8iMZVJQmZbylMygjKs8L9_PyYX3WwAQAtiMrBbdRncGq8SiDqPDZDg43dZx7l3Sjk2o6a7GfeIDOuyTfR02ia70EOodJqZ3Dhsd6r6LvQ_Jrm_GFq_ImdWNx-vfekneHher-TNdvj69zB-W1HCAQPmac64tGGuEYXK9Bs1AVwUiQyMqW6GMf9oqz1CWmc2N5kUBaZ4WwmorQVyS2yl3cP3niD6obT-6Lp5UAkqAMk_LPKr4pDKu996hVYOrW-0OioH6pqcmeirSUz_01DGaxGTyUdx9oPuL_sf1BUUddPo</recordid><startdate>20240901</startdate><enddate>20240901</enddate><creator>Han, Ming</creator><creator>Yin, Hui</creator><creator>Chong, Aixin</creator><creator>Du, Qianqian</creator><general>Springer US</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-4226-4368</orcidid></search><sort><creationdate>20240901</creationdate><title>Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume</title><author>Han, Ming ; Yin, Hui ; Chong, Aixin ; Du, 
Qianqian</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c200t-2b222af0cfc3c17bb0a10ad8ee1ec3dfde7000fd65e795f6ca288046483faf703</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Accuracy</topic><topic>Algorithms</topic><topic>Artificial Intelligence</topic><topic>Computer Science</topic><topic>Correlation</topic><topic>Feature extraction</topic><topic>Lightweight</topic><topic>Machines</topic><topic>Manufacturing</topic><topic>Matching</topic><topic>Mechanical Engineering</topic><topic>Processes</topic><topic>Qualitative analysis</topic><topic>Weight reduction</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Han, Ming</creatorcontrib><creatorcontrib>Yin, Hui</creatorcontrib><creatorcontrib>Chong, Aixin</creatorcontrib><creatorcontrib>Du, Qianqian</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Applied intelligence (Dordrecht, Netherlands)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Han, Ming</au><au>Yin, Hui</au><au>Chong, Aixin</au><au>Du, Qianqian</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume</atitle><jtitle>Applied intelligence (Dordrecht, Netherlands)</jtitle><stitle>Appl 
Intell</stitle><date>2024-09-01</date><risdate>2024</risdate><volume>54</volume><issue>17-18</issue><spage>7924</spage><epage>7940</epage><pages>7924-7940</pages><issn>0924-669X</issn><eissn>1573-7497</eissn><abstract>Multi-level features are commonly employed in the cascade network, which is currently the dominant framework in multi-view stereo (MVS). However, there is a potential issue that the recent popular multi-level feature extractor network overlooks the significance of fine-grained structure features for coarse depth inferences in MVS task. Discriminative structure features play an important part in matching and are helpful to boost the performance of depth inference. In this work, we propose an effective cascade-structured MVS model named FANet, where an enhanced feature pyramid is built with the intention of predicting reliable initial depth values. Specifically, the features from deep layers are enhanced with affluent spatial structure information in shallow layers by a bottom-up feature enhancement path. For the enhanced topmost features, an attention mechanism is additionally employed to suppress redundant information and select important features for subsequent matching. To ensure the lightweight and optimal performance of the entire model, an efficient module is built to construct a lightweight and effective cost volume, representing viewpoint correspondence reliably, by utilizing the average similarity metric to calculate feature correlations between reference view and source views and then adaptively aggregating them into a unified correlation cost volume. Extensive quantitative and qualitative comparisons on the DTU and Tanks &Temple benchmarks illustrate that the proposed model exhibits better reconstruction quality than state-of-the-art MVS methods. 
Graphical abstract</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10489-024-05574-z</doi><tpages>17</tpages><orcidid>https://orcid.org/0000-0002-4226-4368</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 0924-669X
ispartof	Applied intelligence (Dordrecht, Netherlands), 2024-09, Vol.54 (17-18), p.7924-7940
issn	0924-669X 1573-7497
language	eng
recordid	cdi_proquest_journals_3090096496
source	SpringerLink Journals - AutoHoldings
subjects	Accuracy; Algorithms; Artificial Intelligence; Computer Science; Correlation; Feature extraction; Lightweight; Machines; Manufacturing; Matching; Mechanical Engineering; Processes; Qualitative analysis; Weight reduction
title	Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T07%3A53%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Enhanced%20feature%20pyramid%20for%20multi-view%20stereo%20with%20adaptive%20correlation%20cost%20volume&rft.jtitle=Applied%20intelligence%20(Dordrecht,%20Netherlands)&rft.au=Han,%20Ming&rft.date=2024-09-01&rft.volume=54&rft.issue=17-18&rft.spage=7924&rft.epage=7940&rft.pages=7924-7940&rft.issn=0924-669X&rft.eissn=1573-7497&rft_id=info:doi/10.1007/s10489-024-05574-z&rft_dat=%3Cproquest_cross%3E3090096496%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3090096496&rft_id=info:pmid/&rfr_iscdi=true