Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume
Multi-level features are commonly employed in cascade networks, currently the dominant framework in multi-view stereo (MVS). However, recent popular multi-level feature extractors overlook the significance of fine-grained structure features for...
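Published in: | Applied intelligence (Dordrecht, Netherlands), 2024-09, Vol.54 (17-18), p.7924-7940 |
---|---|
Main authors: | Han, Ming ; Yin, Hui ; Chong, Aixin ; Du, Qianqian |
Format: | Article |
Language: | eng |
Subjects: | Accuracy ; Algorithms ; Artificial Intelligence ; Computer Science ; Correlation ; Feature extraction ; Lightweight ; Machines ; Manufacturing ; Matching ; Mechanical Engineering ; Processes ; Qualitative analysis ; Weight reduction |
Online access: | Full text |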
container_end_page | 7940 |
---|---|
container_issue | 17-18 |
container_start_page | 7924 |
container_title | Applied intelligence (Dordrecht, Netherlands) |
container_volume | 54 |
creator | Han, Ming ; Yin, Hui ; Chong, Aixin ; Du, Qianqian |
description | Multi-level features are commonly employed in cascade networks, currently the dominant framework in multi-view stereo (MVS). However, recent popular multi-level feature extractors overlook the significance of fine-grained structure features for coarse depth inference in the MVS task. Discriminative structure features play an important part in matching and help improve depth inference. In this work, we propose an effective cascade-structured MVS model named FANet, in which an enhanced feature pyramid is built to predict reliable initial depth values. Specifically, features from deep layers are enhanced with the rich spatial structure information of shallow layers through a bottom-up feature enhancement path. For the enhanced topmost features, an attention mechanism is additionally employed to suppress redundant information and select important features for subsequent matching. To keep the entire model lightweight while preserving performance, an efficient module constructs a lightweight and effective cost volume that reliably represents viewpoint correspondence: it uses the average similarity metric to calculate feature correlations between the reference view and the source views, then adaptively aggregates them into a unified correlation cost volume. Extensive quantitative and qualitative comparisons on the DTU and Tanks and Temples benchmarks illustrate that the proposed model achieves better reconstruction quality than state-of-the-art MVS methods. |
doi_str_mv | 10.1007/s10489-024-05574-z |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_3090096496</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3090096496</sourcerecordid><originalsourceid>FETCH-LOGICAL-c200t-2b222af0cfc3c17bb0a10ad8ee1ec3dfde7000fd65e795f6ca288046483faf703</originalsourceid><addsrcrecordid>eNp9kE1LAzEQhoMoWD_-gKcFz9FJsrvZPUqpH1DwUsFbSLMTu2W_TLIt7a83uoI3TzPDvO87w0PIDYM7BiDvPYO0KCnwlEKWyZQeT8iMZVJQmZbylMygjKs8L9_PyYX3WwAQAtiMrBbdRncGq8SiDqPDZDg43dZx7l3Sjk2o6a7GfeIDOuyTfR02ia70EOodJqZ3Dhsd6r6LvQ_Jrm_GFq_ImdWNx-vfekneHher-TNdvj69zB-W1HCAQPmac64tGGuEYXK9Bs1AVwUiQyMqW6GMf9oqz1CWmc2N5kUBaZ4WwmorQVyS2yl3cP3niD6obT-6Lp5UAkqAMk_LPKr4pDKu996hVYOrW-0OioH6pqcmeirSUz_01DGaxGTyUdx9oPuL_sf1BUUddPo</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3090096496</pqid></control><display><type>article</type><title>Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume</title><source>SpringerLink Journals - AutoHoldings</source><creator>Han, Ming ; Yin, Hui ; Chong, Aixin ; Du, Qianqian</creator><creatorcontrib>Han, Ming ; Yin, Hui ; Chong, Aixin ; Du, Qianqian</creatorcontrib><description>Multi-level features are commonly employed in the cascade network, which is currently the dominant framework in multi-view stereo (MVS). However, there is a potential issue that the recent popular multi-level feature extractor network overlooks the significance of fine-grained structure features for coarse depth inferences in MVS task. Discriminative structure features play an important part in matching and are helpful to boost the performance of depth inference. In this work, we propose an effective cascade-structured MVS model named FANet, where an enhanced feature pyramid is built with the intention of predicting reliable initial depth values. 
Specifically, the features from deep layers are enhanced with affluent spatial structure information in shallow layers by a bottom-up feature enhancement path. For the enhanced topmost features, an attention mechanism is additionally employed to suppress redundant information and select important features for subsequent matching. To ensure the lightweight and optimal performance of the entire model, an efficient module is built to construct a lightweight and effective cost volume, representing viewpoint correspondence reliably, by utilizing the average similarity metric to calculate feature correlations between reference view and source views and then adaptively aggregating them into a unified correlation cost volume. Extensive quantitative and qualitative comparisons on the DTU and Tanks &Temple benchmarks illustrate that the proposed model exhibits better reconstruction quality than state-of-the-art MVS methods.
Graphical abstract</description><identifier>ISSN: 0924-669X</identifier><identifier>EISSN: 1573-7497</identifier><identifier>DOI: 10.1007/s10489-024-05574-z</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Accuracy ; Algorithms ; Artificial Intelligence ; Computer Science ; Correlation ; Feature extraction ; Lightweight ; Machines ; Manufacturing ; Matching ; Mechanical Engineering ; Processes ; Qualitative analysis ; Weight reduction</subject><ispartof>Applied intelligence (Dordrecht, Netherlands), 2024-09, Vol.54 (17-18), p.7924-7940</ispartof><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c200t-2b222af0cfc3c17bb0a10ad8ee1ec3dfde7000fd65e795f6ca288046483faf703</cites><orcidid>0000-0002-4226-4368</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10489-024-05574-z$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10489-024-05574-z$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27923,27924,41487,42556,51318</link.rule.ids></links><search><creatorcontrib>Han, Ming</creatorcontrib><creatorcontrib>Yin, Hui</creatorcontrib><creatorcontrib>Chong, Aixin</creatorcontrib><creatorcontrib>Du, Qianqian</creatorcontrib><title>Enhanced feature pyramid for multi-view stereo with adaptive 
correlation cost volume</title><title>Applied intelligence (Dordrecht, Netherlands)</title><addtitle>Appl Intell</addtitle><description>Multi-level features are commonly employed in the cascade network, which is currently the dominant framework in multi-view stereo (MVS). However, there is a potential issue that the recent popular multi-level feature extractor network overlooks the significance of fine-grained structure features for coarse depth inferences in MVS task. Discriminative structure features play an important part in matching and are helpful to boost the performance of depth inference. In this work, we propose an effective cascade-structured MVS model named FANet, where an enhanced feature pyramid is built with the intention of predicting reliable initial depth values. Specifically, the features from deep layers are enhanced with affluent spatial structure information in shallow layers by a bottom-up feature enhancement path. For the enhanced topmost features, an attention mechanism is additionally employed to suppress redundant information and select important features for subsequent matching. To ensure the lightweight and optimal performance of the entire model, an efficient module is built to construct a lightweight and effective cost volume, representing viewpoint correspondence reliably, by utilizing the average similarity metric to calculate feature correlations between reference view and source views and then adaptively aggregating them into a unified correlation cost volume. Extensive quantitative and qualitative comparisons on the DTU and Tanks &Temple benchmarks illustrate that the proposed model exhibits better reconstruction quality than state-of-the-art MVS methods.
Graphical abstract</description><subject>Accuracy</subject><subject>Algorithms</subject><subject>Artificial Intelligence</subject><subject>Computer Science</subject><subject>Correlation</subject><subject>Feature extraction</subject><subject>Lightweight</subject><subject>Machines</subject><subject>Manufacturing</subject><subject>Matching</subject><subject>Mechanical Engineering</subject><subject>Processes</subject><subject>Qualitative analysis</subject><subject>Weight reduction</subject><issn>0924-669X</issn><issn>1573-7497</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp9kE1LAzEQhoMoWD_-gKcFz9FJsrvZPUqpH1DwUsFbSLMTu2W_TLIt7a83uoI3TzPDvO87w0PIDYM7BiDvPYO0KCnwlEKWyZQeT8iMZVJQmZbylMygjKs8L9_PyYX3WwAQAtiMrBbdRncGq8SiDqPDZDg43dZx7l3Sjk2o6a7GfeIDOuyTfR02ia70EOodJqZ3Dhsd6r6LvQ_Jrm_GFq_ImdWNx-vfekneHher-TNdvj69zB-W1HCAQPmac64tGGuEYXK9Bs1AVwUiQyMqW6GMf9oqz1CWmc2N5kUBaZ4WwmorQVyS2yl3cP3niD6obT-6Lp5UAkqAMk_LPKr4pDKu996hVYOrW-0OioH6pqcmeirSUz_01DGaxGTyUdx9oPuL_sf1BUUddPo</recordid><startdate>20240901</startdate><enddate>20240901</enddate><creator>Han, Ming</creator><creator>Yin, Hui</creator><creator>Chong, Aixin</creator><creator>Du, Qianqian</creator><general>Springer US</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-4226-4368</orcidid></search><sort><creationdate>20240901</creationdate><title>Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume</title><author>Han, Ming ; Yin, Hui ; Chong, Aixin ; Du, 
Qianqian</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c200t-2b222af0cfc3c17bb0a10ad8ee1ec3dfde7000fd65e795f6ca288046483faf703</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Accuracy</topic><topic>Algorithms</topic><topic>Artificial Intelligence</topic><topic>Computer Science</topic><topic>Correlation</topic><topic>Feature extraction</topic><topic>Lightweight</topic><topic>Machines</topic><topic>Manufacturing</topic><topic>Matching</topic><topic>Mechanical Engineering</topic><topic>Processes</topic><topic>Qualitative analysis</topic><topic>Weight reduction</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Han, Ming</creatorcontrib><creatorcontrib>Yin, Hui</creatorcontrib><creatorcontrib>Chong, Aixin</creatorcontrib><creatorcontrib>Du, Qianqian</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Applied intelligence (Dordrecht, Netherlands)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Han, Ming</au><au>Yin, Hui</au><au>Chong, Aixin</au><au>Du, Qianqian</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume</atitle><jtitle>Applied intelligence (Dordrecht, Netherlands)</jtitle><stitle>Appl 
Intell</stitle><date>2024-09-01</date><risdate>2024</risdate><volume>54</volume><issue>17-18</issue><spage>7924</spage><epage>7940</epage><pages>7924-7940</pages><issn>0924-669X</issn><eissn>1573-7497</eissn><abstract>Multi-level features are commonly employed in the cascade network, which is currently the dominant framework in multi-view stereo (MVS). However, there is a potential issue that the recent popular multi-level feature extractor network overlooks the significance of fine-grained structure features for coarse depth inferences in MVS task. Discriminative structure features play an important part in matching and are helpful to boost the performance of depth inference. In this work, we propose an effective cascade-structured MVS model named FANet, where an enhanced feature pyramid is built with the intention of predicting reliable initial depth values. Specifically, the features from deep layers are enhanced with affluent spatial structure information in shallow layers by a bottom-up feature enhancement path. For the enhanced topmost features, an attention mechanism is additionally employed to suppress redundant information and select important features for subsequent matching. To ensure the lightweight and optimal performance of the entire model, an efficient module is built to construct a lightweight and effective cost volume, representing viewpoint correspondence reliably, by utilizing the average similarity metric to calculate feature correlations between reference view and source views and then adaptively aggregating them into a unified correlation cost volume. Extensive quantitative and qualitative comparisons on the DTU and Tanks &Temple benchmarks illustrate that the proposed model exhibits better reconstruction quality than state-of-the-art MVS methods.
Graphical abstract</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10489-024-05574-z</doi><tpages>17</tpages><orcidid>https://orcid.org/0000-0002-4226-4368</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0924-669X |
ispartof | Applied intelligence (Dordrecht, Netherlands), 2024-09, Vol.54 (17-18), p.7924-7940 |
issn | 0924-669X ; 1573-7497 |
language | eng |
recordid | cdi_proquest_journals_3090096496 |
source | SpringerLink Journals - AutoHoldings |
subjects | Accuracy ; Algorithms ; Artificial Intelligence ; Computer Science ; Correlation ; Feature extraction ; Lightweight ; Machines ; Manufacturing ; Matching ; Mechanical Engineering ; Processes ; Qualitative analysis ; Weight reduction |
title | Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T07%3A53%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Enhanced%20feature%20pyramid%20for%20multi-view%20stereo%20with%20adaptive%20correlation%20cost%20volume&rft.jtitle=Applied%20intelligence%20(Dordrecht,%20Netherlands)&rft.au=Han,%20Ming&rft.date=2024-09-01&rft.volume=54&rft.issue=17-18&rft.spage=7924&rft.epage=7940&rft.pages=7924-7940&rft.issn=0924-669X&rft.eissn=1573-7497&rft_id=info:doi/10.1007/s10489-024-05574-z&rft_dat=%3Cproquest_cross%3E3090096496%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3090096496&rft_id=info:pmid/&rfr_iscdi=true |
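
The abstract in this record describes a cost volume built from averaged feature correlations between the reference view and each source view, then aggregated adaptively across views. The record does not give the formulas, so the following is only a minimal sketch under stated assumptions, not FANet's implementation: source features are assumed to be already warped to one reference pixel for each depth hypothesis, and the per-view weighting (a softmax over each view's peak correlation) is a hypothetical stand-in for the paper's adaptive aggregation. The names `mean_correlation`, `softmax`, and `aggregate_cost` are illustrative only.

```python
import math


def mean_correlation(ref, src):
    """Average similarity: channel-wise product of two feature vectors, averaged over channels."""
    return sum(r * s for r, s in zip(ref, src)) / len(ref)


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_cost(ref_feat, warped_src_feats):
    """Build the per-pixel correlation cost over depth hypotheses.

    ref_feat:         C reference-view feature values at one pixel.
    warped_src_feats: [V][D][C] source-view features, assumed to be already
                      warped to this pixel for each of D depth hypotheses.
    Returns (cost, weights): cost has D entries, weights has V entries.
    """
    # One correlation value per source view and depth hypothesis.
    corr = [[mean_correlation(ref_feat, f) for f in view] for view in warped_src_feats]
    # ASSUMPTION: each view is weighted by a softmax over its peak correlation.
    # The paper's actual adaptive aggregation is not specified in this record.
    weights = softmax([max(c) for c in corr])
    depth_count = len(corr[0])
    # Weighted sum across views gives one unified cost entry per depth hypothesis.
    cost = [sum(w * c[d] for w, c in zip(weights, corr)) for d in range(depth_count)]
    return cost, weights


ref = [1.0, 0.0]
views = [
    [[1.0, 0.0], [0.0, 1.0]],  # view 0 matches the reference at depth 0
    [[0.0, 1.0], [0.0, 1.0]],  # view 1 matches at neither depth
]
cost, weights = aggregate_cost(ref, views)
print(cost, weights)
```

In this toy input the view that agrees with the reference receives the larger weight, and the aggregated cost is highest at the depth hypothesis where that agreement occurs. Averaging over channels collapses the feature dimension to a single value per depth hypothesis, which is one plausible reading of why the abstract calls the resulting cost volume lightweight.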