MV-MOS: Multi-View Feature Fusion for 3D Moving Object Segmentation

Effectively summarizing dense 3D point cloud data and extracting motion information of moving objects (moving object segmentation, MOS) is crucial to autonomous driving and robotics applications. How to effectively utilize motion and semantic features and avoid information loss during 3D-to-2D projection is still a key challenge. In this paper, we propose a novel multi-view MOS model (MV-MOS) by fusing motion-semantic features from different 2D representations of point clouds. To effectively exploit complementary information, the motion branches of the proposed model combine motion features from both bird's eye view (BEV) and range view (RV) representations. In addition, a semantic branch is introduced to provide supplementary semantic features of moving objects. Finally, a Mamba module is utilized to fuse the semantic features with motion features and provide effective guidance for the motion branches. We validated the effectiveness of the proposed multi-branch fusion MOS framework via comprehensive experiments, and our proposed model outperforms existing state-of-the-art models on the SemanticKITTI benchmark.
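
To make the 3D-to-2D projection step concrete, the sketch below shows how a single LiDAR scan can be flattened into the two complementary views the abstract refers to: a range view (spherical projection) and a bird's eye view (top-down rasterization). This is a minimal illustration, not the authors' code; the sensor field of view, grid resolution, and the use of residual images between consecutive scans as a motion cue are assumptions based on common MOS practice on SemanticKITTI.

import numpy as np

def range_view_projection(points, h=64, w=2048, fov_up=3.0, fov_down=-25.0):
    """Spherically project an (N, 3) point cloud onto an h x w range image."""
    fov_up_rad = np.radians(fov_up)
    fov_down_rad = np.radians(fov_down)
    fov_rad = fov_up_rad - fov_down_rad
    depth = np.linalg.norm(points, axis=1)
    yaw = -np.arctan2(points[:, 1], points[:, 0])              # azimuth
    pitch = np.arcsin(points[:, 2] / np.maximum(depth, 1e-8))  # elevation
    u = 0.5 * (yaw / np.pi + 1.0) * w                          # column index
    v = (1.0 - (pitch - fov_down_rad) / fov_rad) * h           # row index
    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)
    image = np.zeros((h, w), dtype=np.float32)
    image[v, u] = depth              # later points overwrite earlier ones
    return image

def bev_projection(points, grid=(512, 512), x_range=(-50.0, 50.0), y_range=(-50.0, 50.0)):
    """Rasterize an (N, 3) point cloud into a top-down height map."""
    gx, gy = grid
    xi = ((points[:, 0] - x_range[0]) / (x_range[1] - x_range[0]) * gx).astype(np.int32)
    yi = ((points[:, 1] - y_range[0]) / (y_range[1] - y_range[0]) * gy).astype(np.int32)
    keep = (xi >= 0) & (xi < gx) & (yi >= 0) & (yi < gy)
    bev = np.zeros((gx, gy), dtype=np.float32)
    bev[xi[keep], yi[keep]] = points[keep, 2]   # store point height per cell
    return bev

# Residual images between consecutive (pose-aligned) scans are a common way
# to expose motion cues in either view; sequences of such residuals would
# feed the model's motion branches.
scan_t = np.random.rand(100_000, 3) * 100 - 50    # placeholder scans
scan_t1 = np.random.rand(100_000, 3) * 100 - 50
residual_rv = np.abs(range_view_projection(scan_t) - range_view_projection(scan_t1))
residual_bev = np.abs(bev_projection(scan_t) - bev_projection(scan_t1))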

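The fusion step described in the abstract can likewise be sketched as data flow. The paper uses a Mamba (selective state space) block to let semantic features guide the motion branches; in the hypothetical GuidedFusion module below, a GRU stands in for the Mamba block so the sketch runs with plain PyTorch. All module names, channel sizes, and shapes are illustrative assumptions, not the paper's configuration.

import torch
import torch.nn as nn

class GuidedFusion(nn.Module):
    """Hypothetical stand-in for the paper's Mamba-based fusion module."""
    def __init__(self, channels=64):
        super().__init__()
        # A GRU stands in for the Mamba block: any sequence model over the
        # flattened H*W token sequence of the concatenated features fits here.
        self.seq = nn.GRU(input_size=2 * channels, hidden_size=channels,
                          batch_first=True)
        self.gate = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, motion, semantic):
        # motion, semantic: (B, C, H, W) feature maps from the two branches
        b, c, h, w = motion.shape
        tokens = torch.cat([motion, semantic], dim=1)   # (B, 2C, H, W)
        tokens = tokens.flatten(2).transpose(1, 2)      # (B, H*W, 2C)
        fused, _ = self.seq(tokens)                     # (B, H*W, C)
        fused = self.gate(fused) * fused                # gated guidance
        return fused.transpose(1, 2).reshape(b, c, h, w)

fusion = GuidedFusion(channels=64)
motion_feat = torch.randn(1, 64, 32, 256)    # e.g. range-view motion features
semantic_feat = torch.randn(1, 64, 32, 256)  # matching semantic features
out = fusion(motion_feat, semantic_feat)     # -> (1, 64, 32, 256)

The sigmoid gate at the end is one simple way to turn the fused sequence into guidance weights for the motion features; the actual guidance mechanism in MV-MOS may differ.
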
Bibliographic Details
Main Authors: Cheng, Jintao; Chen, Xingming; Liang, Jinxin; Tang, Xiaoyu; Chen, Xieyuanli; Li, Dachuan
Format: Article
Language: English
Subjects: Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition
Published: 2024-08-20
DOI: 10.48550/arxiv.2408.10602
Source: arXiv.org
Online Access: https://arxiv.org/abs/2408.10602