MV-MOS: Multi-View Feature Fusion for 3D Moving Object Segmentation
Saved in:
Main authors: | Cheng, Jintao; Chen, Xingming; Liang, Jinxin; Tang, Xiaoyu; Chen, Xieyuanli; Li, Dachuan |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition |
Online access: | Order full text |
creator | Cheng, Jintao; Chen, Xingming; Liang, Jinxin; Tang, Xiaoyu; Chen, Xieyuanli; Li, Dachuan |
description | Effectively summarizing dense 3D point cloud data and extracting motion information of moving objects (moving object segmentation, MOS) is crucial to autonomous driving and robotics applications. Effectively utilizing motion and semantic features while avoiding information loss during 3D-to-2D projection remains a key challenge. In this paper, we propose a novel multi-view MOS model (MV-MOS) that fuses motion-semantic features from different 2D representations of point clouds. To exploit complementary information, the motion branches of the proposed model combine motion features from both bird's eye view (BEV) and range view (RV) representations. In addition, a semantic branch is introduced to provide supplementary semantic features of moving objects. Finally, a Mamba module is utilized to fuse the semantic features with the motion features and provide effective guidance for the motion branches. We validated the effectiveness of the proposed multi-branch fusion MOS framework via comprehensive experiments, and our proposed model outperforms existing state-of-the-art models on the SemanticKITTI benchmark. |
doi_str_mv | 10.48550/arxiv.2408.10602 |
format | Article |
identifier | DOI: 10.48550/arxiv.2408.10602 |
language | eng |
recordid | cdi_arxiv_primary_2408_10602 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition |
title | MV-MOS: Multi-View Feature Fusion for 3D Moving Object Segmentation |
url | https://arxiv.org/abs/2408.10602 |
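
The architecture summarized in the abstract lends itself to a compact illustration. Below is a minimal PyTorch sketch of the multi-branch idea: two motion branches over BEV and range-view projections, a semantic branch, and a fusion step in which semantic features guide the motion features. All module names, channel sizes, and input shapes here are illustrative assumptions, and a simple sigmoid gate stands in for the paper's Mamba fusion module; this is a sketch of the concept, not the authors' implementation.

```python
# Minimal sketch of a multi-view, multi-branch MOS architecture as described
# in the abstract. Channel sizes, the module structure, and the gated fusion
# used here in place of the paper's Mamba module are illustrative assumptions.
import torch
import torch.nn as nn


def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """3x3 conv + BN + ReLU, the basic unit of each 2D branch."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class MultiViewMOS(nn.Module):
    """Two motion branches (BEV and RV projections) plus a semantic branch.

    Semantic features guide the fused motion features through a simple
    sigmoid gate (a stand-in for the Mamba fusion module in the paper).
    """

    def __init__(self, in_ch: int = 8, feat_ch: int = 32, n_classes: int = 2):
        super().__init__()
        self.bev_branch = conv_block(in_ch, feat_ch)  # motion features, bird's eye view
        self.rv_branch = conv_block(in_ch, feat_ch)   # motion features, range view
        self.sem_branch = conv_block(in_ch, feat_ch)  # semantic features, range view
        self.gate = nn.Sequential(                    # semantic guidance gate
            nn.Conv2d(feat_ch, feat_ch, kernel_size=1),
            nn.Sigmoid(),
        )
        self.head = nn.Conv2d(feat_ch, n_classes, kernel_size=1)

    def forward(self, bev: torch.Tensor, rv: torch.Tensor) -> torch.Tensor:
        # Assume the BEV features have already been reprojected onto the RV
        # grid, so the two motion streams are spatially aligned.
        motion = self.bev_branch(bev) + self.rv_branch(rv)
        semantic = self.sem_branch(rv)
        fused = motion * self.gate(semantic)          # semantics gate motion
        return self.head(fused)                      # per-pixel moving/static logits


if __name__ == "__main__":
    model = MultiViewMOS()
    bev = torch.randn(1, 8, 64, 512)  # stacked residual images, BEV projection
    rv = torch.randn(1, 8, 64, 512)   # stacked residual images, range projection
    print(model(bev, rv).shape)       # torch.Size([1, 2, 64, 512])
```

In the paper itself the fusion step is richer than a per-pixel gate, since the Mamba module can model long-range dependencies across the fused features, but the gating above captures the guidance role the semantic branch plays over the motion branches.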