Learning Semantic Traversability With Egocentric Video and Automated Annotation Strategy

For reliable autonomous robot navigation in urban settings, the robot must have the ability to identify semantically traversable terrains in the image based on the semantic understanding of the scene. This reasoning ability is based on semantic traversability, which is frequently achieved using semantic segmentation models fine-tuned on the testing domain. This fine-tuning process often involves manual data collection with the target robot and annotation by human labelers, which is prohibitively expensive and unscalable. In this work, we present an effective methodology for training a semantic traversability estimator using egocentric videos and an automated annotation process. Egocentric videos are collected from a camera mounted on a pedestrian's chest. The dataset for training the semantic traversability estimator is then automatically generated by extracting semantically traversable regions in each video frame using a recent foundation model in image segmentation and its prompting technique. Extensive experiments with videos taken across several countries and cities, covering diverse urban scenarios, demonstrate the high scalability and generalizability of the proposed annotation method. Furthermore, performance analysis and real-world deployment for autonomous robot navigation showcase that the trained semantic traversability estimator is highly accurate, able to handle diverse camera viewpoints, computationally light, and real-world applicable.
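
The automated annotation step described in the abstract, extracting traversable regions from each egocentric frame with a promptable foundation segmentation model, can be illustrated with a short sketch. The sketch below assumes Segment Anything (SAM) as the promptable model (the abstract does not name one) and places a few point prompts near the bottom-center of each frame as a hypothetical stand-in for the paper's prompting technique; it is not the authors' implementation.

    # Minimal sketch of an automated traversability-annotation loop.
    # Assumptions (not from the paper): Segment Anything (SAM) as the
    # promptable segmentation model, and fixed point prompts near the
    # bottom-center of each frame as a placeholder prompting strategy.
    import cv2
    import numpy as np
    from segment_anything import sam_model_registry, SamPredictor

    def annotate_video(video_path, checkpoint="sam_vit_h_4b8939.pth", stride=30):
        """Yield (frame, traversable_mask) pairs from an egocentric video."""
        sam = sam_model_registry["vit_h"](checkpoint=checkpoint)
        predictor = SamPredictor(sam)

        cap = cv2.VideoCapture(video_path)
        frame_idx = 0
        while True:
            ok, frame_bgr = cap.read()
            if not ok:
                break
            if frame_idx % stride == 0:
                frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
                h, w = frame_rgb.shape[:2]
                # Hypothetical prompts: points near the bottom-center of the
                # image, where the ground a pedestrian just walked on usually
                # appears in a chest-mounted view.
                points = np.array([[w // 2, int(0.90 * h)],
                                   [w // 3, int(0.85 * h)],
                                   [2 * w // 3, int(0.85 * h)]])
                labels = np.ones(len(points), dtype=int)  # 1 = foreground prompt

                predictor.set_image(frame_rgb)
                masks, scores, _ = predictor.predict(
                    point_coords=points,
                    point_labels=labels,
                    multimask_output=True,
                )
                # Keep the highest-scoring mask as the traversable-region label.
                yield frame_rgb, masks[int(np.argmax(scores))]
            frame_idx += 1
        cap.release()

Each yielded (frame, mask) pair would serve as one training sample for the traversability estimator; how prompts are actually generated and how masks are filtered is specific to the paper and not reproduced here.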

Bibliographic Details
Published in: IEEE Robotics and Automation Letters, 2024-11, Vol. 9 (11), pp. 10423-10430
Authors: Kim, Yunho; Lee, Jeong Hyun; Lee, Choongin; Mun, Juhyeok; Youm, Donghoon; Park, Jeongsoo; Hwangbo, Jemin
Format: Article
Language: English
Subjects: Annotations; Cameras; Data collection; Deep learning for visual perception; Navigation; Robot vision systems; semantic scene understanding; Semantic segmentation; Semantics; Training; vision-based navigation; Visualization
DOI: 10.1109/LRA.2024.3474548
ISSN / EISSN: 2377-3766
Publisher: IEEE
Online Access: Order full text