Learning Semantic Traversability with Egocentric Video and Automated Annotation Strategy
IEEE Robotics and Automation Letters (Volume: 9, Issue: 11, November 2024)
Saved in:
Main authors: | Kim, Yunho; Lee, Jeong Hyun; Lee, Choongin; Mun, Juhyeok; Youm, Donghoon; Park, Jeongsoo; Hwangbo, Jemin |
---|---|
Format: | Article |
Language: | eng |
Keywords: | Computer Science - Artificial Intelligence; Computer Science - Robotics |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | 11 |
container_start_page | |
container_title | IEEE Robotics and Automation Letters |
container_volume | 9 |
creator | Kim, Yunho; Lee, Jeong Hyun; Lee, Choongin; Mun, Juhyeok; Youm, Donghoon; Park, Jeongsoo; Hwangbo, Jemin |
description | IEEE Robotics and Automation Letters (Volume: 9, Issue: 11, November 2024). For reliable autonomous robot navigation in urban settings, the robot must have the ability to identify semantically traversable terrains in the image based on the semantic understanding of the scene. This reasoning ability is based on semantic traversability, which is frequently achieved using semantic segmentation models fine-tuned on the testing domain. This fine-tuning process often involves manual data collection with the target robot and annotation by human labelers, which is prohibitively expensive and unscalable. In this work, we present an effective methodology for training a semantic traversability estimator using egocentric videos and an automated annotation process. Egocentric videos are collected from a camera mounted on a pedestrian's chest. The dataset for training the semantic traversability estimator is then automatically generated by extracting semantically traversable regions in each video frame using a recent foundation model in image segmentation and its prompting technique. Extensive experiments with videos taken across several countries and cities, covering diverse urban scenarios, demonstrate the high scalability and generalizability of the proposed annotation method. Furthermore, performance analysis and real-world deployment for autonomous robot navigation showcase that the trained semantic traversability estimator is highly accurate, able to handle diverse camera viewpoints, computationally light, and real-world applicable. The summary video is available at https://youtu.be/EUVoH-wA-lA. |
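As an editorial illustration of the automated annotation idea described in the abstract, the sketch below shows one plausible way to turn chest-camera video frames into traversability pseudo-labels by prompting a promptable segmentation foundation model (here the open-source Segment Anything Model via the segment-anything package). This is a minimal sketch under stated assumptions, not the authors' published pipeline: the bottom-of-frame point-prompt heuristic, the auto_label_frame helper, the checkpoint file, and the video/label paths are all illustrative.

```python
# Illustrative sketch only: auto-labeling traversable regions in egocentric
# video frames with a promptable segmentation model. The prompt heuristic,
# file names, and output format are assumptions, not the paper's pipeline.
import os

import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a SAM backbone (official ViT-B checkpoint assumed to be downloaded).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)


def auto_label_frame(frame_bgr: np.ndarray) -> np.ndarray:
    """Return a binary mask of (assumed) traversable ground for one frame.

    Heuristic: the camera is chest-mounted on a walking pedestrian, so pixels
    near the bottom of the frame are very likely traversable terrain. They are
    used here as positive point prompts for the segmentation model.
    """
    h, w = frame_bgr.shape[:2]
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    predictor.set_image(frame_rgb)

    # Positive prompts: three points spread along the bottom of the image.
    point_coords = np.array(
        [[w // 4, int(0.95 * h)], [w // 2, int(0.95 * h)], [3 * w // 4, int(0.95 * h)]]
    )
    point_labels = np.ones(len(point_coords), dtype=np.int64)  # 1 = foreground

    masks, scores, _ = predictor.predict(
        point_coords=point_coords,
        point_labels=point_labels,
        multimask_output=True,
    )
    # Keep the highest-scoring mask as the traversable-region pseudo-label.
    return masks[int(np.argmax(scores))]


# Usage: iterate over video frames and save masks as training labels.
os.makedirs("labels", exist_ok=True)
cap = cv2.VideoCapture("egocentric_walk.mp4")  # hypothetical input video
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = auto_label_frame(frame)
    cv2.imwrite(f"labels/{idx:06d}.png", mask.astype(np.uint8) * 255)
    idx += 1
cap.release()
```

A real annotation pipeline would additionally filter low-confidence masks and enforce temporal consistency across frames; the article itself should be consulted for the actual prompting and filtering strategy.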
doi_str_mv | 10.48550/arxiv.2406.02989 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2406.02989 |
ispartof | IEEE Robotics and Automation Letters, Volume 9, Issue 11 (November 2024) |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2406_02989 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence; Computer Science - Robotics |
title | Learning Semantic Traversability with Egocentric Video and Automated Annotation Strategy |
url | https://arxiv.org/abs/2406.02989 |