A Self-supervised Learning System for Object Detection using Physics Simulation and Multi-view Pose Estimation

Progress has been achieved recently in object detection given advancements in deep learning. Nevertheless, such tools typically require a large amount of training data and significant manual effort to label objects. This limits their applicability in robotics, where solutions must scale to a large n...

Bibliographic details
Main authors:	Mitash, Chaitanya; Bekris, Kostas E; Boularias, Abdeslam
Format:	Article
Language:	eng
Subjects:	Computer Science - Computer Vision and Pattern Recognition; Computer Science - Robotics
Online access:	Order full text

creator	Mitash, Chaitanya ; Bekris, Kostas E ; Boularias, Abdeslam
description	Progress has been achieved recently in object detection given advancements in deep learning. Nevertheless, such tools typically require a large amount of training data and significant manual effort to label objects. This limits their applicability in robotics, where solutions must scale to a large number of objects and variety of conditions. This work proposes an autonomous process for training a Convolutional Neural Network (CNN) for object detection and pose estimation in robotic setups. The focus is on detecting objects placed in cluttered, tight environments, such as a shelf with multiple objects. In particular, given access to 3D object models, several aspects of the environment are physically simulated. The models are placed in physically realistic poses with respect to their environment to generate a labeled synthetic dataset. To further improve object detection, the network self-trains over real images that are labeled using a robust multi-view pose estimation process. The proposed training process is evaluated on several existing datasets and on a dataset collected for this paper with a Motoman robotic arm. Results show that the proposed approach outperforms popular training processes relying on synthetic - but not physically realistic - data and manual annotation. The key contributions are the incorporation of physical reasoning in the synthetic data generation process and the automation of the annotation process over real images.
doi_str_mv	10.48550/arxiv.1703.03347
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_1703_03347</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1703_03347</sourcerecordid><originalsourceid>FETCH-LOGICAL-a677-9b4c23eda7996886d75fa18e2cf97dd6a21153d03b1d32c3a4a52007699defb53</originalsourceid><addsrcrecordid>eNotj8tOwzAURL1hgQofwIr7Awl2HMfxsirlIQW1UrqPnPgajPKo7CTQv6dNWc3ijGZ0CHlgNE5zIeiT9r9ujpmkPKacp_KW9GsosbVRmI7oZxfQQIHa967_hPIURuzADh529Tc2IzzjeA439DCFS2P_dQquCVC6bmr1AnRv4GNqRxfNDn9gPwSEbRhdt-A7cmN1G_D-P1fk8LI9bN6iYvf6vlkXkc6kjFSdNglHo6VSWZ5nRgqrWY5JY5U0JtMJY4IbymtmeNJwnWqRUCozpQzaWvAVebzOLsLV0Z_v_am6iFeLOP8DN2BU1g</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>A Self-supervised Learning System for Object Detection using Physics Simulation and Multi-view Pose Estimation</title><source>arXiv.org</source><creator>Mitash, Chaitanya ; Bekris, Kostas E ; Boularias, Abdeslam</creator><creatorcontrib>Mitash, Chaitanya ; Bekris, Kostas E ; Boularias, Abdeslam</creatorcontrib><description>Progress has been achieved recently in object detection given advancements in deep learning. Nevertheless, such tools typically require a large amount of training data and significant manual effort to label objects. This limits their applicability in robotics, where solutions must scale to a large number of objects and variety of conditions. This work proposes an autonomous process for training a Convolutional Neural Network (CNN) for object detection and pose estimation in robotic setups. The focus is on detecting objects placed in cluttered, tight environments, such as a shelf with multiple objects. In particular, given access to 3D object models, several aspects of the environment are physically simulated. The models are placed in physically realistic poses with respect to their environment to generate a labeled synthetic dataset. 
To further improve object detection, the network self-trains over real images that are labeled using a robust multi-view pose estimation process. The proposed training process is evaluated on several existing datasets and on a dataset collected for this paper with a Motoman robotic arm. Results show that the proposed approach outperforms popular training processes relying on synthetic - but not physically realistic - data and manual annotation. The key contributions are the incorporation of physical reasoning in the synthetic data generation process and the automation of the annotation process over real images.</description><identifier>DOI: 10.48550/arxiv.1703.03347</identifier><language>eng</language><subject>Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Robotics</subject><creationdate>2017-03</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/1703.03347$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.1703.03347$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Mitash, Chaitanya</creatorcontrib><creatorcontrib>Bekris, Kostas E</creatorcontrib><creatorcontrib>Boularias, Abdeslam</creatorcontrib><title>A Self-supervised Learning System for Object Detection using Physics Simulation and Multi-view Pose Estimation</title><description>Progress has been achieved recently in object detection given advancements in deep learning. Nevertheless, such tools typically require a large amount of training data and significant manual effort to label objects. 
This limits their applicability in robotics, where solutions must scale to a large number of objects and variety of conditions. This work proposes an autonomous process for training a Convolutional Neural Network (CNN) for object detection and pose estimation in robotic setups. The focus is on detecting objects placed in cluttered, tight environments, such as a shelf with multiple objects. In particular, given access to 3D object models, several aspects of the environment are physically simulated. The models are placed in physically realistic poses with respect to their environment to generate a labeled synthetic dataset. To further improve object detection, the network self-trains over real images that are labeled using a robust multi-view pose estimation process. The proposed training process is evaluated on several existing datasets and on a dataset collected for this paper with a Motoman robotic arm. Results show that the proposed approach outperforms popular training processes relying on synthetic - but not physically realistic - data and manual annotation. 
The key contributions are the incorporation of physical reasoning in the synthetic data generation process and the automation of the annotation process over real images.</description><subject>Computer Science - Computer Vision and Pattern Recognition</subject><subject>Computer Science - Robotics</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotj8tOwzAURL1hgQofwIr7Awl2HMfxsirlIQW1UrqPnPgajPKo7CTQv6dNWc3ijGZ0CHlgNE5zIeiT9r9ujpmkPKacp_KW9GsosbVRmI7oZxfQQIHa967_hPIURuzADh529Tc2IzzjeA439DCFS2P_dQquCVC6bmr1AnRv4GNqRxfNDn9gPwSEbRhdt-A7cmN1G_D-P1fk8LI9bN6iYvf6vlkXkc6kjFSdNglHo6VSWZ5nRgqrWY5JY5U0JtMJY4IbymtmeNJwnWqRUCozpQzaWvAVebzOLsLV0Z_v_am6iFeLOP8DN2BU1g</recordid><startdate>20170309</startdate><enddate>20170309</enddate><creator>Mitash, Chaitanya</creator><creator>Bekris, Kostas E</creator><creator>Boularias, Abdeslam</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20170309</creationdate><title>A Self-supervised Learning System for Object Detection using Physics Simulation and Multi-view Pose Estimation</title><author>Mitash, Chaitanya ; Bekris, Kostas E ; Boularias, Abdeslam</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a677-9b4c23eda7996886d75fa18e2cf97dd6a21153d03b1d32c3a4a52007699defb53</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Computer Science - Computer Vision and Pattern Recognition</topic><topic>Computer Science - Robotics</topic><toplevel>online_resources</toplevel><creatorcontrib>Mitash, Chaitanya</creatorcontrib><creatorcontrib>Bekris, Kostas E</creatorcontrib><creatorcontrib>Boularias, Abdeslam</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Mitash, Chaitanya</au><au>Bekris, Kostas E</au><au>Boularias, Abdeslam</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Self-supervised Learning System for Object Detection using Physics Simulation and Multi-view Pose Estimation</atitle><date>2017-03-09</date><risdate>2017</risdate><abstract>Progress has been achieved recently in object detection given advancements in deep learning. Nevertheless, such tools typically require a large amount of training data and significant manual effort to label objects. This limits their applicability in robotics, where solutions must scale to a large number of objects and variety of conditions. This work proposes an autonomous process for training a Convolutional Neural Network (CNN) for object detection and pose estimation in robotic setups. The focus is on detecting objects placed in cluttered, tight environments, such as a shelf with multiple objects. In particular, given access to 3D object models, several aspects of the environment are physically simulated. The models are placed in physically realistic poses with respect to their environment to generate a labeled synthetic dataset. To further improve object detection, the network self-trains over real images that are labeled using a robust multi-view pose estimation process. The proposed training process is evaluated on several existing datasets and on a dataset collected for this paper with a Motoman robotic arm. Results show that the proposed approach outperforms popular training processes relying on synthetic - but not physically realistic - data and manual annotation. The key contributions are the incorporation of physical reasoning in the synthetic data generation process and the automation of the annotation process over real images.</abstract><doi>10.48550/arxiv.1703.03347</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.1703.03347
language	eng
recordid	cdi_arxiv_primary_1703_03347
source	arXiv.org
subjects	Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Robotics
title	A Self-supervised Learning System for Object Detection using Physics Simulation and Multi-view Pose Estimation
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T19%3A26%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Self-supervised%20Learning%20System%20for%20Object%20Detection%20using%20Physics%20Simulation%20and%20Multi-view%20Pose%20Estimation&rft.au=Mitash,%20Chaitanya&rft.date=2017-03-09&rft_id=info:doi/10.48550/arxiv.1703.03347&rft_dat=%3Carxiv_GOX%3E1703_03347%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true