Visual Object Recognition and Pose Estimation Based on a Deep Semantic Segmentation Network

In recent years, deep learning-based object recognition algorithms have been emerging in robotic vision applications. This paper addresses the design of a novel deep learning-based visual object recognition and pose estimation system for a robot manipulator to handle random object picking tasks. The pr...

Bibliographic details
Published in: IEEE sensors journal 2018-11, Vol.18 (22), p.9370-9381
Main authors: Lin, Chien-Ming; Tsai, Chi-Yi; Lai, Yu-Cheng; Li, Shin-An; Wong, Ching-Chang
Format: Article
Language: eng
container_end_page 9381
container_issue 22
container_start_page 9370
container_title IEEE sensors journal
container_volume 18
creator Lin, Chien-Ming
Tsai, Chi-Yi
Lai, Yu-Cheng
Li, Shin-An
Wong, Ching-Chang
description In recent years, deep learning-based object recognition algorithms have been emerging in robotic vision applications. This paper addresses the design of a novel deep learning-based visual object recognition and pose estimation system for a robot manipulator to handle random object picking tasks. The proposed visual control system consists of a visual perception module, an object pose estimation module, a data argumentation module, and a robot manipulator controller. The visual perception module combines deep convolutional neural networks (CNNs) and a fully connected conditional random field layer to realize an image semantic segmentation function, which can provide stable and accurate object classification results in cluttered environments. The object pose estimation module implements a model-based pose estimation method to estimate the 3D pose of the target for picking control. In addition, the proposed data argumentation module automatically generates training data for the deep CNN. Experimental results show that the proposed scene segmentation method used in the data argumentation module reaches a high accuracy rate of 97.10% on average, which is higher than that of other state-of-the-art segmentation methods. Moreover, with the proposed data argumentation module, the visual perception module reaches accuracy rates of over 80% and 72% when detecting and recognizing one object and three objects, respectively. In addition, the proposed model-based pose estimation method provides accurate 3D pose estimation results. The average translation and rotation errors in the three axes are all smaller than 0.52 cm and 3.95 degrees, respectively. These advantages make the proposed visual control system suitable for applications of random object picking and manipulation.
doi_str_mv 10.1109/JSEN.2018.2870957
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_crossref_primary_10_1109_JSEN_2018_2870957</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8467328</ieee_id><sourcerecordid>2126463088</sourcerecordid><originalsourceid>FETCH-LOGICAL-c293t-36a0c138b58602d80468aaa96f95beb1e06ee22b4043a8b420164947e60670693</originalsourceid><addsrcrecordid>eNo9kFtLw0AQhRdRsFZ_gPiy4HPq7CV7efRSb5RWrIrgw7JJpyW1TWp2i_jvTUzxaQ4zZ2Y4HyGnDAaMgb14nA7HAw7MDLjRYFO9R3osTU3CtDT7rRaQSKHfD8lRCEsAZnWqe-TjrQhbv6KTbIl5pM-YV4uyiEVVUl_O6FMVkA5DLNb-r3flA85oO6Q3iBs6xbUvY5E3YrHGMnauMcbvqv48Jgdzvwp4sqt98no7fLm-T0aTu4fry1GScytiIpSHnAmTpUYBnxmQynjvrZrbNMOMIShEzjMJUniTySalklZqVKA0KCv65Ly7u6mrry2G6JbVti6bl44zrqQSYEzjYp0rr6sQapy7Td3Eqn8cA9cydC1D1zJ0O4bNzlm3UyDiv99IpQU34hennmwU</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2126463088</pqid></control><display><type>article</type><title>Visual Object Recognition and Pose Estimation Based on a Deep Semantic Segmentation Network</title><source>IEEE Electronic Library (IEL)</source><creator>Lin, Chien-Ming ; Tsai, Chi-Yi ; Lai, Yu-Cheng ; Li, Shin-An ; Wong, Ching-Chang</creator><creatorcontrib>Lin, Chien-Ming ; Tsai, Chi-Yi ; Lai, Yu-Cheng ; Li, Shin-An ; Wong, Ching-Chang</creatorcontrib><description>In recent years, deep learning-based object recognition algorithms become emerging in robotic vision applications. This paper addresses the design of a novel deep learning-based visual object recognition and pose estimation system for a robot manipulator to handle random object picking tasks. The proposed visual control system consists of a visual perception module, an object pose estimation module, a data argumentation module, and a robot manipulator controller. 
The visual perception module combines deep convolution neural networks (CNNs) and a fully connected conditional random field layer to realize an image semantic segmentation function, which can provide stable and accurate object classification results in cluttered environments. The object pose estimation module implements a model-based pose estimation method to estimate the 3D pose of the target for picking control. In addition, the proposed data argumentation module automatically generates training data for training the deep CNN. Experimental results show that the proposed scene segmentation method used in the data argumentation module reaches a high accuracy rate of 97.10% on average, which is higher than other state-of-the-art segment methods. Moreover, with the proposed data argumentation module, the visual perception module reaches an accuracy rate over than 80% and 72% in the case of detecting and recognizing one object and three objects, respectively. In addition, the proposed model-based pose estimation method provides accurate 3D pose estimation results. The average translation and rotation errors in the three axes are all smaller than 0.52 cm and 3.95 degrees, respectively. 
These advantages make the proposed visual control system suitable for applications of random object picking and manipulation.</description><identifier>ISSN: 1530-437X</identifier><identifier>EISSN: 1558-1748</identifier><identifier>DOI: 10.1109/JSEN.2018.2870957</identifier><identifier>CODEN: ISJEAZ</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Control systems ; Convolution ; convolution neural networks ; Deep learning ; Image classification ; Image segmentation ; Machine learning ; Manipulators ; Neural networks ; Object recognition ; Picking ; Pose estimation ; Robot arms ; Robots ; Semantic segmentation ; Semantics ; State of the art ; Three-dimensional displays ; Training ; Visual control ; Visual perception ; Visual perception driven algorithms ; Visual tasks</subject><ispartof>IEEE sensors journal, 2018-11, Vol.18 (22), p.9370-9381</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c293t-36a0c138b58602d80468aaa96f95beb1e06ee22b4043a8b420164947e60670693</citedby><cites>FETCH-LOGICAL-c293t-36a0c138b58602d80468aaa96f95beb1e06ee22b4043a8b420164947e60670693</cites><orcidid>0000-0001-9872-4338</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8467328$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/8467328$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Lin, Chien-Ming</creatorcontrib><creatorcontrib>Tsai, Chi-Yi</creatorcontrib><creatorcontrib>Lai, Yu-Cheng</creatorcontrib><creatorcontrib>Li, Shin-An</creatorcontrib><creatorcontrib>Wong, 
Ching-Chang</creatorcontrib><title>Visual Object Recognition and Pose Estimation Based on a Deep Semantic Segmentation Network</title><title>IEEE sensors journal</title><addtitle>JSEN</addtitle><description>In recent years, deep learning-based object recognition algorithms become emerging in robotic vision applications. This paper addresses the design of a novel deep learning-based visual object recognition and pose estimation system for a robot manipulator to handle random object picking tasks. The proposed visual control system consists of a visual perception module, an object pose estimation module, a data argumentation module, and a robot manipulator controller. The visual perception module combines deep convolution neural networks (CNNs) and a fully connected conditional random field layer to realize an image semantic segmentation function, which can provide stable and accurate object classification results in cluttered environments. The object pose estimation module implements a model-based pose estimation method to estimate the 3D pose of the target for picking control. In addition, the proposed data argumentation module automatically generates training data for training the deep CNN. Experimental results show that the proposed scene segmentation method used in the data argumentation module reaches a high accuracy rate of 97.10% on average, which is higher than other state-of-the-art segment methods. Moreover, with the proposed data argumentation module, the visual perception module reaches an accuracy rate over than 80% and 72% in the case of detecting and recognizing one object and three objects, respectively. In addition, the proposed model-based pose estimation method provides accurate 3D pose estimation results. The average translation and rotation errors in the three axes are all smaller than 0.52 cm and 3.95 degrees, respectively. 
These advantages make the proposed visual control system suitable for applications of random object picking and manipulation.</description><subject>Control systems</subject><subject>Convolution</subject><subject>convolution neural networks</subject><subject>Deep learning</subject><subject>Image classification</subject><subject>Image segmentation</subject><subject>Machine learning</subject><subject>Manipulators</subject><subject>Neural networks</subject><subject>Object recognition</subject><subject>Picking</subject><subject>Pose estimation</subject><subject>Robot arms</subject><subject>Robots</subject><subject>Semantic segmentation</subject><subject>Semantics</subject><subject>State of the art</subject><subject>Three-dimensional displays</subject><subject>Training</subject><subject>Visual control</subject><subject>Visual perception</subject><subject>Visual perception driven algorithms</subject><subject>Visual tasks</subject><issn>1530-437X</issn><issn>1558-1748</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kFtLw0AQhRdRsFZ_gPiy4HPq7CV7efRSb5RWrIrgw7JJpyW1TWp2i_jvTUzxaQ4zZ2Y4HyGnDAaMgb14nA7HAw7MDLjRYFO9R3osTU3CtDT7rRaQSKHfD8lRCEsAZnWqe-TjrQhbv6KTbIl5pM-YV4uyiEVVUl_O6FMVkA5DLNb-r3flA85oO6Q3iBs6xbUvY5E3YrHGMnauMcbvqv48Jgdzvwp4sqt98no7fLm-T0aTu4fry1GScytiIpSHnAmTpUYBnxmQynjvrZrbNMOMIShEzjMJUniTySalklZqVKA0KCv65Ly7u6mrry2G6JbVti6bl44zrqQSYEzjYp0rr6sQapy7Td3Eqn8cA9cydC1D1zJ0O4bNzlm3UyDiv99IpQU34hennmwU</recordid><startdate>20181115</startdate><enddate>20181115</enddate><creator>Lin, Chien-Ming</creator><creator>Tsai, Chi-Yi</creator><creator>Lai, Yu-Cheng</creator><creator>Li, Shin-An</creator><creator>Wong, Ching-Chang</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SP</scope><scope>7U5</scope><scope>8FD</scope><scope>L7M</scope><orcidid>https://orcid.org/0000-0001-9872-4338</orcidid></search><sort><creationdate>20181115</creationdate><title>Visual Object Recognition and Pose Estimation Based on a Deep Semantic Segmentation Network</title><author>Lin, Chien-Ming ; Tsai, Chi-Yi ; Lai, Yu-Cheng ; Li, Shin-An ; Wong, Ching-Chang</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c293t-36a0c138b58602d80468aaa96f95beb1e06ee22b4043a8b420164947e60670693</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Control systems</topic><topic>Convolution</topic><topic>convolution neural networks</topic><topic>Deep learning</topic><topic>Image classification</topic><topic>Image segmentation</topic><topic>Machine learning</topic><topic>Manipulators</topic><topic>Neural networks</topic><topic>Object recognition</topic><topic>Picking</topic><topic>Pose estimation</topic><topic>Robot arms</topic><topic>Robots</topic><topic>Semantic segmentation</topic><topic>Semantics</topic><topic>State of the art</topic><topic>Three-dimensional displays</topic><topic>Training</topic><topic>Visual control</topic><topic>Visual perception</topic><topic>Visual perception driven algorithms</topic><topic>Visual tasks</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lin, Chien-Ming</creatorcontrib><creatorcontrib>Tsai, Chi-Yi</creatorcontrib><creatorcontrib>Lai, Yu-Cheng</creatorcontrib><creatorcontrib>Li, Shin-An</creatorcontrib><creatorcontrib>Wong, Ching-Chang</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library 
(IEL)</collection><collection>CrossRef</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>Technology Research Database</collection><collection>Advanced Technologies Database with Aerospace</collection><jtitle>IEEE sensors journal</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Lin, Chien-Ming</au><au>Tsai, Chi-Yi</au><au>Lai, Yu-Cheng</au><au>Li, Shin-An</au><au>Wong, Ching-Chang</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Visual Object Recognition and Pose Estimation Based on a Deep Semantic Segmentation Network</atitle><jtitle>IEEE sensors journal</jtitle><stitle>JSEN</stitle><date>2018-11-15</date><risdate>2018</risdate><volume>18</volume><issue>22</issue><spage>9370</spage><epage>9381</epage><pages>9370-9381</pages><issn>1530-437X</issn><eissn>1558-1748</eissn><coden>ISJEAZ</coden><abstract>In recent years, deep learning-based object recognition algorithms become emerging in robotic vision applications. This paper addresses the design of a novel deep learning-based visual object recognition and pose estimation system for a robot manipulator to handle random object picking tasks. The proposed visual control system consists of a visual perception module, an object pose estimation module, a data argumentation module, and a robot manipulator controller. The visual perception module combines deep convolution neural networks (CNNs) and a fully connected conditional random field layer to realize an image semantic segmentation function, which can provide stable and accurate object classification results in cluttered environments. The object pose estimation module implements a model-based pose estimation method to estimate the 3D pose of the target for picking control. 
In addition, the proposed data argumentation module automatically generates training data for training the deep CNN. Experimental results show that the proposed scene segmentation method used in the data argumentation module reaches a high accuracy rate of 97.10% on average, which is higher than other state-of-the-art segment methods. Moreover, with the proposed data argumentation module, the visual perception module reaches an accuracy rate over than 80% and 72% in the case of detecting and recognizing one object and three objects, respectively. In addition, the proposed model-based pose estimation method provides accurate 3D pose estimation results. The average translation and rotation errors in the three axes are all smaller than 0.52 cm and 3.95 degrees, respectively. These advantages make the proposed visual control system suitable for applications of random object picking and manipulation.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/JSEN.2018.2870957</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0001-9872-4338</orcidid></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1530-437X
ispartof IEEE sensors journal, 2018-11, Vol.18 (22), p.9370-9381
issn 1530-437X
1558-1748
language eng
recordid cdi_crossref_primary_10_1109_JSEN_2018_2870957
source IEEE Electronic Library (IEL)
subjects Control systems
Convolution
convolution neural networks
Deep learning
Image classification
Image segmentation
Machine learning
Manipulators
Neural networks
Object recognition
Picking
Pose estimation
Robot arms
Robots
Semantic segmentation
Semantics
State of the art
Three-dimensional displays
Training
Visual control
Visual perception
Visual perception driven algorithms
Visual tasks
title Visual Object Recognition and Pose Estimation Based on a Deep Semantic Segmentation Network
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T04%3A03%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Visual%20Object%20Recognition%20and%20Pose%20Estimation%20Based%20on%20a%20Deep%20Semantic%20Segmentation%20Network&rft.jtitle=IEEE%20sensors%20journal&rft.au=Lin,%20Chien-Ming&rft.date=2018-11-15&rft.volume=18&rft.issue=22&rft.spage=9370&rft.epage=9381&rft.pages=9370-9381&rft.issn=1530-437X&rft.eissn=1558-1748&rft.coden=ISJEAZ&rft_id=info:doi/10.1109/JSEN.2018.2870957&rft_dat=%3Cproquest_RIE%3E2126463088%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2126463088&rft_id=info:pmid/&rft_ieee_id=8467328&rfr_iscdi=true
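The url field above is an OpenURL (ctx_ver=Z39.88-2004) that carries the citation as percent-encoded key/value pairs. As a minimal sketch, assuming only Python's standard urllib.parse module, the citation fields could be recovered as shown below. The helper name citation_from_openurl, the output labels, and the shortened EXAMPLE_URL are illustrative assumptions, not part of the catalog or the resolver; the parameter names and values are copied from the record's url field.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened excerpt of the record's url field (most context keys omitted).
EXAMPLE_URL = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=article"
    "&rft.atitle=Visual%20Object%20Recognition%20and%20Pose%20Estimation"
    "%20Based%20on%20a%20Deep%20Semantic%20Segmentation%20Network"
    "&rft.jtitle=IEEE%20sensors%20journal"
    "&rft.au=Lin,%20Chien-Ming"
    "&rft.date=2018-11-15&rft.volume=18&rft.issue=22"
    "&rft.spage=9370&rft.epage=9381"
    "&rft.issn=1530-437X&rft.eissn=1558-1748"
    "&rft_id=info:doi/10.1109/JSEN.2018.2870957"
    "&rft_id=info:oai/"
)

# OpenURL key -> label used in the returned dictionary (labels are hypothetical).
FIELD_LABELS = {
    "rft.atitle": "title",
    "rft.jtitle": "journal",
    "rft.au": "first_author",
    "rft.date": "date",
    "rft.volume": "volume",
    "rft.issue": "issue",
    "rft.spage": "start_page",
    "rft.epage": "end_page",
    "rft.issn": "issn",
    "rft.eissn": "eissn",
}


def citation_from_openurl(url: str) -> dict:
    """Decode the query string of an OpenURL into a flat citation dictionary."""
    # parse_qs percent-decodes values and returns a list per key,
    # which matters here because rft_id occurs more than once.
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)

    citation = {
        label: params[key][0]
        for key, label in FIELD_LABELS.items()
        if key in params
    }

    # The DOI is carried as an rft_id value with the "info:doi/" prefix.
    prefix = "info:doi/"
    for value in params.get("rft_id", []):
        if value.startswith(prefix):
            citation["doi"] = value[len(prefix):]
            break
    return citation


if __name__ == "__main__":
    for field, value in citation_from_openurl(EXAMPLE_URL).items():
        print(f"{field}: {value}")
```

Because repeated keys such as rft_id are kept as lists before flattening, the sketch picks the DOI by its prefix rather than by position, so the order of the rft_id parameters in the URL does not matter.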