Recognizing object manipulation activities using depth and visual cues
| Field | Value |
|---|---|
| Published in | Journal of Visual Communication and Image Representation, 2014-05, Vol. 25 (4), pp. 719-726 |
| Authors | Liu, Haowei; Philipose, Matthai; Pettersson, Martin; Sun, Ming-Ting |
| Format | Article |
| Language | English |
| Publisher | Elsevier Inc. |
| ISSN | 1047-3203 |
| EISSN | 1095-9076 |
| DOI | 10.1016/j.jvcir.2013.03.015 |
| Subjects | Action recognition; Activity recognition; Algorithms; Boost; Cues; Depth camera; HMM; Inclusions; Joint object and action recognition; Object recognition; Recognition; Segmentation; Tasks; Temporal action recognition; Temporal smoothing; Visual |
| Online access | Full text |
Highlights:

- An algorithm that parses the 3D point cloud representing the human body into torso and arms.
- A technique to locate the held object and incorporate its size into a recognizer.
- A temporal smoothing scheme to improve object and activity recognition.

(Hedged sketches of the segmentation, size, and smoothing ideas follow the abstract below.)
Abstract:

We propose a framework consisting of several algorithms to recognize human activities that involve manipulating objects. The framework identifies the objects being manipulated and accordingly models the high-level tasks being performed. Realistic settings for such tasks pose several problems for computer vision, including sporadic occlusion by the subjects, non-frontal poses, and objects with few local features. We show how size and segmentation information derived from depth data can address these challenges with simple and fast techniques. In particular, we show how to find the manipulating hand robustly and without supervision, how to detect and recognize objects reliably, and how to use temporal information to fill the gaps between sporadically detected objects, all through careful inclusion of depth cues. We evaluate our approach on a challenging dataset of 12 kitchen tasks involving 24 objects, performed by 2 subjects. The entire framework yields 82%/84% precision (74%/83% recall) for task/object recognition. Our techniques significantly outperform the state of the art in activity/object recognition.
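The abstract names the depth-based techniques but does not spell them out, so the following Python sketch is a hypothetical reading of the first two highlights: it splits a body point cloud into torso and arm points with a simple lateral-distance heuristic, then uses physical size to reject held-object candidates that a hand could not plausibly hold. The function names, the 20 cm torso half-width, and the 3-30 cm size band are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch only: the paper's actual segmentation and size
# tests are not given in the abstract. Thresholds below are assumptions.
import numpy as np

def split_torso_arms(points, torso_halfwidth=0.20):
    """Split a body point cloud (N x 3 array, metres, x = lateral axis)
    into torso points and arm points by lateral distance from the
    body centroid. Assumed threshold: 20 cm half-width."""
    centroid = points.mean(axis=0)
    lateral = np.abs(points[:, 0] - centroid[0])
    torso_mask = lateral < torso_halfwidth
    return points[torso_mask], points[~torso_mask]

def plausible_held_object(candidate, min_size=0.03, max_size=0.30):
    """Keep a candidate cluster (M x 3 array) only if its largest
    bounding-box extent fits an assumed hand-held range of 3-30 cm."""
    extent = candidate.max(axis=0) - candidate.min(axis=0)
    return bool(min_size <= extent.max() <= max_size)
```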
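The temporal-smoothing scheme is likewise only named; the subject terms mention an HMM, which suggests probabilistic decoding over frame labels. As a minimal stand-in, the sketch below fills short gaps between sporadic per-frame detections with a sliding-window majority vote. The window size is an assumption, and a Viterbi decoder over an HMM would be the heavier alternative the keywords point to.

```python
# Minimal stand-in for the paper's temporal smoothing: a sliding
# majority vote over per-frame object labels. Window size is assumed.
from collections import Counter

def smooth_labels(frame_labels, window=15):
    """frame_labels: per-frame object labels, None where detection
    failed. Returns the sequence with short gaps filled by the
    majority label inside a centred window."""
    half = window // 2
    smoothed = []
    for i in range(len(frame_labels)):
        votes = [lab for lab in frame_labels[max(0, i - half):i + half + 1]
                 if lab is not None]
        smoothed.append(Counter(votes).most_common(1)[0][0] if votes else None)
    return smoothed

# Example: a missed detection between 'cup' frames is filled in.
print(smooth_labels(['cup', None, 'cup', 'cup', None, 'bowl'], window=3))
```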