DeePoint: Visual Pointing Recognition and Direction Estimation

In this paper, we realize automatic visual recognition and direction estimation of pointing. We introduce the first neural pointing understanding method based on two key contributions. The first is the introduction of a first-of-its-kind large-scale dataset for pointing recognition and direction estimation, which we refer to as the DP Dataset. The DP Dataset consists of more than 2 million frames of 33 people pointing in various styles, annotated for each frame with pointing timings and 3D directions. The second is DeePoint, a novel deep network model for joint recognition and 3D direction estimation of pointing. DeePoint is a Transformer-based network that fully leverages the spatio-temporal coordination of the body parts, not just the hands. Through extensive experiments, we demonstrate the accuracy and efficiency of DeePoint. We believe the DP Dataset and DeePoint will serve as a sound foundation for visual human intention understanding.
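
As a rough illustration of the kind of model the abstract describes, the sketch below shows a minimal PyTorch Transformer that turns per-frame body-joint keypoints into tokens and jointly predicts a per-frame pointing probability and a unit 3D pointing direction. This is not the authors' DeePoint implementation; the keypoint format, dimensions, positional encodings, joint pooling, and all module names are illustrative assumptions.

# Minimal sketch (not the authors' code): a Transformer over per-frame body-joint
# tokens that jointly predicts (a) whether the person is pointing and (b) a unit
# 3D pointing direction. All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointingTransformer(nn.Module):
    def __init__(self, num_joints=17, joint_dim=2, d_model=128, nhead=4, num_layers=3):
        super().__init__()
        # Embed each joint of each frame as one token so attention can model the
        # spatio-temporal coordination of body parts, not just the hands.
        self.joint_embed = nn.Linear(joint_dim, d_model)
        self.joint_id = nn.Embedding(num_joints, d_model)   # which joint a token is
        self.frame_pos = nn.Embedding(512, d_model)         # which frame (max 512 frames)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.cls_head = nn.Linear(d_model, 1)   # pointing / not pointing per frame
        self.dir_head = nn.Linear(d_model, 3)   # 3D pointing direction per frame

    def forward(self, joints):
        # joints: (batch, frames, num_joints, joint_dim), e.g. 2D keypoints per frame
        b, t, j, _ = joints.shape
        tok = self.joint_embed(joints)                            # (b, t, j, d)
        tok = tok + self.joint_id.weight[None, None, :, :]        # joint identity encoding
        tok = tok + self.frame_pos.weight[None, :t, None, :]      # temporal position encoding
        tok = tok.reshape(b, t * j, -1)                           # flatten to a token sequence
        enc = self.encoder(tok).reshape(b, t, j, -1).mean(dim=2)  # pool over joints per frame
        prob = torch.sigmoid(self.cls_head(enc)).squeeze(-1)      # (b, t) pointing probability
        direction = F.normalize(self.dir_head(enc), dim=-1)       # (b, t, 3) unit direction
        return prob, direction

# Example usage with random keypoints for a 15-frame clip of one person:
model = PointingTransformer()
prob, direction = model(torch.randn(1, 15, 17, 2))

Training such a model on frame-level annotations of pointing timings and 3D directions, as the DP Dataset provides, would typically combine a binary cross-entropy loss on the pointing probability with a cosine or angular loss on the direction, the latter applied only to frames labeled as pointing.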

Bibliographic Details
Published in: arXiv.org, 2023-09
Authors: Nakamura, Shu; Kawanishi, Yasutomo; Nobuhara, Shohei; Nishino, Ko
Format: Article
Language: English
Subjects: Body parts; Datasets; Recognition
Online access: Full text
EISSN: 2331-8422
Publisher: Cornell University Library, arXiv.org (Ithaca)