Leveraging Hand-Object Interactions in Assistive Egocentric Vision
Egocentric vision holds great promise for increasing access to visual information and improving the quality of life for blind people. While we strive to improve recognition performance, it remains difficult to identify which object is of interest to the user; the object may not even be included in the frame due to challenges in camera aiming without visual feedback. ...
Saved in:
Published in: | IEEE transactions on pattern analysis and machine intelligence 2023-06, Vol.45 (6), p.6820-6831 |
---|---|
Main authors: | Lee, Kyungjun; Shrivastava, Abhinav; Kacorri, Hernisa |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 6831 |
---|---|
container_issue | 6 |
container_start_page | 6820 |
container_title | IEEE transactions on pattern analysis and machine intelligence |
container_volume | 45 |
creator | Lee, Kyungjun Shrivastava, Abhinav Kacorri, Hernisa |
description | Egocentric vision holds great promise for increasing access to visual information and improving the quality of life for blind people. While we strive to improve recognition performance, it remains difficult to identify which object is of interest to the user; the object may not even be included in the frame due to challenges in camera aiming without visual feedback. Also, gaze information, commonly used to infer the area of interest in egocentric vision, is often not dependable. However, blind users tend to include their hand either interacting with the object they wish to recognize or simply placing it in proximity for better camera aiming. We propose a method that leverages the hand as the contextual information for recognizing an object of interest. In our method, the output of a pre-trained hand segmentation model is infused to later convolutional layers of our object recognition network with separate output layers for localization and classification. Using egocentric datasets from sighted and blind individuals, we show that the hand-priming achieves more accurate localization than other approaches that encode hand information. Given only object centers along with labels, our method achieves comparable classification performance to the state-of-the-art method that uses bounding boxes with labels. |
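The description above outlines the paper's architecture: features from a pre-trained hand segmentation model are fused into later convolutional layers of an object recognition network that ends in separate localization and classification heads. The following is a minimal PyTorch sketch of that fusion idea, not the authors' implementation; the class name `HandPrimedRecognizer`, the backbone layers, the fusion-by-addition, and the head sizes are illustrative assumptions.

```python
# Hypothetical sketch of hand priming: a hand mask produced by a frozen,
# pre-trained segmentation model is projected and added into a later
# convolutional feature map, then routed to separate localization and
# classification heads. All layer sizes and the fusion point are assumptions.
import torch
import torch.nn as nn


class HandPrimedRecognizer(nn.Module):
    def __init__(self, num_classes: int = 20):
        super().__init__()
        # Early backbone layers that see only the RGB frame.
        self.early = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # Project the single-channel hand mask to the backbone's channel width.
        self.hand_proj = nn.Conv2d(1, 128, kernel_size=1)
        # Later layers operate on the hand-primed (fused) feature map.
        self.late = nn.Sequential(
            nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Separate output heads for localization and classification.
        self.loc_head = nn.Linear(256, 4)            # box / center regression
        self.cls_head = nn.Linear(256, num_classes)  # object category

    def forward(self, frame: torch.Tensor, hand_mask: torch.Tensor):
        feat = self.early(frame)
        # Resize the hand mask to the feature resolution and fuse by addition.
        mask = nn.functional.interpolate(
            hand_mask, size=feat.shape[-2:], mode="bilinear", align_corners=False
        )
        feat = feat + self.hand_proj(mask)
        feat = self.late(feat)
        return self.loc_head(feat), self.cls_head(feat)


# Example: one 224x224 frame with its predicted hand mask.
frame = torch.randn(1, 3, 224, 224)
hand_mask = torch.rand(1, 1, 224, 224)
loc, cls = HandPrimedRecognizer()(frame, hand_mask)
print(loc.shape, cls.shape)  # torch.Size([1, 4]) torch.Size([1, 20])
```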
doi_str_mv | 10.1109/TPAMI.2021.3123303 |
format | Article |
fullrecord | <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_pubmed_primary_34705636</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9591443</ieee_id><sourcerecordid>2809890514</sourcerecordid><originalsourceid>FETCH-LOGICAL-c302t-f7ef3eed187b92f26166ce653c52aaf57636a4f49ccaadac33475cf72b6daa593</originalsourceid><addsrcrecordid>eNpdkMFOAjEQhhujEURfQBOziRcvi21n290ekaCQYPCAXpvSnSUlsIvbXRLf3iLIwdMkM98_-fMRcstonzGqnubvg7dJn1PO-sA4AIUz0mUKVAwC1DnpUiZ5nGU865Ar71eUskRQuCQdSFIqJMgueZ7iDmuzdOUyGpsyj2eLFdommpRNWNvGVaWPXBkNvHe-cTuMRsvKYtnUzkafzof7NbkozNrjzXH2yMfLaD4cx9PZ62Q4mMYWKG_iIsUCEHOWpQvFCy6ZlBalACu4MYVIQx-TFImy1pjcWAglhS1SvpC5MUJBjzwe_m7r6qtF3-iN8xbXa1Ni1XrNRZamAriSAX34h66qti5DO80zqjJFBUsCxQ-UrSvvayz0tnYbU39rRvXesP41rPeG9dFwCN0fX7eLDeanyJ_SANwdAIeIp7MSiiUJwA8RLH8F</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2809890514</pqid></control><display><type>article</type><title>Leveraging Hand-Object Interactions in Assistive Egocentric Vision</title><source>IEEE Electronic Library (IEL)</source><creator>Lee, Kyungjun ; Shrivastava, Abhinav ; Kacorri, Hernisa</creator><creatorcontrib>Lee, Kyungjun ; Shrivastava, Abhinav ; Kacorri, Hernisa</creatorcontrib><description>Egocentric vision holds great promise for increasing access to visual information and improving the quality of life for blind people. While we strive to improve recognition performance, it remains difficult to identify which object is of interest to the user; the object may not even be included in the frame due to challenges in camera aiming without visual feedback. Also, gaze information, commonly used to infer the area of interest in egocentric vision, is often not dependable. However, blind users tend to include their hand either interacting with the object they wish to recognize or simply placing it in proximity for better camera aiming. We propose a method that leverages the hand as the contextual information for recognizing an object of interest. In our method, the output of a pre-trained hand segmentation model is infused to later convolutional layers of our object recognition network with separate output layers for localization and classification. Using egocentric datasets from sighted and blind individuals, we show that the hand-priming achieves more accurate localization than other approaches that encode hand information. 
Given only object centers along with labels, our method achieves comparable classification performance to the state-of-the-art method that uses bounding boxes with labels.</description><identifier>ISSN: 0162-8828</identifier><identifier>EISSN: 1939-3539</identifier><identifier>EISSN: 2160-9292</identifier><identifier>DOI: 10.1109/TPAMI.2021.3123303</identifier><identifier>PMID: 34705636</identifier><identifier>CODEN: ITPIDJ</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>accessibility ; Algorithms ; Annotations ; Blind people ; Blindness ; Cameras ; Classification ; Computational modeling ; Context modeling ; contextual priming ; Egocentric vision ; Hand ; hand-object interaction ; Humans ; Labels ; Localization ; Location awareness ; object localization ; Object recognition ; Priming ; Quality of Life ; Vision ; Visual Perception ; Visualization ; Visually Impaired Persons</subject><ispartof>IEEE transactions on pattern analysis and machine intelligence, 2023-06, Vol.45 (6), p.6820-6831</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c302t-f7ef3eed187b92f26166ce653c52aaf57636a4f49ccaadac33475cf72b6daa593</cites><orcidid>0000-0001-8928-8554 ; 0000-0001-8556-9113 ; 0000-0002-7798-308X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9591443$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9591443$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34705636$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Lee, Kyungjun</creatorcontrib><creatorcontrib>Shrivastava, Abhinav</creatorcontrib><creatorcontrib>Kacorri, Hernisa</creatorcontrib><title>Leveraging Hand-Object Interactions in Assistive Egocentric Vision</title><title>IEEE transactions on pattern analysis and machine intelligence</title><addtitle>TPAMI</addtitle><addtitle>IEEE Trans Pattern Anal Mach Intell</addtitle><description>Egocentric vision holds great promise for increasing access to visual information and improving the quality of life for blind people. While we strive to improve recognition performance, it remains difficult to identify which object is of interest to the user; the object may not even be included in the frame due to challenges in camera aiming without visual feedback. Also, gaze information, commonly used to infer the area of interest in egocentric vision, is often not dependable. However, blind users tend to include their hand either interacting with the object they wish to recognize or simply placing it in proximity for better camera aiming. We propose a method that leverages the hand as the contextual information for recognizing an object of interest. In our method, the output of a pre-trained hand segmentation model is infused to later convolutional layers of our object recognition network with separate output layers for localization and classification. Using egocentric datasets from sighted and blind individuals, we show that the hand-priming achieves more accurate localization than other approaches that encode hand information. 
Given only object centers along with labels, our method achieves comparable classification performance to the state-of-the-art method that uses bounding boxes with labels.</description><subject>accessibility</subject><subject>Algorithms</subject><subject>Annotations</subject><subject>Blind people</subject><subject>Blindness</subject><subject>Cameras</subject><subject>Classification</subject><subject>Computational modeling</subject><subject>Context modeling</subject><subject>contextual priming</subject><subject>Egocentric vision</subject><subject>Hand</subject><subject>hand-object interaction</subject><subject>Humans</subject><subject>Labels</subject><subject>Localization</subject><subject>Location awareness</subject><subject>object localization</subject><subject>Object recognition</subject><subject>Priming</subject><subject>Quality of Life</subject><subject>Vision</subject><subject>Visual Perception</subject><subject>Visualization</subject><subject>Visually Impaired Persons</subject><issn>0162-8828</issn><issn>1939-3539</issn><issn>2160-9292</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><sourceid>EIF</sourceid><recordid>eNpdkMFOAjEQhhujEURfQBOziRcvi21n290ekaCQYPCAXpvSnSUlsIvbXRLf3iLIwdMkM98_-fMRcstonzGqnubvg7dJn1PO-sA4AIUz0mUKVAwC1DnpUiZ5nGU865Ar71eUskRQuCQdSFIqJMgueZ7iDmuzdOUyGpsyj2eLFdommpRNWNvGVaWPXBkNvHe-cTuMRsvKYtnUzkafzof7NbkozNrjzXH2yMfLaD4cx9PZ62Q4mMYWKG_iIsUCEHOWpQvFCy6ZlBalACu4MYVIQx-TFImy1pjcWAglhS1SvpC5MUJBjzwe_m7r6qtF3-iN8xbXa1Ni1XrNRZamAriSAX34h66qti5DO80zqjJFBUsCxQ-UrSvvayz0tnYbU39rRvXesP41rPeG9dFwCN0fX7eLDeanyJ_SANwdAIeIp7MSiiUJwA8RLH8F</recordid><startdate>20230601</startdate><enddate>20230601</enddate><creator>Lee, Kyungjun</creator><creator>Shrivastava, Abhinav</creator><creator>Kacorri, Hernisa</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0001-8928-8554</orcidid><orcidid>https://orcid.org/0000-0001-8556-9113</orcidid><orcidid>https://orcid.org/0000-0002-7798-308X</orcidid></search><sort><creationdate>20230601</creationdate><title>Leveraging Hand-Object Interactions in Assistive Egocentric Vision</title><author>Lee, Kyungjun ; Shrivastava, Abhinav ; Kacorri, Hernisa</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c302t-f7ef3eed187b92f26166ce653c52aaf57636a4f49ccaadac33475cf72b6daa593</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>accessibility</topic><topic>Algorithms</topic><topic>Annotations</topic><topic>Blind people</topic><topic>Blindness</topic><topic>Cameras</topic><topic>Classification</topic><topic>Computational modeling</topic><topic>Context modeling</topic><topic>contextual priming</topic><topic>Egocentric vision</topic><topic>Hand</topic><topic>hand-object interaction</topic><topic>Humans</topic><topic>Labels</topic><topic>Localization</topic><topic>Location awareness</topic><topic>object localization</topic><topic>Object recognition</topic><topic>Priming</topic><topic>Quality of Life</topic><topic>Vision</topic><topic>Visual Perception</topic><topic>Visualization</topic><topic>Visually Impaired Persons</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lee, Kyungjun</creatorcontrib><creatorcontrib>Shrivastava, Abhinav</creatorcontrib><creatorcontrib>Kacorri, Hernisa</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE transactions on pattern analysis and machine intelligence</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Lee, Kyungjun</au><au>Shrivastava, Abhinav</au><au>Kacorri, Hernisa</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Leveraging Hand-Object Interactions in Assistive Egocentric Vision</atitle><jtitle>IEEE transactions on pattern analysis and machine intelligence</jtitle><stitle>TPAMI</stitle><addtitle>IEEE Trans Pattern Anal Mach 
Intell</addtitle><date>2023-06-01</date><risdate>2023</risdate><volume>45</volume><issue>6</issue><spage>6820</spage><epage>6831</epage><pages>6820-6831</pages><issn>0162-8828</issn><eissn>1939-3539</eissn><eissn>2160-9292</eissn><coden>ITPIDJ</coden><abstract>Egocentric vision holds great promise for increasing access to visual information and improving the quality of life for blind people. While we strive to improve recognition performance, it remains difficult to identify which object is of interest to the user; the object may not even be included in the frame due to challenges in camera aiming without visual feedback. Also, gaze information, commonly used to infer the area of interest in egocentric vision, is often not dependable. However, blind users tend to include their hand either interacting with the object they wish to recognize or simply placing it in proximity for better camera aiming. We propose a method that leverages the hand as the contextual information for recognizing an object of interest. In our method, the output of a pre-trained hand segmentation model is infused to later convolutional layers of our object recognition network with separate output layers for localization and classification. Using egocentric datasets from sighted and blind individuals, we show that the hand-priming achieves more accurate localization than other approaches that encode hand information. Given only object centers along with labels, our method achieves comparable classification performance to the state-of-the-art method that uses bounding boxes with labels.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>34705636</pmid><doi>10.1109/TPAMI.2021.3123303</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0001-8928-8554</orcidid><orcidid>https://orcid.org/0000-0001-8556-9113</orcidid><orcidid>https://orcid.org/0000-0002-7798-308X</orcidid></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0162-8828 |
ispartof | IEEE transactions on pattern analysis and machine intelligence, 2023-06, Vol.45 (6), p.6820-6831 |
issn | 0162-8828 1939-3539 2160-9292 |
language | eng |
recordid | cdi_pubmed_primary_34705636 |
source | IEEE Electronic Library (IEL) |
subjects | accessibility; Algorithms; Annotations; Blind people; Blindness; Cameras; Classification; Computational modeling; Context modeling; contextual priming; Egocentric vision; Hand; hand-object interaction; Humans; Labels; Localization; Location awareness; object localization; Object recognition; Priming; Quality of Life; Vision; Visual Perception; Visualization; Visually Impaired Persons |
title | Leveraging Hand-Object Interactions in Assistive Egocentric Vision |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T22%3A47%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Leveraging%20Hand-Object%20Interactions%20in%20Assistive%20Egocentric%20Vision&rft.jtitle=IEEE%20transactions%20on%20pattern%20analysis%20and%20machine%20intelligence&rft.au=Lee,%20Kyungjun&rft.date=2023-06-01&rft.volume=45&rft.issue=6&rft.spage=6820&rft.epage=6831&rft.pages=6820-6831&rft.issn=0162-8828&rft.eissn=1939-3539&rft.coden=ITPIDJ&rft_id=info:doi/10.1109/TPAMI.2021.3123303&rft_dat=%3Cproquest_RIE%3E2809890514%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2809890514&rft_id=info:pmid/34705636&rft_ieee_id=9591443&rfr_iscdi=true |