Leveraging Hand-Object Interactions in Assistive Egocentric Vision

Egocentric vision holds great promise for increasing access to visual information and improving the quality of life for blind people. While we strive to improve recognition performance, it remains difficult to identify which object is of interest to the user; the object may not even be included in the frame due to challenges in camera aiming without visual feedback. Also, gaze information, commonly used to infer the area of interest in egocentric vision, is often not dependable. However, blind users tend to include their hand either interacting with the object they wish to recognize or simply placing it in proximity for better camera aiming. We propose a method that leverages the hand as contextual information for recognizing an object of interest. In our method, the output of a pre-trained hand segmentation model is infused into later convolutional layers of our object recognition network, which has separate output layers for localization and classification. Using egocentric datasets from sighted and blind individuals, we show that hand-priming achieves more accurate localization than other approaches that encode hand information. Given only object centers along with labels, our method achieves classification performance comparable to the state-of-the-art method that uses bounding boxes with labels.
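
The method described above (hand-segmentation output infused into later convolutional layers of an object recognition network, with separate output layers for localization and classification) could be wired up roughly as in the sketch below. This is a minimal sketch only, assuming a PyTorch-style implementation; the class name HandPrimedRecognizer, the concatenation-based fusion layer, the channel sizes, and the head shapes are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of hand-primed object recognition (assumptions: PyTorch;
# fusion point, channel sizes, and head shapes are illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class HandPrimedRecognizer(nn.Module):
    """Object recognition network whose later convolutional features are
    'primed' with the output of a frozen, pre-trained hand-segmentation model."""

    def __init__(self, backbone: nn.Module, hand_segmenter: nn.Module,
                 feat_channels: int = 512, num_classes: int = 21):
        super().__init__()
        self.backbone = backbone              # image -> feature map [B, feat_channels, H, W]
        self.hand_segmenter = hand_segmenter  # image -> hand-mask logits [B, 1, H_img, W_img]
        for p in self.hand_segmenter.parameters():
            p.requires_grad = False           # pre-trained hand model stays frozen
        # Fuse hand evidence with object features at a later convolutional stage.
        self.fuse = nn.Conv2d(feat_channels + 1, feat_channels, kernel_size=3, padding=1)
        # Separate output layers for localization and classification.
        self.loc_head = nn.Conv2d(feat_channels, 4, kernel_size=1)            # box offsets per location
        self.cls_head = nn.Conv2d(feat_channels, num_classes, kernel_size=1)  # class scores per location

    def forward(self, image: torch.Tensor):
        feats = self.backbone(image)                          # [B, C, H, W]
        with torch.no_grad():
            hand = torch.sigmoid(self.hand_segmenter(image))  # [B, 1, H_img, W_img]
        # Resample the hand mask to the feature resolution and concatenate it.
        hand = F.interpolate(hand, size=feats.shape[-2:], mode="bilinear", align_corners=False)
        primed = F.relu(self.fuse(torch.cat([feats, hand], dim=1)))
        return self.loc_head(primed), self.cls_head(primed)
```

Concatenating the resampled hand mask with the backbone features and learning a fusion convolution is one plausible reading of "infusing" hand output into later layers; the paper's actual fusion mechanism and training losses may differ.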

Bibliographic details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023-06, Vol. 45 (6), p. 6820-6831
Main authors: Lee, Kyungjun; Shrivastava, Abhinav; Kacorri, Hernisa
Format: Article
Language: English
DOI: 10.1109/TPAMI.2021.3123303
Publisher: IEEE (United States)
PMID: 34705636
CODEN: ITPIDJ
ISSN: 0162-8828
EISSN: 1939-3539, 2160-9292
Source: IEEE Electronic Library (IEL)
Subjects:
accessibility
Algorithms
Annotations
Blind people
Blindness
Cameras
Classification
Computational modeling
Context modeling
contextual priming
Egocentric vision
Hand
hand-object interaction
Humans
Labels
Localization
Location awareness
object localization
Object recognition
Priming
Quality of Life
Vision
Visual Perception
Visualization
Visually Impaired Persons