Leveraging Hand-Object Interactions in Assistive Egocentric Vision
Egocentric vision holds great promise for increasing access to visual information and improving the quality of life for blind people. While we strive to improve recognition performance, it remains difficult to identify which object is of interest to the user; the object may not even be included in the frame due to challenges in camera aiming without visual feedback. ...
Saved in:
Published in: | IEEE transactions on pattern analysis and machine intelligence 2023-06, Vol.45 (6), p.6820-6831 |
---|---|
Main authors: | Lee, Kyungjun; Shrivastava, Abhinav; Kacorri, Hernisa |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 6831 |
---|---|
container_issue | 6 |
container_start_page | 6820 |
container_title | IEEE transactions on pattern analysis and machine intelligence |
container_volume | 45 |
creator | Lee, Kyungjun Shrivastava, Abhinav Kacorri, Hernisa |
description | Egocentric vision holds great promise for increasing access to visual information and improving the quality of life for blind people. While we strive to improve recognition performance, it remains difficult to identify which object is of interest to the user; the object may not even be included in the frame due to challenges in camera aiming without visual feedback. Also, gaze information, commonly used to infer the area of interest in egocentric vision, is often not dependable. However, blind users tend to include their hand either interacting with the object they wish to recognize or simply placing it in proximity for better camera aiming. We propose a method that leverages the hand as the contextual information for recognizing an object of interest. In our method, the output of a pre-trained hand segmentation model is infused to later convolutional layers of our object recognition network with separate output layers for localization and classification. Using egocentric datasets from sighted and blind individuals, we show that the hand-priming achieves more accurate localization than other approaches that encode hand information. Given only object centers along with labels, our method achieves comparable classification performance to the state-of-the-art method that uses bounding boxes with labels. |
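The description above outlines the paper's architecture: features from a pre-trained hand segmentation model are fused into later convolutional layers of an object recognition network that ends in separate localization and classification heads. The following is a minimal PyTorch sketch of that fusion idea, not the authors' implementation; the class name `HandPrimedRecognizer`, the backbone layers, the fusion-by-addition, and the head sizes are illustrative assumptions.

```python
# Hypothetical sketch of hand priming: a hand mask produced by a frozen,
# pre-trained segmentation model is projected and added into a later
# convolutional feature map, then routed to separate localization and
# classification heads. All layer sizes and the fusion point are assumptions.
import torch
import torch.nn as nn


class HandPrimedRecognizer(nn.Module):
    def __init__(self, num_classes: int = 20):
        super().__init__()
        # Early backbone layers that see only the RGB frame.
        self.early = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # Project the single-channel hand mask to the backbone's channel width.
        self.hand_proj = nn.Conv2d(1, 128, kernel_size=1)
        # Later layers operate on the hand-primed (fused) feature map.
        self.late = nn.Sequential(
            nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Separate output heads for localization and classification.
        self.loc_head = nn.Linear(256, 4)            # box / center regression
        self.cls_head = nn.Linear(256, num_classes)  # object category

    def forward(self, frame: torch.Tensor, hand_mask: torch.Tensor):
        feat = self.early(frame)
        # Resize the hand mask to the feature resolution and fuse by addition.
        mask = nn.functional.interpolate(
            hand_mask, size=feat.shape[-2:], mode="bilinear", align_corners=False
        )
        feat = feat + self.hand_proj(mask)
        feat = self.late(feat)
        return self.loc_head(feat), self.cls_head(feat)


# Example: one 224x224 frame with its predicted hand mask.
frame = torch.randn(1, 3, 224, 224)
hand_mask = torch.rand(1, 1, 224, 224)
loc, cls = HandPrimedRecognizer()(frame, hand_mask)
print(loc.shape, cls.shape)  # torch.Size([1, 4]) torch.Size([1, 20])
```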
doi_str_mv | 10.1109/TPAMI.2021.3123303 |
format | Article |
fullrecord | <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_pubmed_primary_34705636</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9591443</ieee_id><sourcerecordid>2809890514</sourcerecordid><originalsourceid>FETCH-LOGICAL-c302t-f7ef3eed187b92f26166ce653c52aaf57636a4f49ccaadac33475cf72b6daa593</originalsourceid><addsrcrecordid>eNpdkMFOAjEQhhujEURfQBOziRcvi21n290ekaCQYPCAXpvSnSUlsIvbXRLf3iLIwdMkM98_-fMRcstonzGqnubvg7dJn1PO-sA4AIUz0mUKVAwC1DnpUiZ5nGU865Ar71eUskRQuCQdSFIqJMgueZ7iDmuzdOUyGpsyj2eLFdommpRNWNvGVaWPXBkNvHe-cTuMRsvKYtnUzkafzof7NbkozNrjzXH2yMfLaD4cx9PZ62Q4mMYWKG_iIsUCEHOWpQvFCy6ZlBalACu4MYVIQx-TFImy1pjcWAglhS1SvpC5MUJBjzwe_m7r6qtF3-iN8xbXa1Ni1XrNRZamAriSAX34h66qti5DO80zqjJFBUsCxQ-UrSvvayz0tnYbU39rRvXesP41rPeG9dFwCN0fX7eLDeanyJ_SANwdAIeIp7MSiiUJwA8RLH8F</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2809890514</pqid></control><display><type>article</type><title>Leveraging Hand-Object Interactions in Assistive Egocentric Vision</title><source>IEEE Electronic Library (IEL)</source><creator>Lee, Kyungjun ; Shrivastava, Abhinav ; Kacorri, Hernisa</creator><creatorcontrib>Lee, Kyungjun ; Shrivastava, Abhinav ; Kacorri, Hernisa</creatorcontrib><description>Egocentric vision holds great promise for increasing access to visual information and improving the quality of life for blind people. While we strive to improve recognition performance, it remains difficult to identify which object is of interest to the user; the object may not even be included in the frame due to challenges in camera aiming without visual feedback. Also, gaze information, commonly used to infer the area of interest in egocentric vision, is often not dependable. However, blind users tend to include their hand either interacting with the object they wish to recognize or simply placing it in proximity for better camera aiming. We propose a method that leverages the hand as the contextual information for recognizing an object of interest. In our method, the output of a pre-trained hand segmentation model is infused to later convolutional layers of our object recognition network with separate output layers for localization and classification. Using egocentric datasets from sighted and blind individuals, we show that the hand-priming achieves more accurate localization than other approaches that encode hand information. 
Given only object centers along with labels, our method achieves comparable classification performance to the state-of-the-art method that uses bounding boxes with labels.</description><identifier>ISSN: 0162-8828</identifier><identifier>EISSN: 1939-3539</identifier><identifier>EISSN: 2160-9292</identifier><identifier>DOI: 10.1109/TPAMI.2021.3123303</identifier><identifier>PMID: 34705636</identifier><identifier>CODEN: ITPIDJ</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>accessibility ; Algorithms ; Annotations ; Blind people ; Blindness ; Cameras ; Classification ; Computational modeling ; Context modeling ; contextual priming ; Egocentric vision ; Hand ; hand-object interaction ; Humans ; Labels ; Localization ; Location awareness ; object localization ; Object recognition ; Priming ; Quality of Life ; Vision ; Visual Perception ; Visualization ; Visually Impaired Persons</subject><ispartof>IEEE transactions on pattern analysis and machine intelligence, 2023-06, Vol.45 (6), p.6820-6831</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c302t-f7ef3eed187b92f26166ce653c52aaf57636a4f49ccaadac33475cf72b6daa593</cites><orcidid>0000-0001-8928-8554 ; 0000-0001-8556-9113 ; 0000-0002-7798-308X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9591443$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9591443$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34705636$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Lee, Kyungjun</creatorcontrib><creatorcontrib>Shrivastava, Abhinav</creatorcontrib><creatorcontrib>Kacorri, Hernisa</creatorcontrib><title>Leveraging Hand-Object Interactions in Assistive Egocentric Vision</title><title>IEEE transactions on pattern analysis and machine intelligence</title><addtitle>TPAMI</addtitle><addtitle>IEEE Trans Pattern Anal Mach Intell</addtitle><description>Egocentric vision holds great promise for increasing access to visual information and improving the quality of life for blind people. While we strive to improve recognition performance, it remains difficult to identify which object is of interest to the user; the object may not even be included in the frame due to challenges in camera aiming without visual feedback. Also, gaze information, commonly used to infer the area of interest in egocentric vision, is often not dependable. However, blind users tend to include their hand either interacting with the object they wish to recognize or simply placing it in proximity for better camera aiming. We propose a method that leverages the hand as the contextual information for recognizing an object of interest. In our method, the output of a pre-trained hand segmentation model is infused to later convolutional layers of our object recognition network with separate output layers for localization and classification. Using egocentric datasets from sighted and blind individuals, we show that the hand-priming achieves more accurate localization than other approaches that encode hand information. 
Given only object centers along with labels, our method achieves comparable classification performance to the state-of-the-art method that uses bounding boxes with labels.</description><subject>accessibility</subject><subject>Algorithms</subject><subject>Annotations</subject><subject>Blind people</subject><subject>Blindness</subject><subject>Cameras</subject><subject>Classification</subject><subject>Computational modeling</subject><subject>Context modeling</subject><subject>contextual priming</subject><subject>Egocentric vision</subject><subject>Hand</subject><subject>hand-object interaction</subject><subject>Humans</subject><subject>Labels</subject><subject>Localization</subject><subject>Location awareness</subject><subject>object localization</subject><subject>Object recognition</subject><subject>Priming</subject><subject>Quality of Life</subject><subject>Vision</subject><subject>Visual Perception</subject><subject>Visualization</subject><subject>Visually Impaired Persons</subject><issn>0162-8828</issn><issn>1939-3539</issn><issn>2160-9292</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><sourceid>EIF</sourceid><recordid>eNpdkMFOAjEQhhujEURfQBOziRcvi21n290ekaCQYPCAXpvSnSUlsIvbXRLf3iLIwdMkM98_-fMRcstonzGqnubvg7dJn1PO-sA4AIUz0mUKVAwC1DnpUiZ5nGU865Ar71eUskRQuCQdSFIqJMgueZ7iDmuzdOUyGpsyj2eLFdommpRNWNvGVaWPXBkNvHe-cTuMRsvKYtnUzkafzof7NbkozNrjzXH2yMfLaD4cx9PZ62Q4mMYWKG_iIsUCEHOWpQvFCy6ZlBalACu4MYVIQx-TFImy1pjcWAglhS1SvpC5MUJBjzwe_m7r6qtF3-iN8xbXa1Ni1XrNRZamAriSAX34h66qti5DO80zqjJFBUsCxQ-UrSvvayz0tnYbU39rRvXesP41rPeG9dFwCN0fX7eLDeanyJ_SANwdAIeIp7MSiiUJwA8RLH8F</recordid><startdate>20230601</startdate><enddate>20230601</enddate><creator>Lee, Kyungjun</creator><creator>Shrivastava, Abhinav</creator><creator>Kacorri, Hernisa</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0001-8928-8554</orcidid><orcidid>https://orcid.org/0000-0001-8556-9113</orcidid><orcidid>https://orcid.org/0000-0002-7798-308X</orcidid></search><sort><creationdate>20230601</creationdate><title>Leveraging Hand-Object Interactions in Assistive Egocentric Vision</title><author>Lee, Kyungjun ; Shrivastava, Abhinav ; Kacorri, Hernisa</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c302t-f7ef3eed187b92f26166ce653c52aaf57636a4f49ccaadac33475cf72b6daa593</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>accessibility</topic><topic>Algorithms</topic><topic>Annotations</topic><topic>Blind people</topic><topic>Blindness</topic><topic>Cameras</topic><topic>Classification</topic><topic>Computational modeling</topic><topic>Context modeling</topic><topic>contextual priming</topic><topic>Egocentric vision</topic><topic>Hand</topic><topic>hand-object interaction</topic><topic>Humans</topic><topic>Labels</topic><topic>Localization</topic><topic>Location awareness</topic><topic>object localization</topic><topic>Object recognition</topic><topic>Priming</topic><topic>Quality of Life</topic><topic>Vision</topic><topic>Visual Perception</topic><topic>Visualization</topic><topic>Visually Impaired Persons</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lee, Kyungjun</creatorcontrib><creatorcontrib>Shrivastava, Abhinav</creatorcontrib><creatorcontrib>Kacorri, Hernisa</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE transactions on pattern analysis and machine intelligence</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Lee, Kyungjun</au><au>Shrivastava, Abhinav</au><au>Kacorri, Hernisa</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Leveraging Hand-Object Interactions in Assistive Egocentric Vision</atitle><jtitle>IEEE transactions on pattern analysis and machine intelligence</jtitle><stitle>TPAMI</stitle><addtitle>IEEE Trans Pattern Anal Mach 
Intell</addtitle><date>2023-06-01</date><risdate>2023</risdate><volume>45</volume><issue>6</issue><spage>6820</spage><epage>6831</epage><pages>6820-6831</pages><issn>0162-8828</issn><eissn>1939-3539</eissn><eissn>2160-9292</eissn><coden>ITPIDJ</coden><abstract>Egocentric vision holds great promise for increasing access to visual information and improving the quality of life for blind people. While we strive to improve recognition performance, it remains difficult to identify which object is of interest to the user; the object may not even be included in the frame due to challenges in camera aiming without visual feedback. Also, gaze information, commonly used to infer the area of interest in egocentric vision, is often not dependable. However, blind users tend to include their hand either interacting with the object they wish to recognize or simply placing it in proximity for better camera aiming. We propose a method that leverages the hand as the contextual information for recognizing an object of interest. In our method, the output of a pre-trained hand segmentation model is infused to later convolutional layers of our object recognition network with separate output layers for localization and classification. Using egocentric datasets from sighted and blind individuals, we show that the hand-priming achieves more accurate localization than other approaches that encode hand information. Given only object centers along with labels, our method achieves comparable classification performance to the state-of-the-art method that uses bounding boxes with labels.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>34705636</pmid><doi>10.1109/TPAMI.2021.3123303</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0001-8928-8554</orcidid><orcidid>https://orcid.org/0000-0001-8556-9113</orcidid><orcidid>https://orcid.org/0000-0002-7798-308X</orcidid></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0162-8828 |
ispartof | IEEE transactions on pattern analysis and machine intelligence, 2023-06, Vol.45 (6), p.6820-6831 |
issn | 0162-8828 1939-3539 2160-9292 |
language | eng |
recordid | cdi_pubmed_primary_34705636 |
source | IEEE Electronic Library (IEL) |
subjects | accessibility; Algorithms; Annotations; Blind people; Blindness; Cameras; Classification; Computational modeling; Context modeling; contextual priming; Egocentric vision; Hand; hand-object interaction; Humans; Labels; Localization; Location awareness; object localization; Object recognition; Priming; Quality of Life; Vision; Visual Perception; Visualization; Visually Impaired Persons |
title | Leveraging Hand-Object Interactions in Assistive Egocentric Vision |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T22%3A47%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Leveraging%20Hand-Object%20Interactions%20in%20Assistive%20Egocentric%20Vision&rft.jtitle=IEEE%20transactions%20on%20pattern%20analysis%20and%20machine%20intelligence&rft.au=Lee,%20Kyungjun&rft.date=2023-06-01&rft.volume=45&rft.issue=6&rft.spage=6820&rft.epage=6831&rft.pages=6820-6831&rft.issn=0162-8828&rft.eissn=1939-3539&rft.coden=ITPIDJ&rft_id=info:doi/10.1109/TPAMI.2021.3123303&rft_dat=%3Cproquest_RIE%3E2809890514%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2809890514&rft_id=info:pmid/34705636&rft_ieee_id=9591443&rfr_iscdi=true |