Robust and Accurate Hand Gesture Authentication With Cross-Modality Local-Global Behavior Analysis
Obtaining robust fine-grained behavioral features is critical for dynamic hand gesture authentication. However, behavioral characteristics are abstract and complex, making them more difficult to capture than physiological characteristics. Moreover, various illumination and backgrounds in practical applications pose additional challenges to existing methods because commonly used RGB videos are sensitive to them. To overcome this robustness limitation, we propose a two-stream CNN-based cross-modality local-global network (CMLG-Net) with two complementary modules to enhance the discriminability and robustness of behavioral features. First, we introduce a temporal scale pyramid (TSP) module consisting of multiple parallel convolution subbranches with different temporal kernel sizes to capture the fine-grained local motion cues at various temporal scales. Second, a cross-modality temporal non-local (CMTNL) module is devised to simultaneously aggregate the global temporal features and cross-modality features with an attention mechanism. Through the complementary combination of the TSP and CMTNL modules, our CMLG-Net obtains a comprehensive and robust behavioral representation that contains both multi-scale (short- and long-term) and multimodal (RGB-D) behavioral information. Extensive experiments are conducted on the largest dataset, SCUT-DHGA, and a simulated practical dataset, SCUT-DHGA-br, to demonstrate the effectiveness of CMLG-Net in exploiting fine-grained behavioral features and complementary multimodal information. Finally, it achieves state-of-the-art performance with the lowest EER of 0.497% and 4.848% in two challenging evaluation protocols and shows significant superiority in robustness under practical scenes with unsatisfactory illumination and backgrounds.
Saved in:
Published in: | IEEE transactions on information forensics and security 2024, Vol.19, p.8630-8643 |
---|---|
Main authors: | Zhang, Yufeng; Kang, Wenxiong; Song, Wenwei |
Format: | Article |
Language: | eng |
Subjects: | Authentication; Biometrics; Feature extraction; hand gesture authentication; Lighting; multimodal fusion; Robustness; spatiotemporal analysis; Videos |
Online access: | Order full text |
container_end_page | 8643 |
---|---|
container_issue | |
container_start_page | 8630 |
container_title | IEEE transactions on information forensics and security |
container_volume | 19 |
creator | Zhang, Yufeng; Kang, Wenxiong; Song, Wenwei |
description | Obtaining robust fine-grained behavioral features is critical for dynamic hand gesture authentication. However, behavioral characteristics are abstract and complex, making them more difficult to capture than physiological characteristics. Moreover, various illumination and backgrounds in practical applications pose additional challenges to existing methods because commonly used RGB videos are sensitive to them. To overcome this robustness limitation, we propose a two-stream CNN-based cross-modality local-global network (CMLG-Net) with two complementary modules to enhance the discriminability and robustness of behavioral features. First, we introduce a temporal scale pyramid (TSP) module consisting of multiple parallel convolution subbranches with different temporal kernel sizes to capture the fine-grained local motion cues at various temporal scales. Second, a cross-modality temporal non-local (CMTNL) module is devised to simultaneously aggregate the global temporal features and cross-modality features with an attention mechanism. Through the complementary combination of the TSP and CMTNL modules, our CMLG-Net obtains a comprehensive and robust behavioral representation that contains both multi-scale (short- and long-term) and multimodal (RGB-D) behavioral information. Extensive experiments are conducted on the largest dataset, SCUT-DHGA, and a simulated practical dataset, SCUT-DHGA-br, to demonstrate the effectiveness of CMLG-Net in exploiting fine-grained behavioral features and complementary multimodal information. Finally, it achieves state-of-the-art performance with the lowest EER of 0.497% and 4.848% in two challenging evaluation protocols and shows significant superiority in robustness under practical scenes with unsatisfactory illumination and backgrounds. The code is available at https://github.com/SCUT-BIP-Lab/CMLG-Net . |
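The temporal scale pyramid idea summarized in the abstract (parallel branches with different temporal kernel sizes whose outputs are combined into one multi-scale feature) can be illustrated with a minimal NumPy sketch. The branch count, kernel sizes, uniform weights, and channel-wise concatenation below are illustrative assumptions, not the paper's exact CMLG-Net configuration:

```python
import numpy as np

def tsp_branch(frames: np.ndarray, kernel: int) -> np.ndarray:
    """One pyramid branch: smooth each feature channel over `kernel`
    consecutive frames (a uniform temporal kernel stands in for a
    learned temporal convolution)."""
    t, c = frames.shape
    w = np.ones(kernel) / kernel
    # 'same' padding keeps the original temporal length in every branch
    return np.stack(
        [np.convolve(frames[:, j], w, mode="same") for j in range(c)],
        axis=1,
    )

def temporal_scale_pyramid(frames: np.ndarray, kernels=(1, 3, 5)) -> np.ndarray:
    """Run all branches in parallel and concatenate them along the
    channel axis, yielding a short- plus long-term motion feature."""
    return np.concatenate([tsp_branch(frames, k) for k in kernels], axis=1)

# toy "video" feature sequence: 8 frames, 4 channels
feat = temporal_scale_pyramid(np.random.rand(8, 4))
print(feat.shape)  # (8, 12): same length, one channel block per temporal scale
```

In a real two-stream network the same multi-scale temporal filtering would be applied as learned 3D convolutions inside each (RGB and depth) stream; the sketch only shows why differing kernel sizes expose motion cues at differing temporal scales.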
doi_str_mv | 10.1109/TIFS.2024.3451367 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1556-6013 |
ispartof | IEEE transactions on information forensics and security, 2024, Vol.19, p.8630-8643 |
issn | 1556-6013 (ISSN); 1556-6021 (EISSN) |
language | eng |
recordid | cdi_crossref_primary_10_1109_TIFS_2024_3451367 |
source | IEEE Electronic Library (IEL) |
subjects | Authentication; behavioral characteristic representation; Biometrics; Feature extraction; hand gesture authentication; Lighting; multimodal fusion; Physiology; Robustness; spatiotemporal analysis; Spatiotemporal phenomena; Videos |
title | Robust and Accurate Hand Gesture Authentication With Cross-Modality Local-Global Behavior Analysis |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T09%3A29%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Robust%20and%20Accurate%20Hand%20Gesture%20Authentication%20With%20Cross-Modality%20Local-Global%20Behavior%20Analysis&rft.jtitle=IEEE%20transactions%20on%20information%20forensics%20and%20security&rft.au=Zhang,%20Yufeng&rft.date=2024&rft.volume=19&rft.spage=8630&rft.epage=8643&rft.pages=8630-8643&rft.issn=1556-6013&rft.eissn=1556-6021&rft.coden=ITIFA6&rft_id=info:doi/10.1109/TIFS.2024.3451367&rft_dat=%3Ccrossref_RIE%3E10_1109_TIFS_2024_3451367%3C/crossref_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10654331&rfr_iscdi=true |