Robust and Accurate Hand Gesture Authentication With Cross-Modality Local-Global Behavior Analysis

Obtaining robust fine-grained behavioral features is critical for dynamic hand gesture authentication. However, behavioral characteristics are abstract and complex, making them more difficult to capture than physiological characteristics. Moreover, varying illumination and backgrounds in practical applications pose additional challenges to existing methods because commonly used RGB videos are sensitive to them. To overcome this robustness limitation, we propose a two-stream CNN-based cross-modality local-global network (CMLG-Net) with two complementary modules to enhance the discriminability and robustness of behavioral features. First, we introduce a temporal scale pyramid (TSP) module consisting of multiple parallel convolution subbranches with different temporal kernel sizes to capture the fine-grained local motion cues at various temporal scales. Second, a cross-modality temporal non-local (CMTNL) module is devised to simultaneously aggregate the global temporal features and cross-modality features with an attention mechanism. Through the complementary combination of the TSP and CMTNL modules, our CMLG-Net obtains a comprehensive and robust behavioral representation that contains both multi-scale (short- and long-term) and multimodal (RGB-D) behavioral information. Extensive experiments are conducted on the largest dataset, SCUT-DHGA, and a simulated practical dataset, SCUT-DHGA-br, to demonstrate the effectiveness of CMLG-Net in exploiting fine-grained behavioral features and complementary multimodal information. Finally, it achieves state-of-the-art performance with the lowest EERs of 0.497% and 4.848% in two challenging evaluation protocols and shows significant superiority in robustness under practical scenes with unsatisfactory illumination and backgrounds. The code is available at https://github.com/SCUT-BIP-Lab/CMLG-Net .
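The TSP module lends itself to a short illustration. Below is a minimal PyTorch sketch of the idea as the abstract describes it (parallel convolution subbranches with different temporal kernel sizes, fused into one local motion feature); the class name, channel count, kernel sizes, and residual summation are illustrative assumptions, not the authors' released implementation, which lives in the linked repository.

```python
# Hedged sketch of a temporal scale pyramid (TSP) block: parallel 3D
# convolutions whose kernels differ only along the time axis, so each
# branch responds to motion cues at a different temporal scale.
# All names and hyperparameters here are assumptions for illustration.
import torch
import torch.nn as nn


class TemporalScalePyramid(nn.Module):
    def __init__(self, channels: int, kernel_sizes=(1, 3, 5)):
        super().__init__()
        # One branch per temporal scale; (k, 1, 1) kernels leave the
        # spatial grid untouched, and padding k // 2 preserves length.
        self.branches = nn.ModuleList(
            nn.Conv3d(channels, channels, kernel_size=(k, 1, 1),
                      padding=(k // 2, 0, 0))
            for k in kernel_sizes
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, height, width). Summing the branch
        # outputs fuses short- and long-term motion into one map; the
        # residual keeps the original features intact.
        return self.relu(sum(branch(x) for branch in self.branches) + x)


# Example: 64-frame clips of 7x7 feature maps with 256 channels.
clip = torch.randn(2, 256, 64, 7, 7)
print(TemporalScalePyramid(256)(clip).shape)  # torch.Size([2, 256, 64, 7, 7])
```

Summation is one plausible fusion choice; concatenation followed by a 1x1x1 convolution would serve the same multi-scale purpose. A companion sketch of the CMTNL module appears after the bibliographic fields at the end of this record.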

Detailed Description

Saved in:
Bibliographic Details
Published in: IEEE transactions on information forensics and security, 2024, Vol. 19, p. 8630-8643
Main Authors: Zhang, Yufeng, Kang, Wenxiong, Song, Wenwei
Format: Article
Language: English
Subjects:
Online Access: Order full text
container_end_page 8643
container_issue
container_start_page 8630
container_title IEEE transactions on information forensics and security
container_volume 19
creator Zhang, Yufeng
Kang, Wenxiong
Song, Wenwei
description Obtaining robust fine-grained behavioral features is critical for dynamic hand gesture authentication. However, behavioral characteristics are abstract and complex, making them more difficult to capture than physiological characteristics. Moreover, varying illumination and backgrounds in practical applications pose additional challenges to existing methods because commonly used RGB videos are sensitive to them. To overcome this robustness limitation, we propose a two-stream CNN-based cross-modality local-global network (CMLG-Net) with two complementary modules to enhance the discriminability and robustness of behavioral features. First, we introduce a temporal scale pyramid (TSP) module consisting of multiple parallel convolution subbranches with different temporal kernel sizes to capture the fine-grained local motion cues at various temporal scales. Second, a cross-modality temporal non-local (CMTNL) module is devised to simultaneously aggregate the global temporal features and cross-modality features with an attention mechanism. Through the complementary combination of the TSP and CMTNL modules, our CMLG-Net obtains a comprehensive and robust behavioral representation that contains both multi-scale (short- and long-term) and multimodal (RGB-D) behavioral information. Extensive experiments are conducted on the largest dataset, SCUT-DHGA, and a simulated practical dataset, SCUT-DHGA-br, to demonstrate the effectiveness of CMLG-Net in exploiting fine-grained behavioral features and complementary multimodal information. Finally, it achieves state-of-the-art performance with the lowest EERs of 0.497% and 4.848% in two challenging evaluation protocols and shows significant superiority in robustness under practical scenes with unsatisfactory illumination and backgrounds. The code is available at https://github.com/SCUT-BIP-Lab/CMLG-Net .
doi_str_mv 10.1109/TIFS.2024.3451367
format Article
fulltext fulltext_linktorsrc
identifier ISSN: 1556-6013
ispartof IEEE transactions on information forensics and security, 2024, Vol.19, p.8630-8643
issn 1556-6013
1556-6021
language eng
recordid cdi_crossref_primary_10_1109_TIFS_2024_3451367
source IEEE Electronic Library (IEL)
subjects Authentication
behavioral characteristic representation
Biometrics
Feature extraction
hand gesture authentication
Lighting
multimodal fusion
Physiology
Robustness
spatiotemporal analysis
Spatiotemporal phenomena
Videos
title Robust and Accurate Hand Gesture Authentication With Cross-Modality Local-Global Behavior Analysis
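To complement the TSP sketch above, here is an equally hedged sketch of the CMTNL idea from the abstract: per-frame RGB and depth descriptors are stacked along the time axis, and a non-local (self-attention) step lets every frame aggregate global temporal context from both modalities at once. The scaled dot-product form, embedding width, and residual connection are assumptions; the authors' actual CMTNL module is in the repository linked above.

```python
# Hedged sketch of a cross-modality temporal non-local (CMTNL) step.
# Frame-level RGB and depth descriptors are concatenated along time so
# that attention mixes frames within and across modalities in one pass.
# All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn


class CrossModalityTemporalNonLocal(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # rgb, depth: (batch, time, dim) per-frame descriptors.
        x = torch.cat([rgb, depth], dim=1)  # (batch, 2 * time, dim)
        attn = torch.softmax(
            self.query(x) @ self.key(x).transpose(1, 2) * self.scale, dim=-1)
        # Residual non-local update: each position is refreshed with a
        # weighted mix of every RGB and depth frame.
        return x + attn @ self.value(x)


# Example: 64 frames per modality, 512-dim descriptors.
rgb, depth = torch.randn(2, 64, 512), torch.randn(2, 64, 512)
out = CrossModalityTemporalNonLocal(512)(rgb, depth)
print(out.shape)  # torch.Size([2, 128, 512])
```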