Robust and Accurate Hand Gesture Authentication With Cross-Modality Local-Global Behavior Analysis

Obtaining robust fine-grained behavioral features is critical for dynamic hand gesture authentication. However, behavioral characteristics are abstract and complex, making them more difficult to capture than physiological characteristics. Moreover, varying illumination and backgrounds in practical applications pose additional challenges to existing methods because commonly used RGB videos are sensitive to them. To overcome this robustness limitation, we propose a two-stream CNN-based cross-modality local-global network (CMLG-Net) with two complementary modules to enhance the discriminability and robustness of behavioral features. First, we introduce a temporal scale pyramid (TSP) module consisting of multiple parallel convolution subbranches with different temporal kernel sizes to capture the fine-grained local motion cues at various temporal scales. Second, a cross-modality temporal non-local (CMTNL) module is devised to simultaneously aggregate the global temporal features and cross-modality features with an attention mechanism. Through the complementary combination of the TSP and CMTNL modules, our CMLG-Net obtains a comprehensive and robust behavioral representation that contains both multi-scale (short- and long-term) and multimodal (RGB-D) behavioral information. Extensive experiments are conducted on the largest dataset, SCUT-DHGA, and a simulated practical dataset, SCUT-DHGA-br, to demonstrate the effectiveness of CMLG-Net in exploiting fine-grained behavioral features and complementary multimodal information. Finally, it achieves state-of-the-art performance with the lowest EERs of 0.497% and 4.848% in two challenging evaluation protocols and shows significant superiority in robustness under practical scenes with unsatisfactory illumination and backgrounds. The code is available at https://github.com/SCUT-BIP-Lab/CMLG-Net .
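The TSP module lends itself to a short illustration. Below is a minimal PyTorch sketch of the idea as the abstract describes it (parallel convolution subbranches with different temporal kernel sizes, fused into one local motion feature); the class name, channel count, kernel sizes, and residual summation are illustrative assumptions, not the authors' released implementation, which lives in the linked repository.

```python
# Hedged sketch of a temporal scale pyramid (TSP) block: parallel 3D
# convolutions whose kernels differ only along the time axis, so each
# branch responds to motion cues at a different temporal scale.
# All names and hyperparameters here are assumptions for illustration.
import torch
import torch.nn as nn


class TemporalScalePyramid(nn.Module):
    def __init__(self, channels: int, kernel_sizes=(1, 3, 5)):
        super().__init__()
        # One branch per temporal scale; (k, 1, 1) kernels leave the
        # spatial grid untouched, and padding k // 2 preserves length.
        self.branches = nn.ModuleList(
            nn.Conv3d(channels, channels, kernel_size=(k, 1, 1),
                      padding=(k // 2, 0, 0))
            for k in kernel_sizes
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, height, width). Summing the branch
        # outputs fuses short- and long-term motion into one map; the
        # residual keeps the original features intact.
        return self.relu(sum(branch(x) for branch in self.branches) + x)


# Example: 64-frame clips of 7x7 feature maps with 256 channels.
clip = torch.randn(2, 256, 64, 7, 7)
print(TemporalScalePyramid(256)(clip).shape)  # torch.Size([2, 256, 64, 7, 7])
```

Summation is one plausible fusion choice; concatenation followed by a 1x1x1 convolution would serve the same multi-scale purpose. A companion sketch of the CMTNL module appears after the bibliographic fields at the end of this record.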

Detailed Description

Saved in:
Bibliographic Details
Published in: IEEE transactions on information forensics and security, 2024, Vol. 19, p. 8630-8643
Main Authors: Zhang, Yufeng, Kang, Wenxiong, Song, Wenwei
Format: Article
Language: English
Subjects:
Online Access: Order full text
container_end_page 8643
container_issue
container_start_page 8630
container_title IEEE transactions on information forensics and security
container_volume 19
creator Zhang, Yufeng
Kang, Wenxiong
Song, Wenwei
description Obtaining robust fine-grained behavioral features is critical for dynamic hand gesture authentication. However, behavioral characteristics are abstract and complex, making them more difficult to capture than physiological characteristics. Moreover, varying illumination and backgrounds in practical applications pose additional challenges to existing methods because commonly used RGB videos are sensitive to them. To overcome this robustness limitation, we propose a two-stream CNN-based cross-modality local-global network (CMLG-Net) with two complementary modules to enhance the discriminability and robustness of behavioral features. First, we introduce a temporal scale pyramid (TSP) module consisting of multiple parallel convolution subbranches with different temporal kernel sizes to capture the fine-grained local motion cues at various temporal scales. Second, a cross-modality temporal non-local (CMTNL) module is devised to simultaneously aggregate the global temporal features and cross-modality features with an attention mechanism. Through the complementary combination of the TSP and CMTNL modules, our CMLG-Net obtains a comprehensive and robust behavioral representation that contains both multi-scale (short- and long-term) and multimodal (RGB-D) behavioral information. Extensive experiments are conducted on the largest dataset, SCUT-DHGA, and a simulated practical dataset, SCUT-DHGA-br, to demonstrate the effectiveness of CMLG-Net in exploiting fine-grained behavioral features and complementary multimodal information. Finally, it achieves state-of-the-art performance with the lowest EERs of 0.497% and 4.848% in two challenging evaluation protocols and shows significant superiority in robustness under practical scenes with unsatisfactory illumination and backgrounds. The code is available at https://github.com/SCUT-BIP-Lab/CMLG-Net .
doi_str_mv 10.1109/TIFS.2024.3451367
format Article
fulltext fulltext_linktorsrc
identifier ISSN: 1556-6013
ispartof IEEE transactions on information forensics and security, 2024, Vol.19, p.8630-8643
issn 1556-6013
1556-6021
language eng
recordid cdi_crossref_primary_10_1109_TIFS_2024_3451367
source IEEE Electronic Library (IEL)
subjects Authentication
behavioral characteristic representation
Biometrics
Feature extraction
hand gesture authentication
Lighting
multimodal fusion
Physiology
Robustness
spatiotemporal analysis
Spatiotemporal phenomena
Videos
title Robust and Accurate Hand Gesture Authentication With Cross-Modality Local-Global Behavior Analysis
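To complement the TSP sketch above, here is an equally hedged sketch of the CMTNL idea from the abstract: per-frame RGB and depth descriptors are stacked along the time axis, and a non-local (self-attention) step lets every frame aggregate global temporal context from both modalities at once. The scaled dot-product form, embedding width, and residual connection are assumptions; the authors' actual CMTNL module is in the repository linked above.

```python
# Hedged sketch of a cross-modality temporal non-local (CMTNL) step.
# Frame-level RGB and depth descriptors are concatenated along time so
# that attention mixes frames within and across modalities in one pass.
# All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn


class CrossModalityTemporalNonLocal(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # rgb, depth: (batch, time, dim) per-frame descriptors.
        x = torch.cat([rgb, depth], dim=1)  # (batch, 2 * time, dim)
        attn = torch.softmax(
            self.query(x) @ self.key(x).transpose(1, 2) * self.scale, dim=-1)
        # Residual non-local update: each position is refreshed with a
        # weighted mix of every RGB and depth frame.
        return x + attn @ self.value(x)


# Example: 64 frames per modality, 512-dim descriptors.
rgb, depth = torch.randn(2, 64, 512), torch.randn(2, 64, 512)
out = CrossModalityTemporalNonLocal(512)(rgb, depth)
print(out.shape)  # torch.Size([2, 128, 512])
```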