On the Instability of Softmax Attention-based Deep Learning Models in Side-channel Analysis

In side-channel analysis (SCA), Points-of-Interest (PoIs), i.e., the informative sample points, remain sparsely scattered across the whole side-channel trace. Several works in the SCA literature have demonstrated that attack efficacy can be significantly improved by combining information from the sparsely occurring PoIs. In Deep Learning (DL), a common approach for combining information from sparsely occurring PoIs is softmax attention. This work studies the training instability of softmax attention-based CNN models on long traces. We show that softmax attention-based CNN models suffer from unstable training when applied to longer traces (e.g., traces longer than 10K sample points). We also explore the use of batch normalization and multi-head softmax attention to make the CNN models stable. Our results show that using a large number of batch normalization layers and/or multi-head softmax attention (replacing the vanilla softmax attention) can make the models significantly more stable, resulting in better attack efficacy. Moreover, our models achieve similar or better results (up to an 85% reduction in the minimum number of traces required to reach guessing entropy 1) than the state of the art on several synchronized and desynchronized datasets. Finally, by plotting the loss surfaces of the DL models, we demonstrate that using multi-head softmax attention instead of vanilla softmax attention in the CNN models can make the loss surface significantly smoother.
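To make the two attention variants named in the abstract concrete, the sketch below shows vanilla softmax-attention pooling over the time axis of a CNN feature map and a multi-head variant in which each head attends over its own feature slice. This is a minimal illustration only, not the architecture from the paper: all names (SoftmaxAttentionPool, MultiHeadSoftmaxAttentionPool, d_feat, n_heads) and the toy shapes are hypothetical, and PyTorch is assumed as the framework.

```python
import torch
import torch.nn as nn


class SoftmaxAttentionPool(nn.Module):
    """Vanilla softmax attention: one score per sample point, weighted sum of features."""

    def __init__(self, d_feat: int):
        super().__init__()
        self.score = nn.Linear(d_feat, 1)  # one attention logit per time step

    def forward(self, x):  # x: (batch, time, d_feat)
        w = torch.softmax(self.score(x), dim=1)  # (batch, time, 1), sums to 1 over time
        return (w * x).sum(dim=1)  # (batch, d_feat)


class MultiHeadSoftmaxAttentionPool(nn.Module):
    """Multi-head variant: each head pools its own feature slice with its own
    attention distribution, as suggested by the abstract's replacement of
    vanilla softmax attention with multi-head softmax attention."""

    def __init__(self, d_feat: int, n_heads: int):
        super().__init__()
        assert d_feat % n_heads == 0, "d_feat must be divisible by n_heads"
        self.n_heads, self.d_head = n_heads, d_feat // n_heads
        self.score = nn.Linear(d_feat, n_heads)  # one logit per head per time step

    def forward(self, x):  # x: (batch, time, d_feat)
        b, t, _ = x.shape
        w = torch.softmax(self.score(x), dim=1)  # (batch, time, n_heads)
        xh = x.view(b, t, self.n_heads, self.d_head)  # split features per head
        pooled = (w.unsqueeze(-1) * xh).sum(dim=1)  # (batch, n_heads, d_head)
        return pooled.reshape(b, -1)  # (batch, d_feat)


# Toy usage on a hypothetical feature map of a long trace (>10K sample points,
# the regime where the abstract reports vanilla softmax attention becoming unstable).
if __name__ == "__main__":
    feats = torch.randn(4, 12000, 64)
    print(SoftmaxAttentionPool(64)(feats).shape)               # torch.Size([4, 64])
    print(MultiHeadSoftmaxAttentionPool(64, 8)(feats).shape)   # torch.Size([4, 64])
```

In the vanilla variant a single softmax distribution over more than 10K time steps must pick out a handful of PoIs; the multi-head variant spreads that job over several smaller distributions, which is consistent with the smoother loss surfaces the abstract reports.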

Bibliographic Details
Published in: IEEE Transactions on Information Forensics and Security, 2024-01, Vol. 19, p. 1-1
Main Authors: Hajra, Suvadeep; Alam, Manaar; Saha, Sayandeep; Picek, Stjepan; Mukhopadhyay, Debdeep
Format: Article
Language: English
Subjects: Computational modeling; Convolutional neural networks; Deep learning; Effectiveness; Feature extraction; Multi-head attention; Noise measurement; Recurrent neural networks; Side-channel analysis; Signal to noise ratio; Softmax attention; Stability analysis; Training
Online Access: Order full text
DOI: 10.1109/TIFS.2023.3326667
ISSN: 1556-6013
EISSN: 1556-6021
CODEN: ITIFA6
Publisher: New York: IEEE
Source: IEEE Electronic Library (IEL)
Full text: https://ieeexplore.ieee.org/document/10290992