On the Instability of Softmax Attention-based Deep Learning Models in Side-channel Analysis

In side-channel analysis (SCA), Points-of-Interest (PoIs), i.e., the informative sample points, remain sparsely scattered across the whole side-channel trace. Several works in the SCA literature have demonstrated that attack efficacy can be significantly improved by combining information from the sparsely occurring PoIs. In Deep Learning (DL), a common approach for combining information from sparsely occurring PoIs is softmax attention. This work studies the training instability of softmax attention-based CNN models on long traces. We show that softmax attention-based CNN models suffer from unstable training when applied to longer traces (e.g., traces longer than 10K sample points). We also explore the use of batch normalization and multi-head softmax attention to make the CNN models stable. Our results show that using a large number of batch normalization layers and/or multi-head softmax attention (replacing the vanilla softmax attention) can make the models significantly more stable, resulting in better attack efficacy. Moreover, our models achieve similar or better results (up to an 85% reduction in the minimum number of traces required to reach guessing entropy 1) than the state of the art on several synchronized and desynchronized datasets. Finally, by plotting the loss surfaces of the DL models, we demonstrate that using multi-head softmax attention instead of vanilla softmax attention in the CNN models can make the loss surface significantly smoother.
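To make the two attention variants named in the abstract concrete, the sketch below shows vanilla softmax-attention pooling over the time axis of a CNN feature map and a multi-head variant in which each head attends over its own feature slice. This is a minimal illustration only, not the architecture from the paper: all names (SoftmaxAttentionPool, MultiHeadSoftmaxAttentionPool, d_feat, n_heads) and the toy shapes are hypothetical, and PyTorch is assumed as the framework.

```python
import torch
import torch.nn as nn


class SoftmaxAttentionPool(nn.Module):
    """Vanilla softmax attention: one score per sample point, weighted sum of features."""

    def __init__(self, d_feat: int):
        super().__init__()
        self.score = nn.Linear(d_feat, 1)  # one attention logit per time step

    def forward(self, x):  # x: (batch, time, d_feat)
        w = torch.softmax(self.score(x), dim=1)  # (batch, time, 1), sums to 1 over time
        return (w * x).sum(dim=1)  # (batch, d_feat)


class MultiHeadSoftmaxAttentionPool(nn.Module):
    """Multi-head variant: each head pools its own feature slice with its own
    attention distribution, as suggested by the abstract's replacement of
    vanilla softmax attention with multi-head softmax attention."""

    def __init__(self, d_feat: int, n_heads: int):
        super().__init__()
        assert d_feat % n_heads == 0, "d_feat must be divisible by n_heads"
        self.n_heads, self.d_head = n_heads, d_feat // n_heads
        self.score = nn.Linear(d_feat, n_heads)  # one logit per head per time step

    def forward(self, x):  # x: (batch, time, d_feat)
        b, t, _ = x.shape
        w = torch.softmax(self.score(x), dim=1)  # (batch, time, n_heads)
        xh = x.view(b, t, self.n_heads, self.d_head)  # split features per head
        pooled = (w.unsqueeze(-1) * xh).sum(dim=1)  # (batch, n_heads, d_head)
        return pooled.reshape(b, -1)  # (batch, d_feat)


# Toy usage on a hypothetical feature map of a long trace (>10K sample points,
# the regime where the abstract reports vanilla softmax attention becoming unstable).
if __name__ == "__main__":
    feats = torch.randn(4, 12000, 64)
    print(SoftmaxAttentionPool(64)(feats).shape)               # torch.Size([4, 64])
    print(MultiHeadSoftmaxAttentionPool(64, 8)(feats).shape)   # torch.Size([4, 64])
```

In the vanilla variant a single softmax distribution over more than 10K time steps must pick out a handful of PoIs; the multi-head variant spreads that job over several smaller distributions, which is consistent with the smoother loss surfaces the abstract reports.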

Bibliographic Details
Published in: IEEE Transactions on Information Forensics and Security, 2024-01, Vol. 19, p. 1-1
Main Authors: Hajra, Suvadeep; Alam, Manaar; Saha, Sayandeep; Picek, Stjepan; Mukhopadhyay, Debdeep
Format: Article
Language: English
Subjects: Computational modeling; Convolutional neural networks; Deep learning; Effectiveness; Feature extraction; Multi-head attention; Noise measurement; Recurrent neural networks; Side-channel analysis; Signal to noise ratio; Softmax attention; Stability analysis; Training
Online Access: Order full text
DOI: 10.1109/TIFS.2023.3326667
ISSN: 1556-6013
EISSN: 1556-6021
CODEN: ITIFA6
Publisher: New York: IEEE
Source: IEEE Electronic Library (IEL)
Full text: https://ieeexplore.ieee.org/document/10290992