On the Instability of Softmax Attention-based Deep Learning Models in Side-channel Analysis
In side-channel analysis (SCA), Points-of-Interest (PoIs), i.e., the informative sample points, remain sparsely scattered across the whole side-channel trace. Several works in the SCA literature have demonstrated that attack efficacy can be significantly improved by combining information from the sparsely occurring PoIs.
Published in: | IEEE transactions on information forensics and security 2024-01, Vol.19, p.1-1 |
---|---|
Main authors: | Hajra, Suvadeep; Alam, Manaar; Saha, Sayandeep; Picek, Stjepan; Mukhopadhyay, Debdeep |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 1 |
---|---|
container_issue | |
container_start_page | 1 |
container_title | IEEE transactions on information forensics and security |
container_volume | 19 |
creator | Hajra, Suvadeep Alam, Manaar Saha, Sayandeep Picek, Stjepan Mukhopadhyay, Debdeep |
description | In side-channel analysis (SCA), Points-of-Interest (PoIs), i.e., the informative sample points, remain sparsely scattered across the whole side-channel trace. Several works in the SCA literature have demonstrated that attack efficacy can be significantly improved by combining information from the sparsely occurring PoIs. In Deep Learning (DL), a common approach for combining the information from the sparsely occurring PoIs is softmax attention. This work studies the training instability of softmax attention-based CNN models on long traces. We show that softmax attention-based CNN models suffer from unstable training when applied to longer traces (e.g., traces longer than 10K sample points). We also explore the use of batch normalization and multi-head softmax attention to make the CNN models stable. Our results show that the use of a large number of batch normalization layers and/or multi-head softmax attention (replacing the vanilla softmax attention) can make the models significantly more stable, resulting in better attack efficacy. Moreover, we found our models to achieve similar or better results (up to 85% reduction in the minimum number of traces required to reach guessing entropy 1) than the state-of-the-art results on several synchronized and desynchronized datasets. Finally, by plotting the loss surface of the DL models, we demonstrate that using multi-head softmax attention instead of vanilla softmax attention in the CNN models can make the loss surface significantly smoother. |
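The abstract above centers on softmax attention pooling: per-sample features from a CNN backbone are combined into one summary vector, with informative points (PoIs) receiving high attention weights, and on the multi-head variant that the paper finds more stable. The following is a minimal NumPy sketch of the idea, not the authors' architecture; the function names, shapes, and the single learned scoring vector per head are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_pool(features, w):
    # features: (trace_len, d) per-sample feature vectors from a CNN backbone.
    # w: (d,) learned scoring vector (hypothetical parameterization).
    scores = features @ w            # (trace_len,) one score per sample point
    alpha = softmax(scores)          # attention weights, non-negative, sum to 1
    return alpha @ features          # (d,) weighted sum over the whole trace

def multihead_pool(features, W):
    # W: (h, d) one scoring vector per head; each head can attend to
    # different PoIs, and the h pooled summaries are concatenated.
    return np.concatenate([attention_pool(features, w) for w in W])

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 8))
w = rng.normal(size=8)
pooled = attention_pool(feats, w)
print(pooled.shape)  # (8,)
```

Because the weights are a softmax over the full trace, a few sample points with large scores can dominate the pooled vector, which is how such a model aggregates sparsely scattered PoIs without hand-picked windows.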
doi_str_mv | 10.1109/TIFS.2023.3326667 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1556-6013 |
ispartof | IEEE transactions on information forensics and security, 2024-01, Vol.19, p.1-1 |
issn | 1556-6013, 1556-6021 |
language | eng |
recordid | cdi_proquest_journals_2892366417 |
source | IEEE Electronic Library (IEL) |
subjects | Computational modeling; Convolutional neural networks; Deep learning; Effectiveness; Feature extraction; Multi-head attention; Noise measurement; Recurrent neural networks; Side-channel analysis; Signal to noise ratio; Softmax attention; Stability analysis; Training |
title | On the Instability of Softmax Attention-based Deep Learning Models in Side-channel Analysis |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-04T03%3A34%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=On%20the%20Instability%20of%20Softmax%20Attention-based%20Deep%20Learning%20Models%20in%20Side-channel%20Analysis&rft.jtitle=IEEE%20transactions%20on%20information%20forensics%20and%20security&rft.au=Hajra,%20Suvadeep&rft.date=2024-01-01&rft.volume=19&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=1556-6013&rft.eissn=1556-6021&rft.coden=ITIFA6&rft_id=info:doi/10.1109/TIFS.2023.3326667&rft_dat=%3Cproquest_RIE%3E2892366417%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2892366417&rft_id=info:pmid/&rft_ieee_id=10290992&rfr_iscdi=true |
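The abstract reports attack efficacy as the minimum number of traces needed to reach guessing entropy 1. As a reference for readers of the record, here is a hedged sketch of how guessing entropy is commonly computed, not the paper's evaluation code; the per-trace log-probability input and the averaging over shuffled runs are standard but assumed here.

```python
import numpy as np

def guessing_entropy(log_probs, correct_key, n_runs=100, rng=None):
    # log_probs: (n_traces, n_keys) model log-likelihood of each key guess
    # per attack trace. GE is the average rank of the correct key over
    # n_runs random orderings of the traces; GE == 1 means the correct
    # key is ranked first on average.
    rng = rng if rng is not None else np.random.default_rng(0)
    ranks = []
    for _ in range(n_runs):
        order = rng.permutation(len(log_probs))
        scores = log_probs[order].sum(axis=0)   # accumulate evidence over traces
        rank = int((scores > scores[correct_key]).sum()) + 1
        ranks.append(rank)
    return float(np.mean(ranks))
```

Reading the abstract through this metric: an 85% reduction in the minimum traces to reach GE 1 means the proposed models need far fewer attack traces before the correct key consistently tops the ranking.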