On Adversarial Examples for Text Classification by Perturbing Latent Representations

Recent advances in deep learning have significantly improved many text classification applications. However, this improvement comes at a cost: deep learning models are vulnerable to adversarial examples, which indicates that they are not very robust. Fortunately, the input of a text classifier is discrete, which shields the classifier from standard gradient-based, state-of-the-art attacks. Nonetheless, previous works have crafted black-box attacks that successfully manipulate the discrete input values to find adversarial examples. Therefore, instead of changing the discrete values directly, we map the input into its embedding vector of real values and apply state-of-the-art white-box attacks there. We then convert the perturbed embedding vector back into text and call the result an adversarial example. In summary, we create a framework that measures the robustness of a text classifier by using the gradients of the classifier.
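
To make the idea in the abstract concrete, below is a minimal, hypothetical sketch of the general approach: embed the tokens, take a gradient-based white-box step on the continuous embeddings (FGSM is used here purely as an example of such an attack), and project the perturbed vectors back to the nearest vocabulary tokens. The toy model, dimensions, and the nearest-neighbour projection are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch only: FGSM stands in for "a state-of-the-art white-box
# attack", and nearest-neighbour projection stands in for "converting the
# perturbed embedding back into text". Model sizes are arbitrary toy values.
import torch
import torch.nn.functional as F

vocab_size, embed_dim, num_classes = 1000, 32, 2

embedding = torch.nn.Embedding(vocab_size, embed_dim)
classifier = torch.nn.Sequential(      # stand-in classifier over mean-pooled embeddings
    torch.nn.Linear(embed_dim, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, num_classes),
)

def predict(emb):
    """Logits for a (seq_len, embed_dim) embedding matrix via mean pooling."""
    return classifier(emb.mean(dim=0, keepdim=True))

def fgsm_on_embeddings(token_ids, label, epsilon=0.5):
    """One FGSM step in the continuous embedding space (white-box, gradient-based)."""
    emb = embedding(token_ids).detach().requires_grad_(True)
    loss = F.cross_entropy(predict(emb), label)
    loss.backward()
    return (emb + epsilon * emb.grad.sign()).detach()

def project_to_tokens(perturbed_emb):
    """Map each perturbed embedding to its nearest vocabulary token (L2 distance)."""
    dists = torch.cdist(perturbed_emb, embedding.weight.detach())  # (seq_len, vocab)
    return dists.argmin(dim=1)

# Usage on a random "sentence" of 12 token ids
token_ids = torch.randint(0, vocab_size, (12,))
label = torch.tensor([0])
adv_ids = project_to_tokens(fgsm_on_embeddings(token_ids, label))
orig_pred = predict(embedding(token_ids)).argmax(dim=1)
adv_pred = predict(embedding(adv_ids)).argmax(dim=1)
print("prediction flipped:", bool((orig_pred != adv_pred).item()))
```

A framework in the spirit of the abstract would repeat this over a dataset and report, for example, the fraction of inputs whose prediction flips under a given perturbation budget; the nearest-neighbour step shown here is only one possible way to return to discrete text and makes no attempt to preserve fluency.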

Detailed description

Bibliographic details
Main authors: Sooksatra, Korn; Khanal, Bikram; Rivas, Pablo
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Sooksatra, Korn
Khanal, Bikram
Rivas, Pablo
description Recent advances in deep learning have significantly improved many text classification applications. However, this improvement comes at a cost: deep learning models are vulnerable to adversarial examples, which indicates that they are not very robust. Fortunately, the input of a text classifier is discrete, which shields the classifier from standard gradient-based, state-of-the-art attacks. Nonetheless, previous works have crafted black-box attacks that successfully manipulate the discrete input values to find adversarial examples. Therefore, instead of changing the discrete values directly, we map the input into its embedding vector of real values and apply state-of-the-art white-box attacks there. We then convert the perturbed embedding vector back into text and call the result an adversarial example. In summary, we create a framework that measures the robustness of a text classifier by using the gradients of the classifier.
doi_str_mv 10.48550/arxiv.2405.03789
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2405.03789
ispartof
issn
language eng
recordid cdi_arxiv_primary_2405_03789
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Computation and Language
Computer Science - Cryptography and Security
Computer Science - Learning
title On Adversarial Examples for Text Classification by Perturbing Latent Representations
url https://arxiv.org/abs/2405.03789