Adversarial Attacks on Large Language Model-Based System and Mitigating Strategies: A Case Study on ChatGPT
Machine learning algorithms are at the forefront of the development of advanced information systems. The rapid progress in machine learning technology has enabled cutting-edge large language models (LLMs), represented by GPT-3 and ChatGPT, to perform a wide range of NLP tasks with a stunning perform...
Gespeichert in:
Veröffentlicht in: | Security and communication networks 2023-06, Vol.2023, p.1-10 |
---|---|
Hauptverfasser: | , , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
container_end_page | 10 |
---|---|
container_issue | |
container_start_page | 1 |
container_title | Security and communication networks |
container_volume | 2023 |
creator | Liu, Bowen Xiao, Boao Jiang, Xutong Cen, Siyuan He, Xin Dou, Wanchun |
description | Machine learning algorithms are at the forefront of the development of advanced information systems. The rapid progress in machine learning technology has enabled cutting-edge large language models (LLMs), represented by GPT-3 and ChatGPT, to perform a wide range of NLP tasks with a stunning performance. However, research on adversarial machine learning highlights the need for these intelligent systems to be more robust. Adversarial machine learning aims to evaluate attack and defense mechanisms to prevent the malicious exploitation of these systems. In the case of ChatGPT, adversarial induction prompt can cause the model to generate toxic texts that could pose serious security risks or propagate false information. To address this challenge, we first analyze the effectiveness of inducing attacks on ChatGPT. Then, two effective mitigating mechanisms are proposed. The first is a training-free prefix prompt mechanism to detect and prevent the generation of toxic texts. The second is a RoBERTa-based mechanism that identifies manipulative or misleading input text via external detection models. The availability of this method is demonstrated through experiments. |
doi_str_mv | 10.1155/2023/8691095 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2827112900</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2827112900</sourcerecordid><originalsourceid>FETCH-LOGICAL-c1825-ed3e72779144d36aafb383c9540c70dffd6b4ac425839fb9b41e3ea097e480893</originalsourceid><addsrcrecordid>eNp9kMtKw0AUhgdRsFZ3PsCAS42dWy7jLgatQotC6zqcZCbp9JLUmUmlb29Ki0s35_wcPv4DH0K3lDxSGoYjRhgfJZGkRIZnaEAllwGhjJ3_ZSou0ZVzS0IiKmIxQKtU7bR1YA2sceo9lCuH2wZPwNa6n03dQR-mrdLr4BmcVni2d15vMDQKT403NXjT1HjmLXhdG-2ecIqznuxPndofyrIF-PHn_BpdVLB2-ua0h-jr9WWevQWTj_F7lk6CkiYsDLTiOmZxLKkQikcAVcETXspQkDImqqpUVAgoBQsTLqtCFoJqroHIWIuEJJIP0d2xd2vb7047ny_bzjb9y5wlLKaUSUJ66uFIlbZ1zuoq31qzAbvPKckPOvODzvyks8fvj_jCNAp-zP_0L-KQcxM</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2827112900</pqid></control><display><type>article</type><title>Adversarial Attacks on Large Language Model-Based System and Mitigating Strategies: A Case Study on ChatGPT</title><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>Wiley Online Library Open Access</source><source>Alma/SFX Local Collection</source><creator>Liu, Bowen ; Xiao, Boao ; Jiang, Xutong ; Cen, Siyuan ; He, Xin ; Dou, Wanchun</creator><contributor>Chen, Huaming ; Huaming Chen</contributor><creatorcontrib>Liu, Bowen ; Xiao, Boao ; Jiang, Xutong ; Cen, Siyuan ; He, Xin ; Dou, Wanchun ; Chen, Huaming ; Huaming Chen</creatorcontrib><description>Machine learning algorithms are at the forefront of the development of advanced information systems. The rapid progress in machine learning technology has enabled cutting-edge large language models (LLMs), represented by GPT-3 and ChatGPT, to perform a wide range of NLP tasks with a stunning performance. However, research on adversarial machine learning highlights the need for these intelligent systems to be more robust. Adversarial machine learning aims to evaluate attack and defense mechanisms to prevent the malicious exploitation of these systems. In the case of ChatGPT, adversarial induction prompt can cause the model to generate toxic texts that could pose serious security risks or propagate false information. To address this challenge, we first analyze the effectiveness of inducing attacks on ChatGPT. Then, two effective mitigating mechanisms are proposed. The first is a training-free prefix prompt mechanism to detect and prevent the generation of toxic texts. The second is a RoBERTa-based mechanism that identifies manipulative or misleading input text via external detection models. The availability of this method is demonstrated through experiments.</description><identifier>ISSN: 1939-0114</identifier><identifier>EISSN: 1939-0122</identifier><identifier>DOI: 10.1155/2023/8691095</identifier><language>eng</language><publisher>London: Hindawi</publisher><subject>Algorithms ; Artificial intelligence ; Chatbots ; Information systems ; Language ; Large language models ; Machine learning ; Methods ; Model-based systems ; Natural language processing ; Poisoning ; Texts</subject><ispartof>Security and communication networks, 2023-06, Vol.2023, p.1-10</ispartof><rights>Copyright © 2023 Bowen Liu et al.</rights><rights>Copyright © 2023 Bowen Liu et al. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. https://creativecommons.org/licenses/by/4.0</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c1825-ed3e72779144d36aafb383c9540c70dffd6b4ac425839fb9b41e3ea097e480893</citedby><cites>FETCH-LOGICAL-c1825-ed3e72779144d36aafb383c9540c70dffd6b4ac425839fb9b41e3ea097e480893</cites><orcidid>0000-0003-4833-2023</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><contributor>Chen, Huaming</contributor><contributor>Huaming Chen</contributor><creatorcontrib>Liu, Bowen</creatorcontrib><creatorcontrib>Xiao, Boao</creatorcontrib><creatorcontrib>Jiang, Xutong</creatorcontrib><creatorcontrib>Cen, Siyuan</creatorcontrib><creatorcontrib>He, Xin</creatorcontrib><creatorcontrib>Dou, Wanchun</creatorcontrib><title>Adversarial Attacks on Large Language Model-Based System and Mitigating Strategies: A Case Study on ChatGPT</title><title>Security and communication networks</title><description>Machine learning algorithms are at the forefront of the development of advanced information systems. The rapid progress in machine learning technology has enabled cutting-edge large language models (LLMs), represented by GPT-3 and ChatGPT, to perform a wide range of NLP tasks with a stunning performance. However, research on adversarial machine learning highlights the need for these intelligent systems to be more robust. Adversarial machine learning aims to evaluate attack and defense mechanisms to prevent the malicious exploitation of these systems. In the case of ChatGPT, adversarial induction prompt can cause the model to generate toxic texts that could pose serious security risks or propagate false information. To address this challenge, we first analyze the effectiveness of inducing attacks on ChatGPT. Then, two effective mitigating mechanisms are proposed. The first is a training-free prefix prompt mechanism to detect and prevent the generation of toxic texts. The second is a RoBERTa-based mechanism that identifies manipulative or misleading input text via external detection models. The availability of this method is demonstrated through experiments.</description><subject>Algorithms</subject><subject>Artificial intelligence</subject><subject>Chatbots</subject><subject>Information systems</subject><subject>Language</subject><subject>Large language models</subject><subject>Machine learning</subject><subject>Methods</subject><subject>Model-based systems</subject><subject>Natural language processing</subject><subject>Poisoning</subject><subject>Texts</subject><issn>1939-0114</issn><issn>1939-0122</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>RHX</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp9kMtKw0AUhgdRsFZ3PsCAS42dWy7jLgatQotC6zqcZCbp9JLUmUmlb29Ki0s35_wcPv4DH0K3lDxSGoYjRhgfJZGkRIZnaEAllwGhjJ3_ZSou0ZVzS0IiKmIxQKtU7bR1YA2sceo9lCuH2wZPwNa6n03dQR-mrdLr4BmcVni2d15vMDQKT403NXjT1HjmLXhdG-2ecIqznuxPndofyrIF-PHn_BpdVLB2-ua0h-jr9WWevQWTj_F7lk6CkiYsDLTiOmZxLKkQikcAVcETXspQkDImqqpUVAgoBQsTLqtCFoJqroHIWIuEJJIP0d2xd2vb7047ny_bzjb9y5wlLKaUSUJ66uFIlbZ1zuoq31qzAbvPKckPOvODzvyks8fvj_jCNAp-zP_0L-KQcxM</recordid><startdate>20230610</startdate><enddate>20230610</enddate><creator>Liu, Bowen</creator><creator>Xiao, Boao</creator><creator>Jiang, Xutong</creator><creator>Cen, Siyuan</creator><creator>He, Xin</creator><creator>Dou, Wanchun</creator><general>Hindawi</general><general>Hindawi Limited</general><scope>RHU</scope><scope>RHW</scope><scope>RHX</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P5Z</scope><scope>P62</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><orcidid>https://orcid.org/0000-0003-4833-2023</orcidid></search><sort><creationdate>20230610</creationdate><title>Adversarial Attacks on Large Language Model-Based System and Mitigating Strategies: A Case Study on ChatGPT</title><author>Liu, Bowen ; Xiao, Boao ; Jiang, Xutong ; Cen, Siyuan ; He, Xin ; Dou, Wanchun</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c1825-ed3e72779144d36aafb383c9540c70dffd6b4ac425839fb9b41e3ea097e480893</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Algorithms</topic><topic>Artificial intelligence</topic><topic>Chatbots</topic><topic>Information systems</topic><topic>Language</topic><topic>Large language models</topic><topic>Machine learning</topic><topic>Methods</topic><topic>Model-based systems</topic><topic>Natural language processing</topic><topic>Poisoning</topic><topic>Texts</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Liu, Bowen</creatorcontrib><creatorcontrib>Xiao, Boao</creatorcontrib><creatorcontrib>Jiang, Xutong</creatorcontrib><creatorcontrib>Cen, Siyuan</creatorcontrib><creatorcontrib>He, Xin</creatorcontrib><creatorcontrib>Dou, Wanchun</creatorcontrib><collection>Hindawi Publishing Complete</collection><collection>Hindawi Publishing Subscription Journals</collection><collection>Hindawi Publishing Open Access</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>Access via ProQuest (Open Access)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><jtitle>Security and communication networks</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Liu, Bowen</au><au>Xiao, Boao</au><au>Jiang, Xutong</au><au>Cen, Siyuan</au><au>He, Xin</au><au>Dou, Wanchun</au><au>Chen, Huaming</au><au>Huaming Chen</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Adversarial Attacks on Large Language Model-Based System and Mitigating Strategies: A Case Study on ChatGPT</atitle><jtitle>Security and communication networks</jtitle><date>2023-06-10</date><risdate>2023</risdate><volume>2023</volume><spage>1</spage><epage>10</epage><pages>1-10</pages><issn>1939-0114</issn><eissn>1939-0122</eissn><abstract>Machine learning algorithms are at the forefront of the development of advanced information systems. The rapid progress in machine learning technology has enabled cutting-edge large language models (LLMs), represented by GPT-3 and ChatGPT, to perform a wide range of NLP tasks with a stunning performance. However, research on adversarial machine learning highlights the need for these intelligent systems to be more robust. Adversarial machine learning aims to evaluate attack and defense mechanisms to prevent the malicious exploitation of these systems. In the case of ChatGPT, adversarial induction prompt can cause the model to generate toxic texts that could pose serious security risks or propagate false information. To address this challenge, we first analyze the effectiveness of inducing attacks on ChatGPT. Then, two effective mitigating mechanisms are proposed. The first is a training-free prefix prompt mechanism to detect and prevent the generation of toxic texts. The second is a RoBERTa-based mechanism that identifies manipulative or misleading input text via external detection models. The availability of this method is demonstrated through experiments.</abstract><cop>London</cop><pub>Hindawi</pub><doi>10.1155/2023/8691095</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0003-4833-2023</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1939-0114 |
ispartof | Security and communication networks, 2023-06, Vol.2023, p.1-10 |
issn | 1939-0114 1939-0122 |
language | eng |
recordid | cdi_proquest_journals_2827112900 |
source | Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Wiley Online Library Open Access; Alma/SFX Local Collection |
subjects | Algorithms Artificial intelligence Chatbots Information systems Language Large language models Machine learning Methods Model-based systems Natural language processing Poisoning Texts |
title | Adversarial Attacks on Large Language Model-Based System and Mitigating Strategies: A Case Study on ChatGPT |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T10%3A17%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Adversarial%20Attacks%20on%20Large%20Language%20Model-Based%20System%20and%20Mitigating%20Strategies:%20A%20Case%20Study%20on%20ChatGPT&rft.jtitle=Security%20and%20communication%20networks&rft.au=Liu,%20Bowen&rft.date=2023-06-10&rft.volume=2023&rft.spage=1&rft.epage=10&rft.pages=1-10&rft.issn=1939-0114&rft.eissn=1939-0122&rft_id=info:doi/10.1155/2023/8691095&rft_dat=%3Cproquest_cross%3E2827112900%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2827112900&rft_id=info:pmid/&rfr_iscdi=true |