Adversarial Attacks on Large Language Model-Based System and Mitigating Strategies: A Case Study on ChatGPT

Machine learning algorithms are at the forefront of the development of advanced information systems. The rapid progress in machine learning technology has enabled cutting-edge large language models (LLMs), represented by GPT-3 and ChatGPT, to perform a wide range of NLP tasks with a stunning perform...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Security and communication networks 2023-06, Vol.2023, p.1-10
Hauptverfasser:	Liu, Bowen, Xiao, Boao, Jiang, Xutong, Cen, Siyuan, He, Xin, Dou, Wanchun
Format:	Artikel
Sprache:	eng
Schlagworte:	Algorithms Artificial intelligence Chatbots Information systems Language Large language models Machine learning Methods Model-based systems Natural language processing Poisoning Texts
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

container_end_page	10
container_issue
container_start_page	1
container_title	Security and communication networks
container_volume	2023
creator	Liu, Bowen Xiao, Boao Jiang, Xutong Cen, Siyuan He, Xin Dou, Wanchun
description	Machine learning algorithms are at the forefront of the development of advanced information systems. The rapid progress in machine learning technology has enabled cutting-edge large language models (LLMs), represented by GPT-3 and ChatGPT, to perform a wide range of NLP tasks with a stunning performance. However, research on adversarial machine learning highlights the need for these intelligent systems to be more robust. Adversarial machine learning aims to evaluate attack and defense mechanisms to prevent the malicious exploitation of these systems. In the case of ChatGPT, adversarial induction prompt can cause the model to generate toxic texts that could pose serious security risks or propagate false information. To address this challenge, we first analyze the effectiveness of inducing attacks on ChatGPT. Then, two effective mitigating mechanisms are proposed. The first is a training-free prefix prompt mechanism to detect and prevent the generation of toxic texts. The second is a RoBERTa-based mechanism that identifies manipulative or misleading input text via external detection models. The availability of this method is demonstrated through experiments.
doi_str_mv	10.1155/2023/8691095
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2827112900</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2827112900</sourcerecordid><originalsourceid>FETCH-LOGICAL-c1825-ed3e72779144d36aafb383c9540c70dffd6b4ac425839fb9b41e3ea097e480893</originalsourceid><addsrcrecordid>eNp9kMtKw0AUhgdRsFZ3PsCAS42dWy7jLgatQotC6zqcZCbp9JLUmUmlb29Ki0s35_wcPv4DH0K3lDxSGoYjRhgfJZGkRIZnaEAllwGhjJ3_ZSou0ZVzS0IiKmIxQKtU7bR1YA2sceo9lCuH2wZPwNa6n03dQR-mrdLr4BmcVni2d15vMDQKT403NXjT1HjmLXhdG-2ecIqznuxPndofyrIF-PHn_BpdVLB2-ua0h-jr9WWevQWTj_F7lk6CkiYsDLTiOmZxLKkQikcAVcETXspQkDImqqpUVAgoBQsTLqtCFoJqroHIWIuEJJIP0d2xd2vb7047ny_bzjb9y5wlLKaUSUJ66uFIlbZ1zuoq31qzAbvPKckPOvODzvyks8fvj_jCNAp-zP_0L-KQcxM</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2827112900</pqid></control><display><type>article</type><title>Adversarial Attacks on Large Language Model-Based System and Mitigating Strategies: A Case Study on ChatGPT</title><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>Wiley Online Library Open Access</source><source>Alma/SFX Local Collection</source><creator>Liu, Bowen ; Xiao, Boao ; Jiang, Xutong ; Cen, Siyuan ; He, Xin ; Dou, Wanchun</creator><contributor>Chen, Huaming ; Huaming Chen</contributor><creatorcontrib>Liu, Bowen ; Xiao, Boao ; Jiang, Xutong ; Cen, Siyuan ; He, Xin ; Dou, Wanchun ; Chen, Huaming ; Huaming Chen</creatorcontrib><description>Machine learning algorithms are at the forefront of the development of advanced information systems. The rapid progress in machine learning technology has enabled cutting-edge large language models (LLMs), represented by GPT-3 and ChatGPT, to perform a wide range of NLP tasks with a stunning performance. However, research on adversarial machine learning highlights the need for these intelligent systems to be more robust. Adversarial machine learning aims to evaluate attack and defense mechanisms to prevent the malicious exploitation of these systems. In the case of ChatGPT, adversarial induction prompt can cause the model to generate toxic texts that could pose serious security risks or propagate false information. To address this challenge, we first analyze the effectiveness of inducing attacks on ChatGPT. Then, two effective mitigating mechanisms are proposed. The first is a training-free prefix prompt mechanism to detect and prevent the generation of toxic texts. The second is a RoBERTa-based mechanism that identifies manipulative or misleading input text via external detection models. The availability of this method is demonstrated through experiments.</description><identifier>ISSN: 1939-0114</identifier><identifier>EISSN: 1939-0122</identifier><identifier>DOI: 10.1155/2023/8691095</identifier><language>eng</language><publisher>London: Hindawi</publisher><subject>Algorithms ; Artificial intelligence ; Chatbots ; Information systems ; Language ; Large language models ; Machine learning ; Methods ; Model-based systems ; Natural language processing ; Poisoning ; Texts</subject><ispartof>Security and communication networks, 2023-06, Vol.2023, p.1-10</ispartof><rights>Copyright © 2023 Bowen Liu et al.</rights><rights>Copyright © 2023 Bowen Liu et al. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. https://creativecommons.org/licenses/by/4.0</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c1825-ed3e72779144d36aafb383c9540c70dffd6b4ac425839fb9b41e3ea097e480893</citedby><cites>FETCH-LOGICAL-c1825-ed3e72779144d36aafb383c9540c70dffd6b4ac425839fb9b41e3ea097e480893</cites><orcidid>0000-0003-4833-2023</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><contributor>Chen, Huaming</contributor><contributor>Huaming Chen</contributor><creatorcontrib>Liu, Bowen</creatorcontrib><creatorcontrib>Xiao, Boao</creatorcontrib><creatorcontrib>Jiang, Xutong</creatorcontrib><creatorcontrib>Cen, Siyuan</creatorcontrib><creatorcontrib>He, Xin</creatorcontrib><creatorcontrib>Dou, Wanchun</creatorcontrib><title>Adversarial Attacks on Large Language Model-Based System and Mitigating Strategies: A Case Study on ChatGPT</title><title>Security and communication networks</title><description>Machine learning algorithms are at the forefront of the development of advanced information systems. The rapid progress in machine learning technology has enabled cutting-edge large language models (LLMs), represented by GPT-3 and ChatGPT, to perform a wide range of NLP tasks with a stunning performance. However, research on adversarial machine learning highlights the need for these intelligent systems to be more robust. Adversarial machine learning aims to evaluate attack and defense mechanisms to prevent the malicious exploitation of these systems. In the case of ChatGPT, adversarial induction prompt can cause the model to generate toxic texts that could pose serious security risks or propagate false information. To address this challenge, we first analyze the effectiveness of inducing attacks on ChatGPT. Then, two effective mitigating mechanisms are proposed. The first is a training-free prefix prompt mechanism to detect and prevent the generation of toxic texts. The second is a RoBERTa-based mechanism that identifies manipulative or misleading input text via external detection models. The availability of this method is demonstrated through experiments.</description><subject>Algorithms</subject><subject>Artificial intelligence</subject><subject>Chatbots</subject><subject>Information systems</subject><subject>Language</subject><subject>Large language models</subject><subject>Machine learning</subject><subject>Methods</subject><subject>Model-based systems</subject><subject>Natural language processing</subject><subject>Poisoning</subject><subject>Texts</subject><issn>1939-0114</issn><issn>1939-0122</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>RHX</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp9kMtKw0AUhgdRsFZ3PsCAS42dWy7jLgatQotC6zqcZCbp9JLUmUmlb29Ki0s35_wcPv4DH0K3lDxSGoYjRhgfJZGkRIZnaEAllwGhjJ3_ZSou0ZVzS0IiKmIxQKtU7bR1YA2sceo9lCuH2wZPwNa6n03dQR-mrdLr4BmcVni2d15vMDQKT403NXjT1HjmLXhdG-2ecIqznuxPndofyrIF-PHn_BpdVLB2-ua0h-jr9WWevQWTj_F7lk6CkiYsDLTiOmZxLKkQikcAVcETXspQkDImqqpUVAgoBQsTLqtCFoJqroHIWIuEJJIP0d2xd2vb7047ny_bzjb9y5wlLKaUSUJ66uFIlbZ1zuoq31qzAbvPKckPOvODzvyks8fvj_jCNAp-zP_0L-KQcxM</recordid><startdate>20230610</startdate><enddate>20230610</enddate><creator>Liu, Bowen</creator><creator>Xiao, Boao</creator><creator>Jiang, Xutong</creator><creator>Cen, Siyuan</creator><creator>He, Xin</creator><creator>Dou, Wanchun</creator><general>Hindawi</general><general>Hindawi Limited</general><scope>RHU</scope><scope>RHW</scope><scope>RHX</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P5Z</scope><scope>P62</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><orcidid>https://orcid.org/0000-0003-4833-2023</orcidid></search><sort><creationdate>20230610</creationdate><title>Adversarial Attacks on Large Language Model-Based System and Mitigating Strategies: A Case Study on ChatGPT</title><author>Liu, Bowen ; Xiao, Boao ; Jiang, Xutong ; Cen, Siyuan ; He, Xin ; Dou, Wanchun</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c1825-ed3e72779144d36aafb383c9540c70dffd6b4ac425839fb9b41e3ea097e480893</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Algorithms</topic><topic>Artificial intelligence</topic><topic>Chatbots</topic><topic>Information systems</topic><topic>Language</topic><topic>Large language models</topic><topic>Machine learning</topic><topic>Methods</topic><topic>Model-based systems</topic><topic>Natural language processing</topic><topic>Poisoning</topic><topic>Texts</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Liu, Bowen</creatorcontrib><creatorcontrib>Xiao, Boao</creatorcontrib><creatorcontrib>Jiang, Xutong</creatorcontrib><creatorcontrib>Cen, Siyuan</creatorcontrib><creatorcontrib>He, Xin</creatorcontrib><creatorcontrib>Dou, Wanchun</creatorcontrib><collection>Hindawi Publishing Complete</collection><collection>Hindawi Publishing Subscription Journals</collection><collection>Hindawi Publishing Open Access</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>Access via ProQuest (Open Access)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><jtitle>Security and communication networks</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Liu, Bowen</au><au>Xiao, Boao</au><au>Jiang, Xutong</au><au>Cen, Siyuan</au><au>He, Xin</au><au>Dou, Wanchun</au><au>Chen, Huaming</au><au>Huaming Chen</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Adversarial Attacks on Large Language Model-Based System and Mitigating Strategies: A Case Study on ChatGPT</atitle><jtitle>Security and communication networks</jtitle><date>2023-06-10</date><risdate>2023</risdate><volume>2023</volume><spage>1</spage><epage>10</epage><pages>1-10</pages><issn>1939-0114</issn><eissn>1939-0122</eissn><abstract>Machine learning algorithms are at the forefront of the development of advanced information systems. The rapid progress in machine learning technology has enabled cutting-edge large language models (LLMs), represented by GPT-3 and ChatGPT, to perform a wide range of NLP tasks with a stunning performance. However, research on adversarial machine learning highlights the need for these intelligent systems to be more robust. Adversarial machine learning aims to evaluate attack and defense mechanisms to prevent the malicious exploitation of these systems. In the case of ChatGPT, adversarial induction prompt can cause the model to generate toxic texts that could pose serious security risks or propagate false information. To address this challenge, we first analyze the effectiveness of inducing attacks on ChatGPT. Then, two effective mitigating mechanisms are proposed. The first is a training-free prefix prompt mechanism to detect and prevent the generation of toxic texts. The second is a RoBERTa-based mechanism that identifies manipulative or misleading input text via external detection models. The availability of this method is demonstrated through experiments.</abstract><cop>London</cop><pub>Hindawi</pub><doi>10.1155/2023/8691095</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0003-4833-2023</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 1939-0114
ispartof	Security and communication networks, 2023-06, Vol.2023, p.1-10
issn	1939-0114 1939-0122
language	eng
recordid	cdi_proquest_journals_2827112900
source	Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Wiley Online Library Open Access; Alma/SFX Local Collection
subjects	Algorithms Artificial intelligence Chatbots Information systems Language Large language models Machine learning Methods Model-based systems Natural language processing Poisoning Texts
title	Adversarial Attacks on Large Language Model-Based System and Mitigating Strategies: A Case Study on ChatGPT
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T10%3A17%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Adversarial%20Attacks%20on%20Large%20Language%20Model-Based%20System%20and%20Mitigating%20Strategies:%20A%20Case%20Study%20on%20ChatGPT&rft.jtitle=Security%20and%20communication%20networks&rft.au=Liu,%20Bowen&rft.date=2023-06-10&rft.volume=2023&rft.spage=1&rft.epage=10&rft.pages=1-10&rft.issn=1939-0114&rft.eissn=1939-0122&rft_id=info:doi/10.1155/2023/8691095&rft_dat=%3Cproquest_cross%3E2827112900%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2827112900&rft_id=info:pmid/&rfr_iscdi=true