Adversarial Attacks on Large Language Model-Based System and Mitigating Strategies: A Case Study on ChatGPT

Machine learning algorithms are at the forefront of the development of advanced information systems. The rapid progress in machine learning technology has enabled cutting-edge large language models (LLMs), represented by GPT-3 and ChatGPT, to perform a wide range of NLP tasks with a stunning perform...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Security and communication networks 2023-06, Vol.2023, p.1-10
Hauptverfasser: Liu, Bowen, Xiao, Boao, Jiang, Xutong, Cen, Siyuan, He, Xin, Dou, Wanchun
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
container_end_page 10
container_issue
container_start_page 1
container_title Security and communication networks
container_volume 2023
creator Liu, Bowen
Xiao, Boao
Jiang, Xutong
Cen, Siyuan
He, Xin
Dou, Wanchun
description Machine learning algorithms are at the forefront of the development of advanced information systems. The rapid progress in machine learning technology has enabled cutting-edge large language models (LLMs), represented by GPT-3 and ChatGPT, to perform a wide range of NLP tasks with a stunning performance. However, research on adversarial machine learning highlights the need for these intelligent systems to be more robust. Adversarial machine learning aims to evaluate attack and defense mechanisms to prevent the malicious exploitation of these systems. In the case of ChatGPT, adversarial induction prompt can cause the model to generate toxic texts that could pose serious security risks or propagate false information. To address this challenge, we first analyze the effectiveness of inducing attacks on ChatGPT. Then, two effective mitigating mechanisms are proposed. The first is a training-free prefix prompt mechanism to detect and prevent the generation of toxic texts. The second is a RoBERTa-based mechanism that identifies manipulative or misleading input text via external detection models. The availability of this method is demonstrated through experiments.
doi_str_mv 10.1155/2023/8691095
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2827112900</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2827112900</sourcerecordid><originalsourceid>FETCH-LOGICAL-c1825-ed3e72779144d36aafb383c9540c70dffd6b4ac425839fb9b41e3ea097e480893</originalsourceid><addsrcrecordid>eNp9kMtKw0AUhgdRsFZ3PsCAS42dWy7jLgatQotC6zqcZCbp9JLUmUmlb29Ki0s35_wcPv4DH0K3lDxSGoYjRhgfJZGkRIZnaEAllwGhjJ3_ZSou0ZVzS0IiKmIxQKtU7bR1YA2sceo9lCuH2wZPwNa6n03dQR-mrdLr4BmcVni2d15vMDQKT403NXjT1HjmLXhdG-2ecIqznuxPndofyrIF-PHn_BpdVLB2-ua0h-jr9WWevQWTj_F7lk6CkiYsDLTiOmZxLKkQikcAVcETXspQkDImqqpUVAgoBQsTLqtCFoJqroHIWIuEJJIP0d2xd2vb7047ny_bzjb9y5wlLKaUSUJ66uFIlbZ1zuoq31qzAbvPKckPOvODzvyks8fvj_jCNAp-zP_0L-KQcxM</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2827112900</pqid></control><display><type>article</type><title>Adversarial Attacks on Large Language Model-Based System and Mitigating Strategies: A Case Study on ChatGPT</title><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>Wiley Online Library Open Access</source><source>Alma/SFX Local Collection</source><creator>Liu, Bowen ; Xiao, Boao ; Jiang, Xutong ; Cen, Siyuan ; He, Xin ; Dou, Wanchun</creator><contributor>Chen, Huaming ; Huaming Chen</contributor><creatorcontrib>Liu, Bowen ; Xiao, Boao ; Jiang, Xutong ; Cen, Siyuan ; He, Xin ; Dou, Wanchun ; Chen, Huaming ; Huaming Chen</creatorcontrib><description>Machine learning algorithms are at the forefront of the development of advanced information systems. The rapid progress in machine learning technology has enabled cutting-edge large language models (LLMs), represented by GPT-3 and ChatGPT, to perform a wide range of NLP tasks with a stunning performance. However, research on adversarial machine learning highlights the need for these intelligent systems to be more robust. Adversarial machine learning aims to evaluate attack and defense mechanisms to prevent the malicious exploitation of these systems. In the case of ChatGPT, adversarial induction prompt can cause the model to generate toxic texts that could pose serious security risks or propagate false information. To address this challenge, we first analyze the effectiveness of inducing attacks on ChatGPT. Then, two effective mitigating mechanisms are proposed. The first is a training-free prefix prompt mechanism to detect and prevent the generation of toxic texts. The second is a RoBERTa-based mechanism that identifies manipulative or misleading input text via external detection models. The availability of this method is demonstrated through experiments.</description><identifier>ISSN: 1939-0114</identifier><identifier>EISSN: 1939-0122</identifier><identifier>DOI: 10.1155/2023/8691095</identifier><language>eng</language><publisher>London: Hindawi</publisher><subject>Algorithms ; Artificial intelligence ; Chatbots ; Information systems ; Language ; Large language models ; Machine learning ; Methods ; Model-based systems ; Natural language processing ; Poisoning ; Texts</subject><ispartof>Security and communication networks, 2023-06, Vol.2023, p.1-10</ispartof><rights>Copyright © 2023 Bowen Liu et al.</rights><rights>Copyright © 2023 Bowen Liu et al. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. https://creativecommons.org/licenses/by/4.0</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c1825-ed3e72779144d36aafb383c9540c70dffd6b4ac425839fb9b41e3ea097e480893</citedby><cites>FETCH-LOGICAL-c1825-ed3e72779144d36aafb383c9540c70dffd6b4ac425839fb9b41e3ea097e480893</cites><orcidid>0000-0003-4833-2023</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><contributor>Chen, Huaming</contributor><contributor>Huaming Chen</contributor><creatorcontrib>Liu, Bowen</creatorcontrib><creatorcontrib>Xiao, Boao</creatorcontrib><creatorcontrib>Jiang, Xutong</creatorcontrib><creatorcontrib>Cen, Siyuan</creatorcontrib><creatorcontrib>He, Xin</creatorcontrib><creatorcontrib>Dou, Wanchun</creatorcontrib><title>Adversarial Attacks on Large Language Model-Based System and Mitigating Strategies: A Case Study on ChatGPT</title><title>Security and communication networks</title><description>Machine learning algorithms are at the forefront of the development of advanced information systems. The rapid progress in machine learning technology has enabled cutting-edge large language models (LLMs), represented by GPT-3 and ChatGPT, to perform a wide range of NLP tasks with a stunning performance. However, research on adversarial machine learning highlights the need for these intelligent systems to be more robust. Adversarial machine learning aims to evaluate attack and defense mechanisms to prevent the malicious exploitation of these systems. In the case of ChatGPT, adversarial induction prompt can cause the model to generate toxic texts that could pose serious security risks or propagate false information. To address this challenge, we first analyze the effectiveness of inducing attacks on ChatGPT. Then, two effective mitigating mechanisms are proposed. The first is a training-free prefix prompt mechanism to detect and prevent the generation of toxic texts. The second is a RoBERTa-based mechanism that identifies manipulative or misleading input text via external detection models. The availability of this method is demonstrated through experiments.</description><subject>Algorithms</subject><subject>Artificial intelligence</subject><subject>Chatbots</subject><subject>Information systems</subject><subject>Language</subject><subject>Large language models</subject><subject>Machine learning</subject><subject>Methods</subject><subject>Model-based systems</subject><subject>Natural language processing</subject><subject>Poisoning</subject><subject>Texts</subject><issn>1939-0114</issn><issn>1939-0122</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>RHX</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp9kMtKw0AUhgdRsFZ3PsCAS42dWy7jLgatQotC6zqcZCbp9JLUmUmlb29Ki0s35_wcPv4DH0K3lDxSGoYjRhgfJZGkRIZnaEAllwGhjJ3_ZSou0ZVzS0IiKmIxQKtU7bR1YA2sceo9lCuH2wZPwNa6n03dQR-mrdLr4BmcVni2d15vMDQKT403NXjT1HjmLXhdG-2ecIqznuxPndofyrIF-PHn_BpdVLB2-ua0h-jr9WWevQWTj_F7lk6CkiYsDLTiOmZxLKkQikcAVcETXspQkDImqqpUVAgoBQsTLqtCFoJqroHIWIuEJJIP0d2xd2vb7047ny_bzjb9y5wlLKaUSUJ66uFIlbZ1zuoq31qzAbvPKckPOvODzvyks8fvj_jCNAp-zP_0L-KQcxM</recordid><startdate>20230610</startdate><enddate>20230610</enddate><creator>Liu, Bowen</creator><creator>Xiao, Boao</creator><creator>Jiang, Xutong</creator><creator>Cen, Siyuan</creator><creator>He, Xin</creator><creator>Dou, Wanchun</creator><general>Hindawi</general><general>Hindawi Limited</general><scope>RHU</scope><scope>RHW</scope><scope>RHX</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P5Z</scope><scope>P62</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><orcidid>https://orcid.org/0000-0003-4833-2023</orcidid></search><sort><creationdate>20230610</creationdate><title>Adversarial Attacks on Large Language Model-Based System and Mitigating Strategies: A Case Study on ChatGPT</title><author>Liu, Bowen ; Xiao, Boao ; Jiang, Xutong ; Cen, Siyuan ; He, Xin ; Dou, Wanchun</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c1825-ed3e72779144d36aafb383c9540c70dffd6b4ac425839fb9b41e3ea097e480893</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Algorithms</topic><topic>Artificial intelligence</topic><topic>Chatbots</topic><topic>Information systems</topic><topic>Language</topic><topic>Large language models</topic><topic>Machine learning</topic><topic>Methods</topic><topic>Model-based systems</topic><topic>Natural language processing</topic><topic>Poisoning</topic><topic>Texts</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Liu, Bowen</creatorcontrib><creatorcontrib>Xiao, Boao</creatorcontrib><creatorcontrib>Jiang, Xutong</creatorcontrib><creatorcontrib>Cen, Siyuan</creatorcontrib><creatorcontrib>He, Xin</creatorcontrib><creatorcontrib>Dou, Wanchun</creatorcontrib><collection>Hindawi Publishing Complete</collection><collection>Hindawi Publishing Subscription Journals</collection><collection>Hindawi Publishing Open Access</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Access via ProQuest (Open Access)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><jtitle>Security and communication networks</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Liu, Bowen</au><au>Xiao, Boao</au><au>Jiang, Xutong</au><au>Cen, Siyuan</au><au>He, Xin</au><au>Dou, Wanchun</au><au>Chen, Huaming</au><au>Huaming Chen</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Adversarial Attacks on Large Language Model-Based System and Mitigating Strategies: A Case Study on ChatGPT</atitle><jtitle>Security and communication networks</jtitle><date>2023-06-10</date><risdate>2023</risdate><volume>2023</volume><spage>1</spage><epage>10</epage><pages>1-10</pages><issn>1939-0114</issn><eissn>1939-0122</eissn><abstract>Machine learning algorithms are at the forefront of the development of advanced information systems. The rapid progress in machine learning technology has enabled cutting-edge large language models (LLMs), represented by GPT-3 and ChatGPT, to perform a wide range of NLP tasks with a stunning performance. However, research on adversarial machine learning highlights the need for these intelligent systems to be more robust. Adversarial machine learning aims to evaluate attack and defense mechanisms to prevent the malicious exploitation of these systems. In the case of ChatGPT, adversarial induction prompt can cause the model to generate toxic texts that could pose serious security risks or propagate false information. To address this challenge, we first analyze the effectiveness of inducing attacks on ChatGPT. Then, two effective mitigating mechanisms are proposed. The first is a training-free prefix prompt mechanism to detect and prevent the generation of toxic texts. The second is a RoBERTa-based mechanism that identifies manipulative or misleading input text via external detection models. The availability of this method is demonstrated through experiments.</abstract><cop>London</cop><pub>Hindawi</pub><doi>10.1155/2023/8691095</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0003-4833-2023</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1939-0114
ispartof Security and communication networks, 2023-06, Vol.2023, p.1-10
issn 1939-0114
1939-0122
language eng
recordid cdi_proquest_journals_2827112900
source Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Wiley Online Library Open Access; Alma/SFX Local Collection
subjects Algorithms
Artificial intelligence
Chatbots
Information systems
Language
Large language models
Machine learning
Methods
Model-based systems
Natural language processing
Poisoning
Texts
title Adversarial Attacks on Large Language Model-Based System and Mitigating Strategies: A Case Study on ChatGPT
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T10%3A17%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Adversarial%20Attacks%20on%20Large%20Language%20Model-Based%20System%20and%20Mitigating%20Strategies:%20A%20Case%20Study%20on%20ChatGPT&rft.jtitle=Security%20and%20communication%20networks&rft.au=Liu,%20Bowen&rft.date=2023-06-10&rft.volume=2023&rft.spage=1&rft.epage=10&rft.pages=1-10&rft.issn=1939-0114&rft.eissn=1939-0122&rft_id=info:doi/10.1155/2023/8691095&rft_dat=%3Cproquest_cross%3E2827112900%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2827112900&rft_id=info:pmid/&rfr_iscdi=true