Text-Based Prompt Injection Attack Using Mathematical Functions in Modern Large Language Models

Bibliographic Details
Published in: Electronics (Basel), 2024-12, Vol. 13 (24), p. 5008
Authors: Kwon, Hyeokjin; Pak, Wooguil
Format: Article
Language: English
Online access: Full text
Description
Abstract: Prompt injection is a type of attack that induces violent or discriminatory responses by feeding a large language model (LLM) a prompt containing illegal instructions. Most early injection attacks used simple text prompts; recently, however, attackers have employed elaborately designed prompts to overcome the strong security policies of modern LLMs. This study proposed a method for performing injection attacks that bypass existing security policies by replacing sensitive words in the text prompt, i.e., words a language model may reject, with mathematical functions. By hiding the contents of the prompt so that the LLM cannot easily detect the illegal instructions, the method achieved a considerably higher success rate than existing injection attacks, even against the latest securely aligned LLMs. Because the proposed method uses only text prompts, it can attack most LLMs; moreover, it achieved a higher attack success rate than multimodal attacks that use images, despite using only text. An understanding of the newly proposed injection attack is expected to aid in the development of methods that further strengthen the security of current LLMs.
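The abstract does not specify the authors' actual construction, so the following is only a minimal sketch of the general idea it describes: sensitive words are removed from the prompt text and referenced indirectly through symbolic function names. The function `encode_prompt`, the `f0(x)`-style symbols, and the character-code encoding are illustrative assumptions, not the paper's method, and the example uses harmless placeholder words.

```python
def encode_prompt(prompt: str, sensitive_words: list[str]) -> str:
    """Replace each listed word with a symbolic function name and prepend
    definitions from which the word can be reconstructed, so the raw
    string never appears verbatim in the prompt body."""
    definitions = []
    for i, word in enumerate(sensitive_words):
        symbol = f"f{i}(x)"
        # Encode the word as a list of character codes (an assumed encoding).
        codes = ", ".join(str(ord(c)) for c in word)
        definitions.append(
            f"Let {symbol} denote the string whose character codes are [{codes}]."
        )
        prompt = prompt.replace(word, symbol)
    return "\n".join(definitions) + "\n\n" + prompt


if __name__ == "__main__":
    # Harmless placeholder words stand in for terms a model might flag.
    print(encode_prompt("Explain how to bake a cake.", ["bake", "cake"]))
```

Under this sketch, the delivered prompt reads "Explain how to f0(x) a f1(x)." plus the definitions, so a keyword-based filter never sees the substituted words directly.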
ISSN: 2079-9292
DOI: 10.3390/electronics13245008