PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs
Large Language Models (LLMs) have gained widespread use in various applications due to their powerful capability to generate human-like text. However, prompt injection attacks, which involve overwriting a model's original instructions with malicious prompts to manipulate the generated text, hav...
Gespeichert in:
Hauptverfasser: | , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Large Language Models (LLMs) have gained widespread use in various
applications due to their powerful capability to generate human-like text.
However, prompt injection attacks, which involve overwriting a model's original
instructions with malicious prompts to manipulate the generated text, have
raised significant concerns about the security and reliability of LLMs.
Ensuring that LLMs are robust against such attacks is crucial for their
deployment in real-world applications, particularly in critical tasks.
In this paper, we propose PROMPTFUZZ, a novel testing framework that
leverages fuzzing techniques to systematically assess the robustness of LLMs
against prompt injection attacks. Inspired by software fuzzing, PROMPTFUZZ
selects promising seed prompts and generates a diverse set of prompt injections
to evaluate the target LLM's resilience. PROMPTFUZZ operates in two stages: the
prepare phase, which involves selecting promising initial seeds and collecting
few-shot examples, and the focus phase, which uses the collected examples to
generate diverse, high-quality prompt injections. Using PROMPTFUZZ, we can
uncover more vulnerabilities in LLMs, even those with strong defense prompts.
By deploying the generated attack prompts from PROMPTFUZZ in a real-world
competition, we achieved the 7th ranking out of over 4000 participants (top
0.14%) within 2 hours. Additionally, we construct a dataset to fine-tune LLMs
for enhanced robustness against prompt injection attacks. While the fine-tuned
model shows improved robustness, PROMPTFUZZ continues to identify
vulnerabilities, highlighting the importance of robust testing for LLMs. Our
work emphasizes the critical need for effective testing tools and provides a
practical framework for evaluating and improving the robustness of LLMs against
prompt injection attacks. |
---|---|
DOI: | 10.48550/arxiv.2409.14729 |