Jailbreak Large Vision-Language Models Through Multi-Modal Linkage
| Main authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Subjects: | |
| Online access: | Order full text |
| Summary: | With the significant advancement of Large Vision-Language Models (VLMs), concerns about their potential misuse and abuse have grown rapidly. Previous studies have highlighted VLMs' vulnerability to jailbreak attacks, where carefully crafted inputs can lead the model to produce content that violates ethical and legal standards. However, existing methods struggle against state-of-the-art VLMs like GPT-4o, due to the over-exposure of harmful content and the lack of stealthy malicious guidance. In this work, we propose a novel jailbreak attack framework: Multi-Modal Linkage (MML) Attack. Drawing inspiration from cryptography, MML utilizes an encryption-decryption process across text and image modalities to mitigate over-exposure of malicious information. To align the model's output with malicious intent covertly, MML employs a technique called "evil alignment", framing the attack within a video game production scenario. Comprehensive experiments demonstrate MML's effectiveness. Specifically, MML jailbreaks GPT-4o with attack success rates of 97.80% on SafeBench, 98.81% on MM-SafeBench, and 99.07% on HADES-Dataset. Our code is available at https://github.com/wangyu-ovo/MML |
| DOI: | 10.48550/arxiv.2412.00473 |