Jailbreak Large Vision-Language Models Through Multi-Modal Linkage

Bibliographic Details
Main authors: Wang, Yu; Zhou, Xiaofei; Wang, Yichen; Zhang, Geyuan; He, Tianxing
Format: Article
Language: eng
Subjects: Computer Science - Computer Vision and Pattern Recognition
Online access: Order full text
creator Wang, Yu ; Zhou, Xiaofei ; Wang, Yichen ; Zhang, Geyuan ; He, Tianxing
description With the significant advancement of Large Vision-Language Models (VLMs), concerns about their potential misuse and abuse have grown rapidly. Previous studies have highlighted VLMs' vulnerability to jailbreak attacks, where carefully crafted inputs can lead the model to produce content that violates ethical and legal standards. However, existing methods struggle against state-of-the-art VLMs like GPT-4o, due to the over-exposure of harmful content and lack of stealthy malicious guidance. In this work, we propose a novel jailbreak attack framework: Multi-Modal Linkage (MML) Attack. Drawing inspiration from cryptography, MML utilizes an encryption-decryption process across text and image modalities to mitigate over-exposure of malicious information. To align the model's output with malicious intent covertly, MML employs a technique called "evil alignment", framing the attack within a video game production scenario. Comprehensive experiments demonstrate MML's effectiveness. Specifically, MML jailbreaks GPT-4o with attack success rates of 97.80% on SafeBench, 98.81% on MM-SafeBench and 99.07% on HADES-Dataset. Our code is available at https://github.com/wangyu-ovo/MML
doi_str_mv 10.48550/arxiv.2412.00473
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2412.00473
language eng
recordid cdi_arxiv_primary_2412_00473
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
title Jailbreak Large Vision-Language Models Through Multi-Modal Linkage
url https://arxiv.org/abs/2412.00473