Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models

With the rapid advancement of multimodal large language models (MLLMs), concerns regarding their security have increasingly captured the attention of both academia and industry. Although MLLMs are vulnerable to jailbreak attacks, designing effective multimodal jailbreak attacks poses unique challeng...

Detailed description

Bibliographic details
Published in: arXiv.org 2024-12
Main authors: Teng, Ma; Jia Xiaojun; Duan Ranjie; Li, Xinfeng; Huang Yihao; Chu Zhixuan; Liu, Yang; Ren Wenqi
Format: Article
Language: eng
Subjects: Heuristic; Large language models; Risk; Search methods; Security; Source code
Online access: Full text
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Teng, Ma
Jia Xiaojun
Duan Ranjie
Li, Xinfeng
Huang Yihao
Chu Zhixuan
Liu, Yang
Ren Wenqi
description With the rapid advancement of multimodal large language models (MLLMs), concerns regarding their security have increasingly captured the attention of both academia and industry. Although MLLMs are vulnerable to jailbreak attacks, designing effective multimodal jailbreak attacks poses unique challenges, especially given the distinct protective measures implemented across various modalities in commercial models. Previous works concentrate risks into a single modality, resulting in limited jailbreak performance. In this paper, we propose a heuristic-induced multimodal risk distribution jailbreak attack method, called HIMRD, which consists of two elements: a multimodal risk distribution strategy and a heuristic-induced search strategy. The multimodal risk distribution strategy segments harmful instructions across multiple modalities to effectively circumvent MLLMs' security protection. The heuristic-induced search strategy identifies two types of prompts: the understanding-enhancing prompt, which helps the MLLM reconstruct the malicious prompt, and the inducing prompt, which increases the likelihood of affirmative outputs over refusals, enabling a successful jailbreak attack. Extensive experiments demonstrate that this approach effectively uncovers vulnerabilities in MLLMs, achieving an average attack success rate of 90% across seven popular open-source MLLMs and an average attack success rate of around 68% across three popular closed-source MLLMs. Our code is coming soon. Warning: This paper contains offensive and harmful examples; reader discretion is advised.
format Article
publisher Ithaca: Cornell University Library, arXiv.org
date 2024-12-08
rights 2024. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
oa free_for_read
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2024-12
issn 2331-8422
language eng
recordid cdi_proquest_journals_3142728423
source Freely Accessible Journals
subjects Heuristic
Large language models
Risk
Search methods
Security
Source code
title Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T08%3A22%3A42IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Heuristic-Induced%20Multimodal%20Risk%20Distribution%20Jailbreak%20Attack%20for%20Multimodal%20Large%20Language%20Models&rft.jtitle=arXiv.org&rft.au=Teng,%20Ma&rft.date=2024-12-08&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3142728423%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3142728423&rft_id=info:pmid/&rfr_iscdi=true