Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation
Large Language Models (LLMs) have revolutionized artificial intelligence, demonstrating remarkable computational power and linguistic capabilities. However, these models are inherently prone to various biases stemming from their training data. These include selection, linguistic, and confirmation biases, along with common stereotypes related to gender, ethnicity, sexual orientation, religion, socioeconomic status, disability, and age. This study explores the presence of these biases within the responses given by the most recent LLMs, analyzing the impact on their fairness and reliability. We also investigate how known prompt engineering techniques can be exploited to effectively reveal hidden biases of LLMs, testing their adversarial robustness against jailbreak prompts specially crafted for bias elicitation. Extensive experiments are conducted using the most widespread LLMs at different scales, confirming that LLMs can still be manipulated to produce biased or inappropriate responses, despite their advanced capabilities and sophisticated alignment processes. Our findings underscore the importance of enhancing mitigation techniques to address these safety issues, toward a more sustainable and inclusive artificial intelligence.
Saved in:
Main authors: | Cantini, Riccardo; Cosenza, Giada; Orsino, Alessio; Talia, Domenico |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Artificial Intelligence; Computer Science - Computation and Language |
Online access: | Order full text |
creator | Cantini, Riccardo; Cosenza, Giada; Orsino, Alessio; Talia, Domenico |
doi_str_mv | 10.48550/arxiv.2407.08441 |
format | Article |
creationdate | 2024-07-11 |
rights | http://creativecommons.org/licenses/by/4.0 (open access) |
link | https://arxiv.org/abs/2407.08441 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2407.08441 |
language | eng |
recordid | cdi_arxiv_primary_2407_08441 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence; Computer Science - Computation and Language |
title | Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation |