Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation

Large Language Models (LLMs) have revolutionized artificial intelligence, demonstrating remarkable computational power and linguistic capabilities. However, these models are inherently prone to various biases stemming from their training data. These include selection, linguistic, and confirmation bi...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Cantini, Riccardo, Cosenza, Giada, Orsino, Alessio, Talia, Domenico
Format:	Artikel
Sprache:	eng
Schlagworte:	Computer Science - Artificial Intelligence Computer Science - Computation and Language
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Cantini, Riccardo Cosenza, Giada Orsino, Alessio Talia, Domenico
description	Large Language Models (LLMs) have revolutionized artificial intelligence, demonstrating remarkable computational power and linguistic capabilities. However, these models are inherently prone to various biases stemming from their training data. These include selection, linguistic, and confirmation biases, along with common stereotypes related to gender, ethnicity, sexual orientation, religion, socioeconomic status, disability, and age. This study explores the presence of these biases within the responses given by the most recent LLMs, analyzing the impact on their fairness and reliability. We also investigate how known prompt engineering techniques can be exploited to effectively reveal hidden biases of LLMs, testing their adversarial robustness against jailbreak prompts specially crafted for bias elicitation. Extensive experiments are conducted using the most widespread LLMs at different scales, confirming that LLMs can still be manipulated to produce biased or inappropriate responses, despite their advanced capabilities and sophisticated alignment processes. Our findings underscore the importance of enhancing mitigation techniques to address these safety issues, toward a more sustainable and inclusive artificial intelligence.
doi_str_mv	10.48550/arxiv.2407.08441
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2407_08441</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2407_08441</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2407_084413</originalsourceid><addsrcrecordid>eNqFjkEPwVAQhN_FQfADnOwfUC0VblJCREhE3JvFajaePtl9Gv332sbdZWaSmUw-Y_pRGMTz6TQcoXy4CMZxOAvCeRxHbVMkQrBHyWrNszdW4eBuZBVOhNaWsGTU4UaIFrBDthchfMBR3PPlFe5OIFElVc4zSG4FiaIwWji5y1t9XjXgXXMCa8tX9ujZ5V3TuqNV6v28Ywab9Xm1HTaE6Uv4iVKmNWnakE7-L76MyUoT</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation</title><source>arXiv.org</source><creator>Cantini, Riccardo ; Cosenza, Giada ; Orsino, Alessio ; Talia, Domenico</creator><creatorcontrib>Cantini, Riccardo ; Cosenza, Giada ; Orsino, Alessio ; Talia, Domenico</creatorcontrib><description>Large Language Models (LLMs) have revolutionized artificial intelligence, demonstrating remarkable computational power and linguistic capabilities. However, these models are inherently prone to various biases stemming from their training data. These include selection, linguistic, and confirmation biases, along with common stereotypes related to gender, ethnicity, sexual orientation, religion, socioeconomic status, disability, and age. This study explores the presence of these biases within the responses given by the most recent LLMs, analyzing the impact on their fairness and reliability. We also investigate how known prompt engineering techniques can be exploited to effectively reveal hidden biases of LLMs, testing their adversarial robustness against jailbreak prompts specially crafted for bias elicitation. Extensive experiments are conducted using the most widespread LLMs at different scales, confirming that LLMs can still be manipulated to produce biased or inappropriate responses, despite their advanced capabilities and sophisticated alignment processes. Our findings underscore the importance of enhancing mitigation techniques to address these safety issues, toward a more sustainable and inclusive artificial intelligence.</description><identifier>DOI: 10.48550/arxiv.2407.08441</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Computation and Language</subject><creationdate>2024-07</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2407.08441$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2407.08441$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Cantini, Riccardo</creatorcontrib><creatorcontrib>Cosenza, Giada</creatorcontrib><creatorcontrib>Orsino, Alessio</creatorcontrib><creatorcontrib>Talia, Domenico</creatorcontrib><title>Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation</title><description>Large Language Models (LLMs) have revolutionized artificial intelligence, demonstrating remarkable computational power and linguistic capabilities. However, these models are inherently prone to various biases stemming from their training data. These include selection, linguistic, and confirmation biases, along with common stereotypes related to gender, ethnicity, sexual orientation, religion, socioeconomic status, disability, and age. This study explores the presence of these biases within the responses given by the most recent LLMs, analyzing the impact on their fairness and reliability. We also investigate how known prompt engineering techniques can be exploited to effectively reveal hidden biases of LLMs, testing their adversarial robustness against jailbreak prompts specially crafted for bias elicitation. Extensive experiments are conducted using the most widespread LLMs at different scales, confirming that LLMs can still be manipulated to produce biased or inappropriate responses, despite their advanced capabilities and sophisticated alignment processes. Our findings underscore the importance of enhancing mitigation techniques to address these safety issues, toward a more sustainable and inclusive artificial intelligence.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Computation and Language</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNqFjkEPwVAQhN_FQfADnOwfUC0VblJCREhE3JvFajaePtl9Gv332sbdZWaSmUw-Y_pRGMTz6TQcoXy4CMZxOAvCeRxHbVMkQrBHyWrNszdW4eBuZBVOhNaWsGTU4UaIFrBDthchfMBR3PPlFe5OIFElVc4zSG4FiaIwWji5y1t9XjXgXXMCa8tX9ujZ5V3TuqNV6v28Ywab9Xm1HTaE6Uv4iVKmNWnakE7-L76MyUoT</recordid><startdate>20240711</startdate><enddate>20240711</enddate><creator>Cantini, Riccardo</creator><creator>Cosenza, Giada</creator><creator>Orsino, Alessio</creator><creator>Talia, Domenico</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20240711</creationdate><title>Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation</title><author>Cantini, Riccardo ; Cosenza, Giada ; Orsino, Alessio ; Talia, Domenico</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2407_084413</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Computation and Language</topic><toplevel>online_resources</toplevel><creatorcontrib>Cantini, Riccardo</creatorcontrib><creatorcontrib>Cosenza, Giada</creatorcontrib><creatorcontrib>Orsino, Alessio</creatorcontrib><creatorcontrib>Talia, Domenico</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Cantini, Riccardo</au><au>Cosenza, Giada</au><au>Orsino, Alessio</au><au>Talia, Domenico</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation</atitle><date>2024-07-11</date><risdate>2024</risdate><abstract>Large Language Models (LLMs) have revolutionized artificial intelligence, demonstrating remarkable computational power and linguistic capabilities. However, these models are inherently prone to various biases stemming from their training data. These include selection, linguistic, and confirmation biases, along with common stereotypes related to gender, ethnicity, sexual orientation, religion, socioeconomic status, disability, and age. This study explores the presence of these biases within the responses given by the most recent LLMs, analyzing the impact on their fairness and reliability. We also investigate how known prompt engineering techniques can be exploited to effectively reveal hidden biases of LLMs, testing their adversarial robustness against jailbreak prompts specially crafted for bias elicitation. Extensive experiments are conducted using the most widespread LLMs at different scales, confirming that LLMs can still be manipulated to produce biased or inappropriate responses, despite their advanced capabilities and sophisticated alignment processes. Our findings underscore the importance of enhancing mitigation techniques to address these safety issues, toward a more sustainable and inclusive artificial intelligence.</abstract><doi>10.48550/arxiv.2407.08441</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2407.08441
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2407_08441
source	arXiv.org
subjects	Computer Science - Artificial Intelligence Computer Science - Computation and Language
title	Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T05%3A59%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Are%20Large%20Language%20Models%20Really%20Bias-Free?%20Jailbreak%20Prompts%20for%20Assessing%20Adversarial%20Robustness%20to%20Bias%20Elicitation&rft.au=Cantini,%20Riccardo&rft.date=2024-07-11&rft_id=info:doi/10.48550/arxiv.2407.08441&rft_dat=%3Carxiv_GOX%3E2407_08441%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true