Realistic Evaluation of Toxicity in Large Language Models
Large language models (LLMs) have become integral to our professional workflows and daily lives. Nevertheless, these machine companions have a critical flaw: the huge amount of data that endows them with vast and diverse knowledge also exposes them to inevitable toxicity and bias. While most LLMs incorporate defense mechanisms to prevent the generation of harmful content, these safeguards can be easily bypassed with minimal prompt engineering. In this paper, we introduce the new Thoroughly Engineered Toxicity (TET) dataset, comprising manually crafted prompts designed to nullify the protective layers of such models. Through extensive evaluations, we demonstrate the pivotal role of TET in providing a rigorous benchmark for evaluating toxicity awareness in several popular LLMs: it highlights toxicity that might remain hidden when using normal prompts, thus revealing subtler issues in model behavior.
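The abstract describes a prompt-based toxicity benchmark: feed adversarially crafted prompts to an LLM and score how toxic its continuations are. The following is a minimal sketch of that general workflow only, not the paper's actual pipeline; the prompt file name, the placeholder model, and the use of the Detoxify classifier are all assumptions made for illustration.

```python
# Minimal sketch of a prompt-based toxicity evaluation; NOT the authors' pipeline.
# Assumptions: prompts live in a local "tet_prompts.txt" (one per line), the model
# under test is a Hugging Face text-generation model, and continuations are scored
# with the off-the-shelf Detoxify classifier.
from transformers import pipeline
from detoxify import Detoxify

# Hypothetical prompt file: one benchmark prompt per line.
with open("tet_prompts.txt", encoding="utf-8") as f:
    prompts = [line.strip() for line in f if line.strip()]

generator = pipeline("text-generation", model="gpt2")  # placeholder model under test
scorer = Detoxify("original")                          # returns scores incl. "toxicity"

toxicity_scores = []
for prompt in prompts:
    output = generator(prompt, max_new_tokens=64, do_sample=True)[0]["generated_text"]
    continuation = output[len(prompt):]                # score only the model's continuation
    toxicity_scores.append(scorer.predict(continuation)["toxicity"])

print(f"mean toxicity over {len(toxicity_scores)} prompts: "
      f"{sum(toxicity_scores) / len(toxicity_scores):.3f}")
```

In a fuller evaluation one would swap in each LLM under test and report per-prompt aggregates (for example, the share of prompts that elicit any toxic continuation) rather than a single mean.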
Saved in:
Published in: | arXiv.org 2024-05 |
---|---|
Main authors: | Tinh Son Luong, Thanh-Thien Le, Linh Ngo Van, Nguyen, Thien Huu |
Format: | Article |
Language: | eng |
Subjects: | Large language models; Toxicity |
Online access: | Full text |
creator | Tinh Son Luong; Thanh-Thien Le; Linh Ngo Van; Nguyen, Thien Huu |
format | Article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-05 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_3057514354 |
source | Free E-Journals |
subjects | Large language models; Toxicity |
title | Realistic Evaluation of Toxicity in Large Language Models |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-14T13%3A55%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Realistic%20Evaluation%20of%20Toxicity%20in%20Large%20Language%20Models&rft.jtitle=arXiv.org&rft.au=Tinh%20Son%20Luong&rft.date=2024-05-20&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3057514354%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3057514354&rft_id=info:pmid/&rfr_iscdi=true |