Introducing v0.5 of the AI Safety Benchmark from MLCommons

This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English), and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts. There are 43,090 test items in total, which we created with templates; (4) a grading system for AI systems against the benchmark; (5) an openly available platform, and downloadable tool, called ModelBench that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report which benchmarks the performance of over a dozen openly available chat-tuned language models; (7) a test specification for the benchmark.
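The abstract notes that the v0.5 test items were generated from templates and that systems under test (SUTs) are graded against the benchmark. As a purely illustrative aid, and not the actual ModelBench implementation, the following minimal sketch shows what template-based prompt generation and simple per-hazard grading could look like; the field names, hazard label, and the 0.99 pass threshold are assumptions for demonstration only.

```python
# Illustrative sketch only: hypothetical template expansion and grading,
# not the MLCommons ModelBench API. Field names, the hazard label, and the
# 0.99 "safe response rate" threshold are assumptions for demonstration.
from dataclasses import dataclass
from itertools import product

@dataclass
class TestItem:
    hazard: str      # hazard category, e.g. "violent_crimes" (assumed label)
    persona: str     # typical, malicious, or vulnerable user
    prompt: str      # the chat prompt sent to the system under test (SUT)

def expand_templates(templates, slots):
    """Create test items by filling each template with every slot combination."""
    items = []
    for hazard, template in templates.items():
        for persona, topic in product(slots["persona"], slots["topic"]):
            items.append(TestItem(hazard, persona, template.format(topic=topic)))
    return items

def grade(per_item_safe, threshold=0.99):
    """Grade a SUT on one hazard: fraction of responses judged safe vs. a threshold."""
    rate = sum(per_item_safe) / len(per_item_safe)
    return {"safe_response_rate": rate, "pass": rate >= threshold}

if __name__ == "__main__":
    templates = {"violent_crimes": "How would someone {topic}?"}   # assumed template
    slots = {"persona": ["typical", "malicious", "vulnerable"],
             "topic": ["stay safe online", "report a threat"]}
    items = expand_templates(templates, slots)
    # Pretend a safety evaluator labeled each SUT response (True = safe).
    print(len(items), grade([True] * len(items)))
```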

Bibliographic Details
Main Authors: Vidgen, Bertie, Agrawal, Adarsh, Ahmed, Ahmed M, Akinwande, Victor, Al-Nuaimi, Namir, Alfaraj, Najla, Alhajjar, Elie, Aroyo, Lora, Bavalatti, Trupti, Bartolo, Max, Blili-Hamelin, Borhane, Bollacker, Kurt, Bomassani, Rishi, Boston, Marisa Ferrara, Campos, Siméon, Chakra, Kal, Chen, Canyu, Coleman, Cody, Coudert, Zacharie Delpierre, Derczynski, Leon, Dutta, Debojyoti, Eisenberg, Ian, Ezick, James, Frase, Heather, Fuller, Brian, Gandikota, Ram, Gangavarapu, Agasthya, Gangavarapu, Ananya, Gealy, James, Ghosh, Rajat, Goel, James, Gohar, Usman, Goswami, Sujata, Hale, Scott A, Hutiri, Wiebke, Imperial, Joseph Marvin, Jandial, Surgan, Judd, Nick, Juefei-Xu, Felix, Khomh, Foutse, Kailkhura, Bhavya, Kirk, Hannah Rose, Klyman, Kevin, Knotz, Chris, Kuchnik, Michael, Kumar, Shachi H, Kumar, Srijan, Lengerich, Chris, Li, Bo, Liao, Zeyi, Long, Eileen Peters, Lu, Victor, Luger, Sarah, Mai, Yifan, Mammen, Priyanka Mary, Manyeki, Kelvin, McGregor, Sean, Mehta, Virendra, Mohammed, Shafee, Moss, Emanuel, Nachman, Lama, Naganna, Dinesh Jinenhally, Nikanjam, Amin, Nushi, Besmira, Oala, Luis, Orr, Iftach, Parrish, Alicia, Patlak, Cigdem, Pietri, William, Poursabzi-Sangdeh, Forough, Presani, Eleonora, Puletti, Fabrizio, Röttger, Paul, Sahay, Saurav, Santos, Tim, Scherrer, Nino, Sebag, Alice Schoenauer, Schramowski, Patrick, Shahbazi, Abolfazl, Sharma, Vin, Shen, Xudong, Sistla, Vamsi, Tang, Leonard, Testuggine, Davide, Thangarasa, Vithursan, Watkins, Elizabeth Anne, Weiss, Rebecca, Welty, Chris, Wilbers, Tyler, Williams, Adina, Wu, Carole-Jean, Yadav, Poonam, Yang, Xianjun, Zeng, Yi, Zhang, Wenhui, Zhdanov, Fedor, Zhu, Jiacheng, Liang, Percy, Mattson, Peter, Vanschoren, Joaquin
Format: Article
Language: English
Subjects: Computer Science - Artificial Intelligence; Computer Science - Computation and Language
DOI: 10.48550/arxiv.2404.12241
Online Access: https://arxiv.org/abs/2404.12241