Introducing v0.5 of the AI Safety Benchmark from MLCommons
Format: | Article |
---|---|
Language: | eng |
creator | Vidgen, Bertie; Agrawal, Adarsh; Ahmed, Ahmed M; Akinwande, Victor; Al-Nuaimi, Namir; Alfaraj, Najla; Alhajjar, Elie; Aroyo, Lora; Bavalatti, Trupti; Bartolo, Max; Blili-Hamelin, Borhane; Bollacker, Kurt; Bomassani, Rishi; Boston, Marisa Ferrara; Campos, Siméon; Chakra, Kal; Chen, Canyu; Coleman, Cody; Coudert, Zacharie Delpierre; Derczynski, Leon; Dutta, Debojyoti; Eisenberg, Ian; Ezick, James; Frase, Heather; Fuller, Brian; Gandikota, Ram; Gangavarapu, Agasthya; Gangavarapu, Ananya; Gealy, James; Ghosh, Rajat; Goel, James; Gohar, Usman; Goswami, Sujata; Hale, Scott A; Hutiri, Wiebke; Imperial, Joseph Marvin; Jandial, Surgan; Judd, Nick; Juefei-Xu, Felix; Khomh, Foutse; Kailkhura, Bhavya; Kirk, Hannah Rose; Klyman, Kevin; Knotz, Chris; Kuchnik, Michael; Kumar, Shachi H; Kumar, Srijan; Lengerich, Chris; Li, Bo; Liao, Zeyi; Long, Eileen Peters; Lu, Victor; Luger, Sarah; Mai, Yifan; Mammen, Priyanka Mary; Manyeki, Kelvin; McGregor, Sean; Mehta, Virendra; Mohammed, Shafee; Moss, Emanuel; Nachman, Lama; Naganna, Dinesh Jinenhally; Nikanjam, Amin; Nushi, Besmira; Oala, Luis; Orr, Iftach; Parrish, Alicia; Patlak, Cigdem; Pietri, William; Poursabzi-Sangdeh, Forough; Presani, Eleonora; Puletti, Fabrizio; Röttger, Paul; Sahay, Saurav; Santos, Tim; Scherrer, Nino; Sebag, Alice Schoenauer; Schramowski, Patrick; Shahbazi, Abolfazl; Sharma, Vin; Shen, Xudong; Sistla, Vamsi; Tang, Leonard; Testuggine, Davide; Thangarasa, Vithursan; Watkins, Elizabeth Anne; Weiss, Rebecca; Welty, Chris; Wilbers, Tyler; Williams, Adina; Wu, Carole-Jean; Yadav, Poonam; Yang, Xianjun; Zeng, Yi; Zhang, Wenhui; Zhdanov, Fedor; Zhu, Jiacheng; Liang, Percy; Mattson, Peter; Vanschoren, Joaquin |
---|---|
description | This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English) and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts (there are 43,090 test items in total, which we created with templates); (4) a grading system for AI systems against the benchmark; (5) an openly available platform and downloadable tool, called ModelBench, that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report that benchmarks the performance of over a dozen openly available chat-tuned language models; and (7) a test specification for the benchmark. |
doi_str_mv | 10.48550/arxiv.2404.12241 |
format | Article |
date | 2024-04-18 |
rights | http://creativecommons.org/licenses/by/4.0 |
linktorsrc | https://arxiv.org/abs/2404.12241 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2404.12241 |
language | eng |
recordid | cdi_arxiv_primary_2404_12241 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence; Computer Science - Computation and Language |
title | Introducing v0.5 of the AI Safety Benchmark from MLCommons |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T16%3A53%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Introducing%20v0.5%20of%20the%20AI%20Safety%20Benchmark%20from%20MLCommons&rft.au=Vidgen,%20Bertie&rft.date=2024-04-18&rft_id=info:doi/10.48550/arxiv.2404.12241&rft_dat=%3Carxiv_GOX%3E2404_12241%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |