Safety case template for frontier AI: A cyber inability argument

Frontier artificial intelligence (AI) systems pose increasing risks to society, making it essential for developers to provide assurances about their safety. One approach to offering such assurances is through a safety case: a structured, evidence-based argument aimed at demonstrating why the risk associated with a safety-critical system is acceptable.

Detailed description

Bibliographic details
Main authors: Goemans, Arthur; Buhl, Marie Davidsen; Schuett, Jonas; Korbak, Tomek; Wang, Jessica; Hilton, Benjamin; Irving, Geoffrey
Format: Article
Language: English
Subjects: Computer Science - Computers and Society; Computer Science - Cryptography and Security
Online access: Order full text
creator Goemans, Arthur
Buhl, Marie Davidsen
Schuett, Jonas
Korbak, Tomek
Wang, Jessica
Hilton, Benjamin
Irving, Geoffrey
description Frontier artificial intelligence (AI) systems pose increasing risks to society, making it essential for developers to provide assurances about their safety. One approach to offering such assurances is through a safety case: a structured, evidence-based argument aimed at demonstrating why the risk associated with a safety-critical system is acceptable. In this article, we propose a safety case template for offensive cyber capabilities. We illustrate how developers could argue that a model does not have capabilities posing unacceptable cyber risks by breaking down the main claim into progressively specific sub-claims, each supported by evidence. In our template, we identify a number of risk models, derive proxy tasks from the risk models, define evaluation settings for the proxy tasks, and connect those with evaluation results. Elements of current frontier safety techniques - such as risk models, proxy tasks, and capability evaluations - use implicit arguments for overall system safety. This safety case template integrates these elements using the Claims Arguments Evidence (CAE) framework in order to make safety arguments coherent and explicit. While uncertainties around the specifics remain, this template serves as a proof of concept, aiming to foster discussion on AI safety cases and advance AI assurance.
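The description outlines the structure of the template: a top-level claim about acceptable cyber risk is decomposed into progressively specific sub-claims, each supported by evidence from proxy-task evaluations derived from risk models. As a rough illustration only, the following minimal Python sketch shows one way such a Claims Arguments Evidence (CAE) decomposition could be represented; the class names, the example sub-claim, proxy task, and result strings are hypothetical assumptions for illustration, not material taken from the paper.

from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of a CAE-style decomposition: a top-level inability
# claim is broken into risk-model sub-claims, each supported by proxy-task
# evaluation evidence. All names and example tasks are illustrative
# assumptions, not the authors' actual template.

@dataclass
class Evidence:
    proxy_task: str          # task derived from a risk model
    evaluation_setting: str  # how the capability evaluation was run
    result: str              # observed outcome cited in support of the claim

@dataclass
class Claim:
    statement: str
    sub_claims: List["Claim"] = field(default_factory=list)
    evidence: List[Evidence] = field(default_factory=list)

# Top-level claim of a cyber inability argument
top_claim = Claim("The model lacks capabilities posing unacceptable cyber risk.")

# One risk model turned into a sub-claim with proxy-task evidence (illustrative)
sub = Claim("The model cannot meaningfully assist end-to-end exploitation of a web service.")
sub.evidence.append(Evidence(
    proxy_task="Exploit a known vulnerability in a sandboxed test application",
    evaluation_setting="Agentic scaffold, fixed attempt budget, no human help",
    result="Success rate indistinguishable from a no-model baseline",
))
top_claim.sub_claims.append(sub)

def print_case(claim: Claim, depth: int = 0) -> None:
    """Print the claim tree together with its supporting evidence."""
    pad = "  " * depth
    print(f"{pad}Claim: {claim.statement}")
    for ev in claim.evidence:
        print(f"{pad}  Evidence [{ev.proxy_task}]: {ev.result}")
    for child in claim.sub_claims:
        print_case(child, depth + 1)

print_case(top_claim)

Walking the tree this way makes the safety argument explicit: each leaf of evidence can be traced back through its sub-claim to the top-level claim it supports.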
doi_str_mv 10.48550/arxiv.2411.08088
format Article
creationdate 2024-11-12
rights http://creativecommons.org/licenses/by/4.0 (open access)
link https://arxiv.org/abs/2411.08088
identifier DOI: 10.48550/arxiv.2411.08088
language eng
recordid cdi_arxiv_primary_2411_08088
source arXiv.org
subjects Computer Science - Computers and Society
Computer Science - Cryptography and Security
title Safety case template for frontier AI: A cyber inability argument
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T23%3A50%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Safety%20case%20template%20for%20frontier%20AI:%20A%20cyber%20inability%20argument&rft.au=Goemans,%20Arthur&rft.date=2024-11-12&rft_id=info:doi/10.48550/arxiv.2411.08088&rft_dat=%3Carxiv_GOX%3E2411_08088%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true