Safety case template for frontier AI: A cyber inability argument
Saved in:
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | Frontier artificial intelligence (AI) systems pose increasing risks to society, making it essential for developers to provide assurances about their safety. One approach to offering such assurances is through a safety case: a structured, evidence-based argument aimed at demonstrating why the risk associated with a safety-critical system is acceptable. In this article, we propose a safety case template for offensive cyber capabilities. We illustrate how developers could argue that a model does not have capabilities posing unacceptable cyber risks by breaking down the main claim into progressively specific sub-claims, each supported by evidence. In our template, we identify a number of risk models, derive proxy tasks from the risk models, define evaluation settings for the proxy tasks, and connect those with evaluation results. Elements of current frontier safety techniques - such as risk models, proxy tasks, and capability evaluations - use implicit arguments for overall system safety. This safety case template integrates these elements using the Claims Arguments Evidence (CAE) framework in order to make safety arguments coherent and explicit. While uncertainties around the specifics remain, this template serves as a proof of concept, aiming to foster discussion on AI safety cases and advance AI assurance. |
DOI: | 10.48550/arxiv.2411.08088 |
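The abstract's chain from risk models to proxy tasks, evaluation settings, and evaluation results maps naturally onto a tree of claims backed by evidence. The following minimal Python sketch is not taken from the paper; all class names, fields, and example values are hypothetical illustrations of how a CAE-style decomposition of an inability claim into sub-claims and evidence could be represented.

```python
# Hypothetical sketch of a CAE-style claims tree: a top-level "inability"
# claim is decomposed into progressively specific sub-claims, each leaf
# claim backed by evidence from capability evaluations on proxy tasks.
# Class names, fields, and example values are illustrative, not the
# authors' notation.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    description: str   # e.g. an evaluation result summary
    source: str        # e.g. an evaluation report identifier


@dataclass
class Claim:
    statement: str
    sub_claims: List["Claim"] = field(default_factory=list)
    evidence: List[Evidence] = field(default_factory=list)

    def is_supported(self) -> bool:
        """A leaf claim needs direct evidence; an interior claim is
        supported only if every one of its sub-claims is supported."""
        if self.sub_claims:
            return all(c.is_supported() for c in self.sub_claims)
        return bool(self.evidence)


# Top-level claim broken down along the template's chain:
# risk model -> proxy task -> evaluation setting -> evaluation result.
top = Claim(
    "The model does not have capabilities posing unacceptable cyber risks",
    sub_claims=[
        Claim(
            "Risk model R1 (hypothetical example) is not realized",
            sub_claims=[
                Claim(
                    "The model fails proxy task T1, derived from R1, "
                    "in evaluation setting S1",
                    evidence=[Evidence("0/20 proxy tasks solved", "eval-run-001")],
                ),
            ],
        ),
    ],
)

print(top.is_supported())  # True once every leaf claim has evidence
```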