OpenAI o1 System Card
The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, our models can reason about our safety policies in context when res...
Gespeichert in:
Hauptverfasser: | , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | OpenAI Jaech, Aaron Low, Aiden Helyar, Alec Passos, Alex Tachard Neitz, Alexander Tam, Allison Bennett, Ally Applebaum, Andy Zoph, Barret Ghorbani, Behrooz Rossen, Ben McKinzie, Brandon Lugaresi, Camillo Shen, Chen Zhang, Chong Koch, Chris Roberts, Dan Kappler, Daniel Dohan, David Farhi, David Zhang, Eddie Wallace, Eric Ritter, Erik Such, Felipe Petroski Raso, Filippo Tsimpourlas, Foivos Sulit, Freddie Parascandolo, Giambattista Chabot, Gildas Andrin, Hart Ren, Hongyu Lightman, Hunter Kivlichan, Ian Kostrikov, Ilya Sutskever, Ilya Lennon, James Harb, Jean Yu, Jiahui Tang, Jie Yu, Jieqi Parish, Joel Heidecke, Johannes Ward, Jonathan Huizinga, Joost Nguyen, Karina Shi, Katy Gu-Lemberg, Keren Lu, Kevin Yu, Kevin Ahmad, Lama Kuhn, Lorenz Kondraciuk, Lukas Boyd, Madelaine Joglekar, Manas Chen, Mark Tintor, Marko Schwarzer, Max Shah, Meghan Yatbaz, Mehmet Xu, Mengyuan Yan, Mengyuan Glaese, Mia Malek, Michael Pavlov, Mikhail Wang, Miles McAleese, Nat Chowdhury, Neil Ryder, Nick Chao, Patrick Izmailov, Pavel Arora, Rahul Lopes, Rapha Gontijo Gaon, Raz Leike, Reimar Brown, Robin Altman, Sam Agarwal, Sandhini Baker, Sasha McKinney, Scott Yan, Scottie Chaudhuri, Shraman Ray Zhang, Shuyuan Fu, Siyuan Wang, Tao Gordon, Taylor Patwardhan, Tejal Dimson, Thomas Zheng, Tianhao Stasi, Tom Bansal, Trapit Creech, Trevor Peterson, Troy Zhou, Wenda Dubois, Yann Chen, Yining Bai, Yu He, Yuchen Zhang, Yuchen Wang, Yunyun |
description | The o1 model series is trained with large-scale reinforcement learning to
reason using chain of thought. These advanced reasoning capabilities provide
new avenues for improving the safety and robustness of our models. In
particular, our models can reason about our safety policies in context when
responding to potentially unsafe prompts, through deliberative alignment. This
leads to state-of-the-art performance on certain benchmarks for risks such as
generating illicit advice, choosing stereotyped responses, and succumbing to
known jailbreaks. Training models to incorporate a chain of thought before
answering has the potential to unlock substantial benefits, while also
increasing potential risks that stem from heightened intelligence. Our results
underscore the need for building robust alignment methods, extensively
stress-testing their efficacy, and maintaining meticulous risk management
protocols. This report outlines the safety work carried out for the OpenAI o1
and OpenAI o1-mini models, including safety evaluations, external red teaming,
and Preparedness Framework evaluations. |
doi_str_mv | 10.48550/arxiv.2412.16720 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2412_16720</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2412_16720</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2412_167203</originalsourceid><addsrcrecordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMjE00jM0Mzcy4GQQ9S9IzXP0VMg3VAiuLC5JzVVwTixK4WFgTUvMKU7lhdLcDPJuriHOHrpgA-ILijJzE4sq40EGxYMNMiasAgCMISWv</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>OpenAI o1 System Card</title><source>arXiv.org</source><creator>OpenAI ; Jaech, Aaron ; Low, Aiden ; Helyar, Alec ; Passos, Alex Tachard ; Neitz, Alexander ; Tam, Allison ; Bennett, Ally ; Applebaum, Andy ; Zoph, Barret ; Ghorbani, Behrooz ; Rossen, Ben ; McKinzie, Brandon ; Lugaresi, Camillo ; Shen, Chen ; Zhang, Chong ; Koch, Chris ; Roberts, Dan ; Kappler, Daniel ; Dohan, David ; Farhi, David ; Zhang, Eddie ; Wallace, Eric ; Ritter, Erik ; Such, Felipe Petroski ; Raso, Filippo ; Tsimpourlas, Foivos ; Sulit, Freddie ; Parascandolo, Giambattista ; Chabot, Gildas ; Andrin, Hart ; Ren, Hongyu ; Lightman, Hunter ; Kivlichan, Ian ; Kostrikov, Ilya ; Sutskever, Ilya ; Lennon, James ; Harb, Jean ; Yu, Jiahui ; Tang, Jie ; Yu, Jieqi ; Parish, Joel ; Heidecke, Johannes ; Ward, Jonathan ; Huizinga, Joost ; Nguyen, Karina ; Shi, Katy ; Gu-Lemberg, Keren ; Lu, Kevin ; Yu, Kevin ; Ahmad, Lama ; Kuhn, Lorenz ; Kondraciuk, Lukas ; Boyd, Madelaine ; Joglekar, Manas ; Chen, Mark ; Tintor, Marko ; Schwarzer, Max ; Shah, Meghan ; Yatbaz, Mehmet ; Xu, Mengyuan ; Yan, Mengyuan ; Glaese, Mia ; Malek, Michael ; Pavlov, Mikhail ; Wang, Miles ; McAleese, Nat ; Chowdhury, Neil ; Ryder, Nick ; Chao, Patrick ; Izmailov, Pavel ; Arora, Rahul ; Lopes, Rapha Gontijo ; Gaon, Raz ; Leike, Reimar ; Brown, Robin ; Altman, Sam ; Agarwal, Sandhini ; Baker, Sasha ; McKinney, Scott ; Yan, Scottie ; Chaudhuri, Shraman Ray ; Zhang, Shuyuan ; Fu, Siyuan ; Wang, Tao ; Gordon, Taylor ; Patwardhan, Tejal ; Dimson, Thomas ; Zheng, Tianhao ; Stasi, Tom ; Bansal, Trapit ; Creech, Trevor ; Peterson, Troy ; Zhou, Wenda ; Dubois, Yann ; Chen, Yining ; Bai, Yu ; He, Yuchen ; Zhang, Yuchen ; Wang, Yunyun</creator><creatorcontrib>OpenAI ; Jaech, Aaron ; Low, Aiden ; Helyar, Alec ; Passos, Alex Tachard ; Neitz, Alexander ; Tam, Allison ; Bennett, Ally ; Applebaum, Andy ; Zoph, Barret ; Ghorbani, Behrooz ; Rossen, Ben ; McKinzie, Brandon ; Lugaresi, Camillo ; Shen, Chen ; Zhang, Chong ; Koch, Chris ; Roberts, Dan ; Kappler, Daniel ; Dohan, David ; Farhi, David ; Zhang, Eddie ; Wallace, Eric ; Ritter, Erik ; Such, Felipe Petroski ; Raso, Filippo ; Tsimpourlas, Foivos ; Sulit, Freddie ; Parascandolo, Giambattista ; Chabot, Gildas ; Andrin, Hart ; Ren, Hongyu ; Lightman, Hunter ; Kivlichan, Ian ; Kostrikov, Ilya ; Sutskever, Ilya ; Lennon, James ; Harb, Jean ; Yu, Jiahui ; Tang, Jie ; Yu, Jieqi ; Parish, Joel ; Heidecke, Johannes ; Ward, Jonathan ; Huizinga, Joost ; Nguyen, Karina ; Shi, Katy ; Gu-Lemberg, Keren ; Lu, Kevin ; Yu, Kevin ; Ahmad, Lama ; Kuhn, Lorenz ; Kondraciuk, Lukas ; Boyd, Madelaine ; Joglekar, Manas ; Chen, Mark ; Tintor, Marko ; Schwarzer, Max ; Shah, Meghan ; Yatbaz, Mehmet ; Xu, Mengyuan ; Yan, Mengyuan ; Glaese, Mia ; Malek, Michael ; Pavlov, Mikhail ; Wang, Miles ; McAleese, Nat ; Chowdhury, Neil ; Ryder, Nick ; Chao, Patrick ; Izmailov, Pavel ; Arora, Rahul ; Lopes, Rapha Gontijo ; Gaon, Raz ; Leike, Reimar ; Brown, Robin ; Altman, Sam ; Agarwal, Sandhini ; Baker, Sasha ; McKinney, Scott ; Yan, Scottie ; Chaudhuri, Shraman Ray ; Zhang, Shuyuan ; Fu, Siyuan ; Wang, Tao ; Gordon, Taylor ; Patwardhan, Tejal ; Dimson, Thomas ; Zheng, Tianhao ; Stasi, Tom ; Bansal, Trapit ; Creech, Trevor ; Peterson, Troy ; Zhou, Wenda ; Dubois, Yann ; Chen, Yining ; Bai, Yu ; He, Yuchen ; Zhang, Yuchen ; Wang, Yunyun</creatorcontrib><description>The o1 model series is trained with large-scale reinforcement learning to
reason using chain of thought. These advanced reasoning capabilities provide
new avenues for improving the safety and robustness of our models. In
particular, our models can reason about our safety policies in context when
responding to potentially unsafe prompts, through deliberative alignment. This
leads to state-of-the-art performance on certain benchmarks for risks such as
generating illicit advice, choosing stereotyped responses, and succumbing to
known jailbreaks. Training models to incorporate a chain of thought before
answering has the potential to unlock substantial benefits, while also
increasing potential risks that stem from heightened intelligence. Our results
underscore the need for building robust alignment methods, extensively
stress-testing their efficacy, and maintaining meticulous risk management
protocols. This report outlines the safety work carried out for the OpenAI o1
and OpenAI o1-mini models, including safety evaluations, external red teaming,
and Preparedness Framework evaluations.</description><identifier>DOI: 10.48550/arxiv.2412.16720</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence</subject><creationdate>2024-12</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2412.16720$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2412.16720$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>OpenAI</creatorcontrib><creatorcontrib>Jaech, Aaron</creatorcontrib><creatorcontrib>Low, Aiden</creatorcontrib><creatorcontrib>Helyar, Alec</creatorcontrib><creatorcontrib>Passos, Alex Tachard</creatorcontrib><creatorcontrib>Neitz, Alexander</creatorcontrib><creatorcontrib>Tam, Allison</creatorcontrib><creatorcontrib>Bennett, Ally</creatorcontrib><creatorcontrib>Applebaum, Andy</creatorcontrib><creatorcontrib>Zoph, Barret</creatorcontrib><creatorcontrib>Ghorbani, Behrooz</creatorcontrib><creatorcontrib>Rossen, Ben</creatorcontrib><creatorcontrib>McKinzie, Brandon</creatorcontrib><creatorcontrib>Lugaresi, Camillo</creatorcontrib><creatorcontrib>Shen, Chen</creatorcontrib><creatorcontrib>Zhang, Chong</creatorcontrib><creatorcontrib>Koch, Chris</creatorcontrib><creatorcontrib>Roberts, Dan</creatorcontrib><creatorcontrib>Kappler, Daniel</creatorcontrib><creatorcontrib>Dohan, David</creatorcontrib><creatorcontrib>Farhi, David</creatorcontrib><creatorcontrib>Zhang, Eddie</creatorcontrib><creatorcontrib>Wallace, Eric</creatorcontrib><creatorcontrib>Ritter, Erik</creatorcontrib><creatorcontrib>Such, Felipe Petroski</creatorcontrib><creatorcontrib>Raso, Filippo</creatorcontrib><creatorcontrib>Tsimpourlas, Foivos</creatorcontrib><creatorcontrib>Sulit, Freddie</creatorcontrib><creatorcontrib>Parascandolo, Giambattista</creatorcontrib><creatorcontrib>Chabot, Gildas</creatorcontrib><creatorcontrib>Andrin, Hart</creatorcontrib><creatorcontrib>Ren, Hongyu</creatorcontrib><creatorcontrib>Lightman, Hunter</creatorcontrib><creatorcontrib>Kivlichan, Ian</creatorcontrib><creatorcontrib>Kostrikov, Ilya</creatorcontrib><creatorcontrib>Sutskever, Ilya</creatorcontrib><creatorcontrib>Lennon, James</creatorcontrib><creatorcontrib>Harb, Jean</creatorcontrib><creatorcontrib>Yu, Jiahui</creatorcontrib><creatorcontrib>Tang, Jie</creatorcontrib><creatorcontrib>Yu, Jieqi</creatorcontrib><creatorcontrib>Parish, Joel</creatorcontrib><creatorcontrib>Heidecke, Johannes</creatorcontrib><creatorcontrib>Ward, Jonathan</creatorcontrib><creatorcontrib>Huizinga, Joost</creatorcontrib><creatorcontrib>Nguyen, Karina</creatorcontrib><creatorcontrib>Shi, Katy</creatorcontrib><creatorcontrib>Gu-Lemberg, Keren</creatorcontrib><creatorcontrib>Lu, Kevin</creatorcontrib><creatorcontrib>Yu, Kevin</creatorcontrib><creatorcontrib>Ahmad, Lama</creatorcontrib><creatorcontrib>Kuhn, Lorenz</creatorcontrib><creatorcontrib>Kondraciuk, Lukas</creatorcontrib><creatorcontrib>Boyd, Madelaine</creatorcontrib><creatorcontrib>Joglekar, Manas</creatorcontrib><creatorcontrib>Chen, Mark</creatorcontrib><creatorcontrib>Tintor, Marko</creatorcontrib><creatorcontrib>Schwarzer, Max</creatorcontrib><creatorcontrib>Shah, Meghan</creatorcontrib><creatorcontrib>Yatbaz, Mehmet</creatorcontrib><creatorcontrib>Xu, Mengyuan</creatorcontrib><creatorcontrib>Yan, Mengyuan</creatorcontrib><creatorcontrib>Glaese, Mia</creatorcontrib><creatorcontrib>Malek, Michael</creatorcontrib><creatorcontrib>Pavlov, Mikhail</creatorcontrib><creatorcontrib>Wang, Miles</creatorcontrib><creatorcontrib>McAleese, Nat</creatorcontrib><creatorcontrib>Chowdhury, Neil</creatorcontrib><creatorcontrib>Ryder, Nick</creatorcontrib><creatorcontrib>Chao, Patrick</creatorcontrib><creatorcontrib>Izmailov, Pavel</creatorcontrib><creatorcontrib>Arora, Rahul</creatorcontrib><creatorcontrib>Lopes, Rapha Gontijo</creatorcontrib><creatorcontrib>Gaon, Raz</creatorcontrib><creatorcontrib>Leike, Reimar</creatorcontrib><creatorcontrib>Brown, Robin</creatorcontrib><creatorcontrib>Altman, Sam</creatorcontrib><creatorcontrib>Agarwal, Sandhini</creatorcontrib><creatorcontrib>Baker, Sasha</creatorcontrib><creatorcontrib>McKinney, Scott</creatorcontrib><creatorcontrib>Yan, Scottie</creatorcontrib><creatorcontrib>Chaudhuri, Shraman Ray</creatorcontrib><creatorcontrib>Zhang, Shuyuan</creatorcontrib><creatorcontrib>Fu, Siyuan</creatorcontrib><creatorcontrib>Wang, Tao</creatorcontrib><creatorcontrib>Gordon, Taylor</creatorcontrib><creatorcontrib>Patwardhan, Tejal</creatorcontrib><creatorcontrib>Dimson, Thomas</creatorcontrib><creatorcontrib>Zheng, Tianhao</creatorcontrib><creatorcontrib>Stasi, Tom</creatorcontrib><creatorcontrib>Bansal, Trapit</creatorcontrib><creatorcontrib>Creech, Trevor</creatorcontrib><creatorcontrib>Peterson, Troy</creatorcontrib><creatorcontrib>Zhou, Wenda</creatorcontrib><creatorcontrib>Dubois, Yann</creatorcontrib><creatorcontrib>Chen, Yining</creatorcontrib><creatorcontrib>Bai, Yu</creatorcontrib><creatorcontrib>He, Yuchen</creatorcontrib><creatorcontrib>Zhang, Yuchen</creatorcontrib><creatorcontrib>Wang, Yunyun</creatorcontrib><title>OpenAI o1 System Card</title><description>The o1 model series is trained with large-scale reinforcement learning to
reason using chain of thought. These advanced reasoning capabilities provide
new avenues for improving the safety and robustness of our models. In
particular, our models can reason about our safety policies in context when
responding to potentially unsafe prompts, through deliberative alignment. This
leads to state-of-the-art performance on certain benchmarks for risks such as
generating illicit advice, choosing stereotyped responses, and succumbing to
known jailbreaks. Training models to incorporate a chain of thought before
answering has the potential to unlock substantial benefits, while also
increasing potential risks that stem from heightened intelligence. Our results
underscore the need for building robust alignment methods, extensively
stress-testing their efficacy, and maintaining meticulous risk management
protocols. This report outlines the safety work carried out for the OpenAI o1
and OpenAI o1-mini models, including safety evaluations, external red teaming,
and Preparedness Framework evaluations.</description><subject>Computer Science - Artificial Intelligence</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMjE00jM0Mzcy4GQQ9S9IzXP0VMg3VAiuLC5JzVVwTixK4WFgTUvMKU7lhdLcDPJuriHOHrpgA-ILijJzE4sq40EGxYMNMiasAgCMISWv</recordid><startdate>20241221</startdate><enddate>20241221</enddate><creator>OpenAI</creator><creator>Jaech, Aaron</creator><creator>Low, Aiden</creator><creator>Helyar, Alec</creator><creator>Passos, Alex Tachard</creator><creator>Neitz, Alexander</creator><creator>Tam, Allison</creator><creator>Bennett, Ally</creator><creator>Applebaum, Andy</creator><creator>Zoph, Barret</creator><creator>Ghorbani, Behrooz</creator><creator>Rossen, Ben</creator><creator>McKinzie, Brandon</creator><creator>Lugaresi, Camillo</creator><creator>Shen, Chen</creator><creator>Zhang, Chong</creator><creator>Koch, Chris</creator><creator>Roberts, Dan</creator><creator>Kappler, Daniel</creator><creator>Dohan, David</creator><creator>Farhi, David</creator><creator>Zhang, Eddie</creator><creator>Wallace, Eric</creator><creator>Ritter, Erik</creator><creator>Such, Felipe Petroski</creator><creator>Raso, Filippo</creator><creator>Tsimpourlas, Foivos</creator><creator>Sulit, Freddie</creator><creator>Parascandolo, Giambattista</creator><creator>Chabot, Gildas</creator><creator>Andrin, Hart</creator><creator>Ren, Hongyu</creator><creator>Lightman, Hunter</creator><creator>Kivlichan, Ian</creator><creator>Kostrikov, Ilya</creator><creator>Sutskever, Ilya</creator><creator>Lennon, James</creator><creator>Harb, Jean</creator><creator>Yu, Jiahui</creator><creator>Tang, Jie</creator><creator>Yu, Jieqi</creator><creator>Parish, Joel</creator><creator>Heidecke, Johannes</creator><creator>Ward, Jonathan</creator><creator>Huizinga, Joost</creator><creator>Nguyen, Karina</creator><creator>Shi, Katy</creator><creator>Gu-Lemberg, Keren</creator><creator>Lu, Kevin</creator><creator>Yu, Kevin</creator><creator>Ahmad, Lama</creator><creator>Kuhn, Lorenz</creator><creator>Kondraciuk, Lukas</creator><creator>Boyd, Madelaine</creator><creator>Joglekar, Manas</creator><creator>Chen, Mark</creator><creator>Tintor, Marko</creator><creator>Schwarzer, Max</creator><creator>Shah, Meghan</creator><creator>Yatbaz, Mehmet</creator><creator>Xu, Mengyuan</creator><creator>Yan, Mengyuan</creator><creator>Glaese, Mia</creator><creator>Malek, Michael</creator><creator>Pavlov, Mikhail</creator><creator>Wang, Miles</creator><creator>McAleese, Nat</creator><creator>Chowdhury, Neil</creator><creator>Ryder, Nick</creator><creator>Chao, Patrick</creator><creator>Izmailov, Pavel</creator><creator>Arora, Rahul</creator><creator>Lopes, Rapha Gontijo</creator><creator>Gaon, Raz</creator><creator>Leike, Reimar</creator><creator>Brown, Robin</creator><creator>Altman, Sam</creator><creator>Agarwal, Sandhini</creator><creator>Baker, Sasha</creator><creator>McKinney, Scott</creator><creator>Yan, Scottie</creator><creator>Chaudhuri, Shraman Ray</creator><creator>Zhang, Shuyuan</creator><creator>Fu, Siyuan</creator><creator>Wang, Tao</creator><creator>Gordon, Taylor</creator><creator>Patwardhan, Tejal</creator><creator>Dimson, Thomas</creator><creator>Zheng, Tianhao</creator><creator>Stasi, Tom</creator><creator>Bansal, Trapit</creator><creator>Creech, Trevor</creator><creator>Peterson, Troy</creator><creator>Zhou, Wenda</creator><creator>Dubois, Yann</creator><creator>Chen, Yining</creator><creator>Bai, Yu</creator><creator>He, Yuchen</creator><creator>Zhang, Yuchen</creator><creator>Wang, Yunyun</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20241221</creationdate><title>OpenAI o1 System Card</title><author>OpenAI ; Jaech, Aaron ; Low, Aiden ; Helyar, Alec ; Passos, Alex Tachard ; Neitz, Alexander ; Tam, Allison ; Bennett, Ally ; Applebaum, Andy ; Zoph, Barret ; Ghorbani, Behrooz ; Rossen, Ben ; McKinzie, Brandon ; Lugaresi, Camillo ; Shen, Chen ; Zhang, Chong ; Koch, Chris ; Roberts, Dan ; Kappler, Daniel ; Dohan, David ; Farhi, David ; Zhang, Eddie ; Wallace, Eric ; Ritter, Erik ; Such, Felipe Petroski ; Raso, Filippo ; Tsimpourlas, Foivos ; Sulit, Freddie ; Parascandolo, Giambattista ; Chabot, Gildas ; Andrin, Hart ; Ren, Hongyu ; Lightman, Hunter ; Kivlichan, Ian ; Kostrikov, Ilya ; Sutskever, Ilya ; Lennon, James ; Harb, Jean ; Yu, Jiahui ; Tang, Jie ; Yu, Jieqi ; Parish, Joel ; Heidecke, Johannes ; Ward, Jonathan ; Huizinga, Joost ; Nguyen, Karina ; Shi, Katy ; Gu-Lemberg, Keren ; Lu, Kevin ; Yu, Kevin ; Ahmad, Lama ; Kuhn, Lorenz ; Kondraciuk, Lukas ; Boyd, Madelaine ; Joglekar, Manas ; Chen, Mark ; Tintor, Marko ; Schwarzer, Max ; Shah, Meghan ; Yatbaz, Mehmet ; Xu, Mengyuan ; Yan, Mengyuan ; Glaese, Mia ; Malek, Michael ; Pavlov, Mikhail ; Wang, Miles ; McAleese, Nat ; Chowdhury, Neil ; Ryder, Nick ; Chao, Patrick ; Izmailov, Pavel ; Arora, Rahul ; Lopes, Rapha Gontijo ; Gaon, Raz ; Leike, Reimar ; Brown, Robin ; Altman, Sam ; Agarwal, Sandhini ; Baker, Sasha ; McKinney, Scott ; Yan, Scottie ; Chaudhuri, Shraman Ray ; Zhang, Shuyuan ; Fu, Siyuan ; Wang, Tao ; Gordon, Taylor ; Patwardhan, Tejal ; Dimson, Thomas ; Zheng, Tianhao ; Stasi, Tom ; Bansal, Trapit ; Creech, Trevor ; Peterson, Troy ; Zhou, Wenda ; Dubois, Yann ; Chen, Yining ; Bai, Yu ; He, Yuchen ; Zhang, Yuchen ; Wang, Yunyun</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2412_167203</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Artificial Intelligence</topic><toplevel>online_resources</toplevel><creatorcontrib>OpenAI</creatorcontrib><creatorcontrib>Jaech, Aaron</creatorcontrib><creatorcontrib>Low, Aiden</creatorcontrib><creatorcontrib>Helyar, Alec</creatorcontrib><creatorcontrib>Passos, Alex Tachard</creatorcontrib><creatorcontrib>Neitz, Alexander</creatorcontrib><creatorcontrib>Tam, Allison</creatorcontrib><creatorcontrib>Bennett, Ally</creatorcontrib><creatorcontrib>Applebaum, Andy</creatorcontrib><creatorcontrib>Zoph, Barret</creatorcontrib><creatorcontrib>Ghorbani, Behrooz</creatorcontrib><creatorcontrib>Rossen, Ben</creatorcontrib><creatorcontrib>McKinzie, Brandon</creatorcontrib><creatorcontrib>Lugaresi, Camillo</creatorcontrib><creatorcontrib>Shen, Chen</creatorcontrib><creatorcontrib>Zhang, Chong</creatorcontrib><creatorcontrib>Koch, Chris</creatorcontrib><creatorcontrib>Roberts, Dan</creatorcontrib><creatorcontrib>Kappler, Daniel</creatorcontrib><creatorcontrib>Dohan, David</creatorcontrib><creatorcontrib>Farhi, David</creatorcontrib><creatorcontrib>Zhang, Eddie</creatorcontrib><creatorcontrib>Wallace, Eric</creatorcontrib><creatorcontrib>Ritter, Erik</creatorcontrib><creatorcontrib>Such, Felipe Petroski</creatorcontrib><creatorcontrib>Raso, Filippo</creatorcontrib><creatorcontrib>Tsimpourlas, Foivos</creatorcontrib><creatorcontrib>Sulit, Freddie</creatorcontrib><creatorcontrib>Parascandolo, Giambattista</creatorcontrib><creatorcontrib>Chabot, Gildas</creatorcontrib><creatorcontrib>Andrin, Hart</creatorcontrib><creatorcontrib>Ren, Hongyu</creatorcontrib><creatorcontrib>Lightman, Hunter</creatorcontrib><creatorcontrib>Kivlichan, Ian</creatorcontrib><creatorcontrib>Kostrikov, Ilya</creatorcontrib><creatorcontrib>Sutskever, Ilya</creatorcontrib><creatorcontrib>Lennon, James</creatorcontrib><creatorcontrib>Harb, Jean</creatorcontrib><creatorcontrib>Yu, Jiahui</creatorcontrib><creatorcontrib>Tang, Jie</creatorcontrib><creatorcontrib>Yu, Jieqi</creatorcontrib><creatorcontrib>Parish, Joel</creatorcontrib><creatorcontrib>Heidecke, Johannes</creatorcontrib><creatorcontrib>Ward, Jonathan</creatorcontrib><creatorcontrib>Huizinga, Joost</creatorcontrib><creatorcontrib>Nguyen, Karina</creatorcontrib><creatorcontrib>Shi, Katy</creatorcontrib><creatorcontrib>Gu-Lemberg, Keren</creatorcontrib><creatorcontrib>Lu, Kevin</creatorcontrib><creatorcontrib>Yu, Kevin</creatorcontrib><creatorcontrib>Ahmad, Lama</creatorcontrib><creatorcontrib>Kuhn, Lorenz</creatorcontrib><creatorcontrib>Kondraciuk, Lukas</creatorcontrib><creatorcontrib>Boyd, Madelaine</creatorcontrib><creatorcontrib>Joglekar, Manas</creatorcontrib><creatorcontrib>Chen, Mark</creatorcontrib><creatorcontrib>Tintor, Marko</creatorcontrib><creatorcontrib>Schwarzer, Max</creatorcontrib><creatorcontrib>Shah, Meghan</creatorcontrib><creatorcontrib>Yatbaz, Mehmet</creatorcontrib><creatorcontrib>Xu, Mengyuan</creatorcontrib><creatorcontrib>Yan, Mengyuan</creatorcontrib><creatorcontrib>Glaese, Mia</creatorcontrib><creatorcontrib>Malek, Michael</creatorcontrib><creatorcontrib>Pavlov, Mikhail</creatorcontrib><creatorcontrib>Wang, Miles</creatorcontrib><creatorcontrib>McAleese, Nat</creatorcontrib><creatorcontrib>Chowdhury, Neil</creatorcontrib><creatorcontrib>Ryder, Nick</creatorcontrib><creatorcontrib>Chao, Patrick</creatorcontrib><creatorcontrib>Izmailov, Pavel</creatorcontrib><creatorcontrib>Arora, Rahul</creatorcontrib><creatorcontrib>Lopes, Rapha Gontijo</creatorcontrib><creatorcontrib>Gaon, Raz</creatorcontrib><creatorcontrib>Leike, Reimar</creatorcontrib><creatorcontrib>Brown, Robin</creatorcontrib><creatorcontrib>Altman, Sam</creatorcontrib><creatorcontrib>Agarwal, Sandhini</creatorcontrib><creatorcontrib>Baker, Sasha</creatorcontrib><creatorcontrib>McKinney, Scott</creatorcontrib><creatorcontrib>Yan, Scottie</creatorcontrib><creatorcontrib>Chaudhuri, Shraman Ray</creatorcontrib><creatorcontrib>Zhang, Shuyuan</creatorcontrib><creatorcontrib>Fu, Siyuan</creatorcontrib><creatorcontrib>Wang, Tao</creatorcontrib><creatorcontrib>Gordon, Taylor</creatorcontrib><creatorcontrib>Patwardhan, Tejal</creatorcontrib><creatorcontrib>Dimson, Thomas</creatorcontrib><creatorcontrib>Zheng, Tianhao</creatorcontrib><creatorcontrib>Stasi, Tom</creatorcontrib><creatorcontrib>Bansal, Trapit</creatorcontrib><creatorcontrib>Creech, Trevor</creatorcontrib><creatorcontrib>Peterson, Troy</creatorcontrib><creatorcontrib>Zhou, Wenda</creatorcontrib><creatorcontrib>Dubois, Yann</creatorcontrib><creatorcontrib>Chen, Yining</creatorcontrib><creatorcontrib>Bai, Yu</creatorcontrib><creatorcontrib>He, Yuchen</creatorcontrib><creatorcontrib>Zhang, Yuchen</creatorcontrib><creatorcontrib>Wang, Yunyun</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>OpenAI</au><au>Jaech, Aaron</au><au>Low, Aiden</au><au>Helyar, Alec</au><au>Passos, Alex Tachard</au><au>Neitz, Alexander</au><au>Tam, Allison</au><au>Bennett, Ally</au><au>Applebaum, Andy</au><au>Zoph, Barret</au><au>Ghorbani, Behrooz</au><au>Rossen, Ben</au><au>McKinzie, Brandon</au><au>Lugaresi, Camillo</au><au>Shen, Chen</au><au>Zhang, Chong</au><au>Koch, Chris</au><au>Roberts, Dan</au><au>Kappler, Daniel</au><au>Dohan, David</au><au>Farhi, David</au><au>Zhang, Eddie</au><au>Wallace, Eric</au><au>Ritter, Erik</au><au>Such, Felipe Petroski</au><au>Raso, Filippo</au><au>Tsimpourlas, Foivos</au><au>Sulit, Freddie</au><au>Parascandolo, Giambattista</au><au>Chabot, Gildas</au><au>Andrin, Hart</au><au>Ren, Hongyu</au><au>Lightman, Hunter</au><au>Kivlichan, Ian</au><au>Kostrikov, Ilya</au><au>Sutskever, Ilya</au><au>Lennon, James</au><au>Harb, Jean</au><au>Yu, Jiahui</au><au>Tang, Jie</au><au>Yu, Jieqi</au><au>Parish, Joel</au><au>Heidecke, Johannes</au><au>Ward, Jonathan</au><au>Huizinga, Joost</au><au>Nguyen, Karina</au><au>Shi, Katy</au><au>Gu-Lemberg, Keren</au><au>Lu, Kevin</au><au>Yu, Kevin</au><au>Ahmad, Lama</au><au>Kuhn, Lorenz</au><au>Kondraciuk, Lukas</au><au>Boyd, Madelaine</au><au>Joglekar, Manas</au><au>Chen, Mark</au><au>Tintor, Marko</au><au>Schwarzer, Max</au><au>Shah, Meghan</au><au>Yatbaz, Mehmet</au><au>Xu, Mengyuan</au><au>Yan, Mengyuan</au><au>Glaese, Mia</au><au>Malek, Michael</au><au>Pavlov, Mikhail</au><au>Wang, Miles</au><au>McAleese, Nat</au><au>Chowdhury, Neil</au><au>Ryder, Nick</au><au>Chao, Patrick</au><au>Izmailov, Pavel</au><au>Arora, Rahul</au><au>Lopes, Rapha Gontijo</au><au>Gaon, Raz</au><au>Leike, Reimar</au><au>Brown, Robin</au><au>Altman, Sam</au><au>Agarwal, Sandhini</au><au>Baker, Sasha</au><au>McKinney, Scott</au><au>Yan, Scottie</au><au>Chaudhuri, Shraman Ray</au><au>Zhang, Shuyuan</au><au>Fu, Siyuan</au><au>Wang, Tao</au><au>Gordon, Taylor</au><au>Patwardhan, Tejal</au><au>Dimson, Thomas</au><au>Zheng, Tianhao</au><au>Stasi, Tom</au><au>Bansal, Trapit</au><au>Creech, Trevor</au><au>Peterson, Troy</au><au>Zhou, Wenda</au><au>Dubois, Yann</au><au>Chen, Yining</au><au>Bai, Yu</au><au>He, Yuchen</au><au>Zhang, Yuchen</au><au>Wang, Yunyun</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>OpenAI o1 System Card</atitle><date>2024-12-21</date><risdate>2024</risdate><abstract>The o1 model series is trained with large-scale reinforcement learning to
reason using chain of thought. These advanced reasoning capabilities provide
new avenues for improving the safety and robustness of our models. In
particular, our models can reason about our safety policies in context when
responding to potentially unsafe prompts, through deliberative alignment. This
leads to state-of-the-art performance on certain benchmarks for risks such as
generating illicit advice, choosing stereotyped responses, and succumbing to
known jailbreaks. Training models to incorporate a chain of thought before
answering has the potential to unlock substantial benefits, while also
increasing potential risks that stem from heightened intelligence. Our results
underscore the need for building robust alignment methods, extensively
stress-testing their efficacy, and maintaining meticulous risk management
protocols. This report outlines the safety work carried out for the OpenAI o1
and OpenAI o1-mini models, including safety evaluations, external red teaming,
and Preparedness Framework evaluations.</abstract><doi>10.48550/arxiv.2412.16720</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2412.16720 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2412_16720 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence |
title | OpenAI o1 System Card |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T10%3A28%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=OpenAI%20o1%20System%20Card&rft.au=OpenAI&rft.date=2024-12-21&rft_id=info:doi/10.48550/arxiv.2412.16720&rft_dat=%3Carxiv_GOX%3E2412_16720%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |