Action Set Based Policy Optimization for Safe Power Grid Management

Maintaining the stability of the modern power grid is becoming increasingly difficult due to fluctuating power consumption, unstable power supply coming from renewable energies, and unpredictable accidents such as man-made and natural disasters. As the operation on the power grid must consider its impact on future stability, reinforcement learning (RL) has been employed to provide sequential decision-making in power grid management. However, existing methods have not considered the environmental constraints. As a result, the learned policy has risk of selecting actions that violate the constraints in emergencies, which will escalate the issue of overloaded power lines and lead to large-scale blackouts. In this work, we propose a novel method for this problem, which builds on top of the search-based planning algorithm. At the planning stage, the search space is limited to the action set produced by the policy. The selected action strictly follows the constraints by testing its outcome with the simulation function provided by the system. At the learning stage, to address the problem that gradients cannot be propagated to the policy, we introduce Evolutionary Strategies (ES) with black-box policy optimization to improve the policy directly, maximizing the returns of the long run. In NeurIPS 2020 Learning to Run Power Network (L2RPN) competition, our solution safely managed the power grid and ranked first in both tracks.
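The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `simulate`, `policy_top_k`, `is_safe`, and `es_update` are hypothetical stand-ins for the grid environment's simulation function, the learned policy's ranked action proposals, the line-overload constraint check, and the black-box ES parameter update.

```python
import random

def is_safe(state):
    """Assumed constraint check: no power line exceeds its thermal limit."""
    return all(load <= 1.0 for load in state["line_loads"])

def simulate(state, action):
    """Toy stand-in for the system-provided simulation function that
    previews an action's outcome before it is applied to the real grid."""
    factor = {"do_nothing": 1.05, "reconfigure": 0.9, "redispatch": 0.8}[action]
    return {"line_loads": [load * factor for load in state["line_loads"]]}

def policy_top_k(state, k=2):
    """Stand-in policy: returns a small ranked action set (in the paper this
    comes from a learned policy network)."""
    return ["do_nothing", "reconfigure", "redispatch"][:k]

def plan(state, fallback="redispatch"):
    """Planning stage: restrict search to the policy's action set and pick the
    first action whose simulated outcome satisfies the constraints."""
    for action in policy_top_k(state):
        if is_safe(simulate(state, action)):
            return action
    return fallback  # no proposed action passed the safety check

def es_update(theta, reward_fn, sigma=0.1, lr=0.01, n=10, rng=random.Random(0)):
    """Learning stage: Evolution Strategies treats the whole plan-and-act
    pipeline as a black box, estimating a gradient from the returns of
    randomly perturbed parameters (shown here for a scalar parameter)."""
    grad = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        grad += reward_fn(theta + sigma * eps) * eps
    return theta + lr * grad / (n * sigma)

state = {"line_loads": [0.7, 0.95]}
print(plan(state))
```

Because the chosen action is verified with the simulator before execution, constraint violations are filtered out at decision time, while ES sidesteps the fact that gradients cannot flow back through the planning step.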

Detailed Description

Bibliographic Details
Published in: arXiv.org, 2021-06
Main authors: Zhou, Bo; Zeng, Hongsheng; Liu, Yuecheng; Li, Kejiao; Wang, Fan; Tian, Hao
Format: Article
Language: English
Online access: Full text
EISSN: 2331-8422
Subjects:
Algorithms
Decision making
Learning
Natural disasters
Optimization
Power consumption
Power lines
Stability