Numerical Quadrature for Probabilistic Policy Search

Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence, 2020-01, Vol.42 (1), p.164-175
Main authors: Vinogradska, Julia; Bischoff, Bastian; Achterhold, Jan; Koller, Torsten; Peters, Jan
Format: Article
Language: English
Subjects:
Online access: order full text
description Learning control policies has become an appealing alternative to the derivation of control laws based on classic control theory. Model-based approaches have demonstrated outstanding data efficiency, especially when combined with probabilistic models to eliminate model bias. However, a major difficulty for these methods is that multi-step-ahead predictions typically become intractable for larger planning horizons and can be only poorly approximated. In this paper, we propose the use of numerical quadrature to overcome this drawback and provide significantly more accurate multi-step-ahead predictions. As a result, our approach increases data efficiency and enhances the quality of learned policies. Furthermore, policy learning is not restricted to optimizing locally around one trajectory, as numerical quadrature provides a principled approach to extend optimization to all trajectories starting in a specified starting state region. Thus, manual effort, such as choosing informative starting points for simultaneous policy optimization, is significantly decreased. Furthermore, learning is highly robust to the choice of initial policy and, thus, interaction time with the system is minimized. Empirical evaluations on simulated benchmark problems show the efficiency of the proposed approach and support our theoretical results.
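The core idea in the abstract — approximating the otherwise intractable multi-step-ahead prediction by numerically integrating a probabilistic model over the current state distribution — can be illustrated with a minimal sketch. This is not the paper's actual algorithm: the 1-D setting, the function name `propagate_gh`, and the use of Gauss–Hermite quadrature here are illustrative assumptions; the paper treats the general multivariate case with learned Gaussian process dynamics.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def propagate_gh(mean, var, dynamics, n_points=10):
    """Propagate a 1-D Gaussian state N(mean, var) one step through
    `dynamics` using Gauss-Hermite quadrature; return the moments
    (mean, variance) of the next-state distribution."""
    # Nodes/weights for integrals against exp(-t^2).
    nodes, weights = hermgauss(n_points)
    # Change of variables x = mean + sqrt(2*var)*t turns the integral
    # into an expectation under N(mean, var).
    xs = mean + np.sqrt(2.0 * var) * nodes
    fs = np.array([dynamics(x) for x in xs])
    w = weights / np.sqrt(np.pi)          # weights now sum to 1
    next_mean = np.sum(w * fs)
    next_var = np.sum(w * fs ** 2) - next_mean ** 2
    return next_mean, next_var


# Multi-step-ahead prediction: iterate the one-step propagation.
def predict_horizon(mean, var, dynamics, horizon):
    trajectory = [(mean, var)]
    for _ in range(horizon):
        mean, var = propagate_gh(mean, var, dynamics)
        trajectory.append((mean, var))
    return trajectory
```

For linear dynamics, e.g. `dynamics = lambda x: 0.9 * x`, the quadrature is exact: starting from `N(1.0, 0.5)` the one-step result is mean `0.9` and variance `0.81 * 0.5 = 0.405`, matching the closed-form Gaussian propagation. With a nonlinear model (such as a GP posterior mean), the same scheme yields the accurate moment approximations the abstract refers to.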
doi_str_mv 10.1109/TPAMI.2018.2879335
format Article
fullrecord raw XML source record omitted; unique fields: publisher IEEE (United States); PMID 30403621; CODEN ITPIDJ; EISSN 1939-3539, 2160-9292; copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
fulltext fulltext_linktorsrc
identifier ISSN: 0162-8828
ispartof IEEE transactions on pattern analysis and machine intelligence, 2020-01, Vol.42 (1), p.164-175
issn 0162-8828
1939-3539
2160-9292
language eng
recordid cdi_ieee_primary_8520758
source IEEE Electronic Library (IEL)
subjects Computational modeling
Computer simulation
control
Control theory
Data models
Efficiency
Gaussian processes
Learning
Numerical models
Policies
Policy search
Predictive models
Probabilistic models
Quadratures
reinforcement learning
Robustness (mathematics)
System dynamics
Trajectory optimization
Uncertainty
title Numerical Quadrature for Probabilistic Policy Search
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T19%3A52%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Numerical%20Quadrature%20for%20Probabilistic%20Policy%20Search&rft.jtitle=IEEE%20transactions%20on%20pattern%20analysis%20and%20machine%20intelligence&rft.au=Vinogradska,%20Julia&rft.date=2020-01-01&rft.volume=42&rft.issue=1&rft.spage=164&rft.epage=175&rft.pages=164-175&rft.issn=0162-8828&rft.eissn=1939-3539&rft.coden=ITPIDJ&rft_id=info:doi/10.1109/TPAMI.2018.2879335&rft_dat=%3Cproquest_RIE%3E2131243051%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2322748004&rft_id=info:pmid/30403621&rft_ieee_id=8520758&rfr_iscdi=true