Parameterized reinforcement learning for optical system optimization

Engineering a physical system to feature designated characteristics states an inverse design problem, which is often determined by several discrete and continuous parameters. If such a system must feature a particular behavior, the mentioned combination of both, discrete and continuous, parameters r...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Journal of physics. D, Applied physics Applied physics, 2021-07, Vol.54 (30), p.305104
Hauptverfasser: Wankerl, Heribert, Stern, Maike L, Mahdavi, Ali, Eichler, Christoph, Lang, Elmar W
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Engineering a physical system to feature designated characteristics states an inverse design problem, which is often determined by several discrete and continuous parameters. If such a system must feature a particular behavior, the mentioned combination of both, discrete and continuous, parameters results in a challenging optimization problem that requires an extensive search for an optimal system design. However, if the corresponding inverse design problem can be reformulated as a parameterized Markov decision process, reinforcement learning (RL) provides a heuristic framework to solve it. In this work, we use multi-layer thin films as an example of the aforementioned optimization problems and consider three design parameters: Each of the thin film layer’s dielectric material (discrete) and thickness (continuous), as well as the total number of layers (discrete). While recent methods merely determine the optimal thicknesses and—less commonly—the layers’ materials, our approach optimizes the total number of stacked layers as well. In summary, we further develop a Q-learning variant to solve inverse design optimization and thereby outperform human experts and current approaches like needle-point optimization or naive RL. For this purpose, we propose an exponentially transformed reward signal that eases policy search and enables constrained optimization. Moreover, the learned Q-values contain information about the optical properties of multi-layer thin films, which allows us a physical interpretation or what-if analysis and thus enables explainability.
ISSN:0022-3727
1361-6463
DOI:10.1088/1361-6463/abfddb