Intrinsic Decay Property of Ti/TiOx/Pt Memristor for Reinforcement Learning

Bibliographic details
Published in:	Advanced Intelligent Systems 2023-07, Vol.5 (7), p.n/a
Main authors:	Dai, Yuehua; Guo, Wenbin; Feng, Zhe; Xu, Zuyu; Zhu, Yunlai; Yang, Fei; Wu, Zuheng
Format:	Article
Language:	English
Subjects:	Algorithms; Arrays; conductance decay; Convergence; Decay; Decision making; Edge computing; Energy consumption; Floating point arithmetic; Hardware; Markov analysis; Memristors; path planning; Questions and answers; reinforcement learning; Sarsa (λ); Scanning electron microscopy; Titanium oxides
Online access:	Full text

Description
Summary:	Memristor‐based reinforcement learning (RL) systems have shown outstanding performance in efficient autonomous decision‐making and edge computing. Sarsa (λ) is a classical multistep RL algorithm that records state with λ decay and uses it to guide policy updates, significantly improving the convergence speed of the algorithm. However, implementing λ decay on traditional computing hardware is limited by the extensive computation required for power‐exponential decay. Herein, the value update equation for Sarsa (λ) is implemented using the topological structure of the memristor array, without complex circuits. Most importantly, the critical λ decay function is realized by a TiOx‐based memristor with a conductance decay property. The energy required for floating‐point operations can thus be significantly reduced while the convergence speed is accelerated. A path planning task is then demonstrated based on the intrinsic conductance decay property and shows outstanding performance. Finally, the information on the rounds used for the task, which is obtained from the intrinsic decay property of the TiOx‐based memristor, is mapped in parallel onto a 32 × 32 memristor array to calculate the value of each round. The experimental results are similar to the simulations. This work thus provides a hardware‐enabled scheme for implementing memristor‐based RL algorithms. The decay properties of TiOx‐based memristors are used to implement the classical Sarsa (λ) algorithm for reinforcement learning (RL). A path planning task based on this system is successfully verified. Moreover, the decay range is scalable by simple circuit units, making the system more flexible. The work thus provides a fast‐convergence scheme for implementing memristor‐based RL algorithms.
ISSN:	2640-4567
DOI:	10.1002/aisy.202200455
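
The λ decay mentioned in the summary can be illustrated with a minimal software sketch of one tabular Sarsa (λ) update. This is not the article's hardware implementation: the function name `sarsa_lambda_step`, the dictionary layout, the use of accumulating traces, and the values of `alpha`, `gamma` and `lam` are all assumptions chosen for illustration. The line that multiplies each trace by `gamma * lam` is the per-step decay which, according to the summary, the memristor's conductance decay provides in place of floating‐point arithmetic.

```python
def sarsa_lambda_step(q, traces, state, action, reward, next_state, next_action,
                      alpha=0.1, gamma=0.9, lam=0.8):
    """One tabular Sarsa(lambda) update with accumulating eligibility traces.

    q and traces are dicts keyed by (state, action); both are modified in place.
    All names and default values are illustrative assumptions.
    Returns the temporal-difference error of this step.
    """
    current = q.get((state, action), 0.0)
    target = reward + gamma * q.get((next_state, next_action), 0.0)
    td_error = target - current

    # Mark the state-action pair that was just visited.
    traces[(state, action)] = traces.get((state, action), 0.0) + 1.0

    for key in traces:
        # Credit every recently visited pair in proportion to its trace.
        q[key] = q.get(key, 0.0) + alpha * td_error * traces[key]
        # Decay step: after n steps a trace has shrunk by (gamma * lam) ** n.
        traces[key] *= gamma * lam
    return td_error


# Two consecutive steps along a hypothetical path s0 -> s1 -> s2.
q, traces = {}, {}
sarsa_lambda_step(q, traces, "s0", "right", 1.0, "s1", "right")
sarsa_lambda_step(q, traces, "s1", "right", 1.0, "s2", "right")
print(q)
print(traces)
```

After the second step, the earlier pair `("s0", "right")` still receives part of the new reward, weighted by its decayed trace. This is the multistep credit assignment that gives Sarsa (λ) its faster convergence.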