Reinforcement learning with Gaussian processes for condition-based maintenance


Published in: Computers & Industrial Engineering, 2021-08, Vol. 158, p. 107321, Article 107321
Authors: Peng, Shenglin; Feng, Qianmei (May)
Format: Article
Language: English
Description
Abstract:
• Reinforcement learning for condition-based maintenance with a continuous-state MDP.
• Gaussian process regression for function approximation in reinforcement learning.
• Development of a new Gaussian process for reinforcement learning (GPRL) algorithm.
• Case study of the GPRL algorithm on battery maintenance decision-making.

Condition-based maintenance strategies are effective in enhancing reliability and safety for complex engineering systems that exhibit degradation phenomena with uncertainty. Such sequential decision-making problems are often modeled as Markov decision processes (MDPs) when the underlying process has the Markov property. Recently, reinforcement learning (RL) has become increasingly effective at addressing MDP problems with large state spaces. In this paper, we model the condition-based maintenance problem as a discrete-time, continuous-state MDP without discretizing the deterioration condition of the system. Gaussian process regression is used as a function approximator to model the state transition and the state value functions in reinforcement learning. An RL algorithm is then developed to minimize the long-run average cost (instead of the commonly used discounted reward) with iterations on the state-action value function and the state value function, respectively. We verify the capability of the proposed algorithm through simulation experiments and demonstrate its advantages in a case study on a battery maintenance decision-making problem. The proposed algorithm outperforms the discrete-MDP approach by achieving lower long-run average costs.
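The core idea the abstract describes — Gaussian process regression as a function approximator over a continuous degradation state — can be illustrated with a minimal, self-contained sketch. This is not the paper's GPRL algorithm; the states, values, kernel length scale, and noise level below are hypothetical, and the sketch only shows the GP posterior mean interpolating a value function from sampled (state, value) pairs:

```python
import math

def rbf(x1, x2, length=0.5):
    # Squared-exponential (RBF) kernel over a scalar degradation state
    return math.exp(-0.5 * ((x1 - x2) / length) ** 2)

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gp_fit(xs, ys, noise=1e-6):
    # alpha = (K + noise * I)^{-1} y, the standard GP regression weights
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(K, ys)

def gp_predict(x_star, xs, alpha):
    # Posterior mean at a query state: k(x*, X) @ alpha
    return sum(rbf(x_star, xs[i]) * alpha[i] for i in range(len(xs)))

# Hypothetical degradation states in [0, 1] and sampled state values
# (e.g. negated long-run costs observed during simulation)
states = [0.0, 0.25, 0.5, 0.75, 1.0]
values = [0.0, -1.0, -2.5, -4.5, -8.0]
alpha = gp_fit(states, values)
v_mid = gp_predict(0.5, states, alpha)  # recovers roughly -2.5 at a training state
```

A full GPRL-style algorithm would additionally fit a GP to the state-transition dynamics and iterate on state-action and state value functions under the average-cost criterion, as the abstract outlines; here only the regression building block is shown.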
ISSN: 0360-8352; 1879-0550
DOI: 10.1016/j.cie.2021.107321