Combinatorial Q-Learning for Condition-Based Infrastructure Maintenance

Published in: IEEE Access 2021, Vol. 9, pp. 46788-46799
Main author: Tanimoto, Akira
Format: Article
Language: English
Summary: Infrastructure maintenance planning is a large-scale optimization problem of planning when and on which components to carry out maintenance so as to keep the whole infrastructure in good condition with minimal maintenance cost. Recent advances in condition monitoring techniques have enabled timely maintenance in response to the condition of each part regardless of age. In addition to the condition, the spatial structure is also important for cost-efficiency in infrastructure maintenance, since traveling costs and/or setup costs can be saved by simultaneous maintenance of neighboring components, which is called economic dependency. Solved naively, this optimization problem has a high computational complexity of O(2^{nH}), where n is the number of components and H is the planning horizon, and the predictive modeling of degradation is also a major issue. To solve this problem efficiently at scale, our proposed method utilizes two kinds of dynamic programming for temporal and spatial scalability and consequently enjoys O(n) complexity at each time step. For temporal scalability, we utilize a direct modeling approach for the action value of maintenance instead of modeling degradation, namely, Q-learning. For spatial scalability, we exploit locality in economic dependency by means of a reasonable approximation of the Q-function. A typical baseline approach is to divide the whole infrastructure into fixed groups of neighboring components beforehand and determine if maintenance should be performed for all the components in each group at each time step. In contrast, our scalable method enables fully combinatorial optimization for each component at each time step. We demonstrate the advantage of our method in a simulated environment, and the resulting maintenance history intuitively illustrates the benefit of our dynamic grouping approach. We also show that our method has a kind of interpretability in the optimization at each time step.
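
The summary does not give the form of the Q-function approximation, so the following is only an illustrative sketch of how locality can reduce a 2^n search over joint maintenance actions to O(n) work per time step. It assumes, purely for illustration, that components lie on a line and that the action value splits into a per-component term plus a bonus for maintaining two adjacent components together. The names best_joint_action, brute_force, unary, and pair_bonus are hypothetical and are not taken from the paper.

```python
from itertools import product


def best_joint_action(unary, pair_bonus):
    """Maximize sum_i unary[i]*a[i] + sum_i pair_bonus[i]*a[i]*a[i+1]
    over a in {0,1}^n with a left-to-right dynamic program (O(n)).

    Assumed, hypothetical decomposition: unary[i] is the value of
    maintaining component i alone, pair_bonus[i] the saving from
    maintaining components i and i+1 in the same time step.
    """
    n = len(unary)
    # best[b]: best score of the prefix ending at component i with a[i] = b
    best = [0.0, unary[0]]
    back = []  # back[i-1][b]: optimal a[i-1] given a[i] = b
    for i in range(1, n):
        new_best, choice = [], []
        for b in (0, 1):
            cands = [best[p] + unary[i] * b + pair_bonus[i - 1] * p * b
                     for p in (0, 1)]
            p_star = 0 if cands[0] >= cands[1] else 1
            new_best.append(cands[p_star])
            choice.append(p_star)
        best = new_best
        back.append(choice)
    # Backtrack from the better final state to recover the joint action.
    b = 0 if best[0] >= best[1] else 1
    score = best[b]
    actions = [b]
    for choice in reversed(back):
        b = choice[b]
        actions.append(b)
    actions.reverse()
    return actions, score


def brute_force(unary, pair_bonus):
    """Reference: enumerate all 2^n joint actions (feasible only for small n)."""
    n = len(unary)

    def value(a):
        return (sum(unary[i] * a[i] for i in range(n))
                + sum(pair_bonus[i] * a[i] * a[i + 1] for i in range(n - 1)))

    best = max(product((0, 1), repeat=n), key=value)
    return list(best), value(best)


if __name__ == "__main__":
    unary = [-1.0, 0.5, -0.2, -3.0, 0.4]
    pair_bonus = [0.8, 0.6, 0.1, 0.2]
    print(best_joint_action(unary, pair_bonus))
    print(brute_force(unary, pair_bonus))
```

With these made-up numbers, the dynamic program selects components 2, 3, and 5 (1-indexed): component 3 is not worth maintaining on its own, but the shared-cost bonus with its neighbor makes the pair worthwhile. This is the kind of dynamically formed group that a fixed pre-partition into groups cannot express.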
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3059244