Deep reinforcement learning for treatment planning in high-dose-rate cervical brachytherapy

Bibliographic details
Published in: Physica medica 2022-02, Vol.94, p.1-7
Main authors: Pu, Gang, Jiang, Shan, Yang, Zhiyong, Hu, Yuanjing, Liu, Ziqi
Format: Article
Language: English
Description
Summary:
•Developed an intelligent treatment planner network via deep reinforcement learning.
•The Dueling Double-Deep Q Network is used as the backbone of the proposed method.
•Reward strategy based on a hybrid equivalent uniform dose objective function.
•The constructed network generates better plans than IPSA in a human-like fashion.
•Deep reinforcement learning has the potential to solve optimization problems.

High-dose-rate (HDR) brachytherapy (BT) is an effective cancer treatment method in which the radiation source is placed within the body. Treatment planning is a critical component of a successful outcome. Almost all currently proposed treatment planning methods are built on stochastic heuristic algorithms, which limits the generation of higher quality plans. This study proposed a novel treatment planning method that adjusts dwell times in a human-like fashion to improve the quality of the plan. We built an intelligent treatment planner network (ITPN) based on deep reinforcement learning (DRL). The network architecture of ITPN is a Dueling Double-Deep Q Network. The state is the dwell time at each dwell position, and the action specifies which dwell time to adjust and how to adjust it. A hybrid equivalent uniform dose objective function was established, and rewards were assigned according to changes in its value. Experience replay was performed with the epsilon-greedy algorithm and a SumTree data structure. In the evaluation of ITPN on 20 patient cases, D90, D100 and V100 showed no significant difference compared with inverse planning simulated annealing (IPSA) optimization. However, D2cc of the bladder, rectum and sigmoid, as well as V150 and V200, were significantly reduced, and the homogeneity index and conformity index were significantly increased. Based on the learned dwell time adjustment policy, the proposed ITPN was able to generate higher quality plans than IPSA. This is the first artificial intelligence system that can directly determine the dwell times of HDR BT, which demonstrates the potential feasibility of solving optimization problems via DRL.
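
The summary names the building blocks (dwell-time state, adjust-one-dwell-time action, reward from the change in the objective, epsilon-greedy selection, SumTree-backed replay) without giving their details. The following is a minimal Python sketch of how such pieces are commonly put together, not the authors' implementation. The step sizes, the flat action indexing, the +1/-1/0 reward values and all function names are assumptions made for illustration; the hybrid equivalent uniform dose objective itself is not reproduced here.

```python
import random


class SumTree:
    """Binary sum tree: leaves hold priorities, internal nodes hold the sum of their children."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity - 1)  # internal nodes first, then leaves
        self.data = [None] * capacity           # stored transitions, one per leaf
        self.write = 0                          # next leaf slot to overwrite (ring buffer)

    def total(self):
        return self.tree[0]

    def update(self, leaf, priority):
        """Set the priority of a leaf and propagate the change up to the root."""
        idx = leaf + self.capacity - 1
        change = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx > 0:
            idx = (idx - 1) // 2
            self.tree[idx] += change

    def add(self, priority, item):
        self.data[self.write] = item
        self.update(self.write, priority)
        self.write = (self.write + 1) % self.capacity

    def get(self, s):
        """Return (leaf, priority, item) for the leaf whose cumulative priority range contains s."""
        idx = 0
        while idx < self.capacity - 1:  # descend while idx is an internal node
            left = 2 * idx + 1
            if s < self.tree[left]:
                idx = left
            else:
                s -= self.tree[left]
                idx = left + 1
        leaf = idx - (self.capacity - 1)
        return leaf, self.tree[idx], self.data[leaf]

    def sample(self):
        """Draw one stored item with probability proportional to its priority."""
        return self.get(random.random() * self.total())


def apply_action(dwell_times, action, steps):
    """Assumed encoding: flat action index -> (dwell position, step); times are clamped at zero."""
    pos, k = divmod(action, len(steps))
    new_times = list(dwell_times)
    new_times[pos] = max(0.0, new_times[pos] + steps[k])
    return new_times


def epsilon_greedy(q_values, epsilon):
    """Pick a random action with probability epsilon, otherwise the highest-valued action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def reward_from_change(obj_before, obj_after):
    """Hypothetical reward: +1 if the objective decreased, -1 if it increased, 0 if unchanged."""
    if obj_after < obj_before:
        return 1.0
    if obj_after > obj_before:
        return -1.0
    return 0.0
```

In this sketch a state is simply the list of dwell times, and the Q-network (omitted) would map that list to one value per (position, step) pair. The sum tree makes priority-proportional sampling and priority updates logarithmic in the buffer size, which is the usual reason for pairing it with experience replay.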
ISSN:1120-1797
1724-191X
DOI:10.1016/j.ejmp.2021.12.009