Deep reinforcement learning for personalized treatment recommendation

In precision medicine, the ultimate goal is to recommend the most effective treatment to an individual patient based on patient‐specific molecular and clinical profiles, possibly high‐dimensional. To advance cancer treatment, large‐scale screenings of cancer cell lines against chemical compounds hav...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Statistics in medicine 2022-09, Vol.41 (20), p.4034-4056
Hauptverfasser:	Liu, Mingyang, Shen, Xiaotong, Pan, Wei
Format:	Artikel
Sprache:	eng
Schlagworte:	actor‐critic methods Cancer therapies Deep learning Humans Machine Learning Markov Chains Neoplasms - drug therapy Precision Medicine Proximal Policy Optimization recommender systems
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	In precision medicine, the ultimate goal is to recommend the most effective treatment to an individual patient based on patient‐specific molecular and clinical profiles, possibly high‐dimensional. To advance cancer treatment, large‐scale screenings of cancer cell lines against chemical compounds have been performed to help better understand the relationship between genomic features and drug response; existing machine learning approaches use exclusively supervised learning, including penalized regression and recommender systems. However, it would be more efficient to apply reinforcement learning to sequentially learn as data accrue, including selecting the most promising therapy for a patient given individual molecular and clinical features and then collecting and learning from the corresponding data. In this article, we propose a novel personalized ranking system called Proximal Policy Optimization Ranking (PPORank), which ranks the drugs based on their predicted effects per cell line (or patient) in the framework of deep reinforcement learning (DRL). Modeled as a Markov decision process, the proposed method learns to recommend the most suitable drugs sequentially and continuously over time. As a proof‐of‐concept, we conduct experiments on two large‐scale cancer cell line data sets in addition to simulated data. The results demonstrate that the proposed DRL‐based PPORank outperforms the state‐of‐the‐art competitors based on supervised learning. Taken together, we conclude that novel methods in the framework of DRL have great potential for precision medicine and should be further studied.
ISSN:	0277-6715 1097-0258 1097-0258
DOI:	10.1002/sim.9491