Employing Deep Reinforcement Learning to Maximize Lower Limb Blood Flow Using Intermittent Pneumatic Compression
Published in: IEEE Journal of Biomedical and Health Informatics, 2024-10, Vol. 28 (10), pp. 6193-6200
Main authors: , , ,
Format: Article
Language: English
Keywords:
Abstract: Intermittent pneumatic compression (IPC) systems apply external pressure to the lower limbs and enhance peripheral blood flow. We previously introduced a cardiac-gated compression system that enhanced arterial blood velocity (BV) in the lower limb compared to fixed compression timing (CT) for seated and standing subjects. However, these pilot studies found that the CT that maximized BV was not constant across individuals and could change over time. Current CT modelling methods for IPC are limited to predictions for a single day and one heartbeat ahead. However, IPC therapy may span weeks or longer, the BV response to compression can vary with physiological state, and the best CT for eliciting the desired physiological outcome may change, even for the same individual. We propose that a deep reinforcement learning (DRL) algorithm can learn and adaptively modify CT to achieve a selected outcome using IPC. Herein, we target maximizing lower limb arterial BV as the desired outcome and build participant-specific simulated lower limb environments for 6 participants. We show that DRL can adaptively learn the CT for IPC that maximizes arterial BV. Compared to previous work, the DRL agent achieves 98% ± 2% of the resultant blood flow and is faster at maximizing BV; the DRL agent can learn an "optimal" policy in 15 ± 2 minutes on average and can adapt on the fly. Given a desired objective, we posit that the proposed DRL agent can be implemented in IPC systems to rapidly learn the (potentially time-varying) "optimal" CT with a human-in-the-loop.
ISSN: 2168-2194, 2168-2208
DOI: 10.1109/JBHI.2024.3423698