Exploring Reinforcement Learning for End-Diastolic and End-Systolic Frame Detection
Main author: | |
---|---|
Format: | Dissertation |
Language: | nor |
Keywords: | |
Online access: | Order full text |
Summary: | The thesis explores ways of formulating the detection of the key cardiac phases in ultrasound videos, i.e., the end-diastolic (ED) and end-systolic (ES) phases, as a reinforcement learning (RL) problem, and whether there are any benefits in doing so. Of particular interest is the design of the RL reward function. Three reward functions are explored: one based on a generalization of the average absolute frame difference (aaFD) performance metric, which is given to the agent only at the end of an episode, and two based on per-frame phase classification, which are given at every step. Additionally, two formulations of the RL environment are explored: the binary classification environment (BCE), designed as a direct reformulation of a supervised binary classification task, and the m-mode binary classification environment (MMBCE), designed to let the agent explore the environment using synthetic m-mode imaging. Because of time constraints, MMBCE was explored only in a preliminary way; the results nonetheless indicate that the problem is too complex for the current setup and requires more work before any conclusions can be drawn about its feasibility. Experiments show that an RL agent is able to learn to perform phase detection even when the reward signal is very sparse. However, the less sparse reward functions perform better on nearly all metrics. The best agent predicts the correct number of ED and ES events in $\sim 80\%$ of the videos in the test set, on which it yields an aaFD score of 1.69. It is concluded that there are multiple ways of formulating phase detection as a reinforcement learning problem, but not all formulations perform equally well. Reward sparsity and environment complexity both reduce overall performance. There are also indications that lower values of the epsilon-greedy exploration hyperparameter epsilon have a regularizing effect on the model, which prompts further research. |
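To make the contrast between the sparse and the per-step reward signals in the summary concrete, the following is a minimal Python sketch and not the thesis implementation. It assumes the plain aaFD definition (the mean absolute difference between predicted and reference event frame indices) rather than the generalization used in the thesis, which the record does not describe. The function names, the requirement that predicted and reference events are already matched one-to-one, and the reward values of +1 and -1 are all hypothetical.

```python
from typing import List, Sequence


def aafd(true_frames: Sequence[int], pred_frames: Sequence[int]) -> float:
    """Plain average absolute frame difference between matched events.

    Assumption: events are already paired one-to-one; the thesis uses a
    generalization of this metric that is not described in the record.
    """
    if len(true_frames) != len(pred_frames):
        raise ValueError("this sketch requires the same number of events")
    if len(true_frames) == 0:
        raise ValueError("at least one event is required")
    total = sum(abs(t - p) for t, p in zip(true_frames, pred_frames))
    return total / len(true_frames)


def sparse_episode_rewards(
    true_frames: Sequence[int], pred_frames: Sequence[int], episode_length: int
) -> List[float]:
    """Sparse signal: zero at every step, negative aaFD at the final step."""
    if episode_length < 1:
        raise ValueError("episode_length must be at least 1")
    rewards = [0.0] * episode_length
    rewards[-1] = -aafd(true_frames, pred_frames)
    return rewards


def per_frame_reward(predicted_phase: int, true_phase: int) -> float:
    """Dense signal: hypothetical +1 for a correct phase label, -1 otherwise."""
    return 1.0 if predicted_phase == true_phase else -1.0
```

For example, with reference events at frames 10 and 25 and predictions at frames 12 and 24, `aafd([10, 25], [12, 24])` evaluates to 1.5. In the sparse variant the agent sees that value (negated) only once per episode, whereas the per-frame variant gives feedback at every step.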