EchoPhaseFormer: A Transformer Based Echo Phase Detection and Analysis in 2D Echocardiography
Published in: SN Computer Science, 2024-09, Vol. 5 (7), p. 878, Article 878
Main authors: , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: Accurate cardiac function analysis (i.e., ventricle/stroke volume and ejection fraction measurement) in 2D echocardiography is challenging because of the low resolution of echo sequences and the motion of cardiac structures. In an echo sequence, cardiac function analysis is a sequential process: identification of the end-diastole (ED) and end-systole (ES) frames (echo phase detection), followed by left ventricular ejection fraction (LVEF) prediction. To describe cardiac function precisely, proper attention must be given to spatial and temporal information and their interaction. Several deep learning techniques (i.e., convolutional neural networks, recurrent neural networks, and transformers) have recently been introduced but have largely ignored the interaction between spatial and temporal information. To address this issue, this study introduces EchoPhaseFormer, a transformer-based solution for echo phase detection (EPD) and LVEF prediction. A 3D convolutional stem extracts 3D patches from the echo sequence to retain temporal information. EchoPhaseFormer contains an echo phase former block, consisting of a conditional positional encoder and a phase self-attention module, which ensures spatial–temporal information extraction and interaction. EchoPhaseFormer outperformed state-of-the-art architectures on both tasks on the EchoNet dataset, with an average absolute frame distance of 1.01 for ED frames and 1.04 for ES frames in EPD. For LVEF prediction, it achieves a mean absolute error of 4.77, a root mean square error of 6.14, and an R2 score of 0.81.
ISSN: 2662-995X, 2661-8907
DOI: 10.1007/s42979-024-03249-7
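
The abstract describes two main ingredients: a 3D convolutional stem that turns an echo clip into spatio-temporal patch tokens, and an echo phase former block that pairs a conditional positional encoder with self-attention. The PyTorch sketch below illustrates that general pattern only; the module names (Conv3DStem, ConditionalPosEnc, PhaseFormerBlock), patch size, embedding width, and attention layout are assumptions made for illustration and are not taken from the paper.

```python
# Minimal sketch of a 3D-conv patch stem feeding a transformer block with
# conditional positional encoding. All names and hyperparameters are
# hypothetical, not the authors' implementation.
import torch
import torch.nn as nn

class Conv3DStem(nn.Module):
    """Embed an echo clip (B, 1, T, H, W) into 3D patch tokens."""
    def __init__(self, embed_dim=96, patch=(2, 16, 16)):
        super().__init__()
        self.proj = nn.Conv3d(1, embed_dim, kernel_size=patch, stride=patch)

    def forward(self, x):                               # x: (B, 1, T, H, W)
        x = self.proj(x)                                # (B, C, T', H', W')
        _, C, T, H, W = x.shape
        return x.flatten(2).transpose(1, 2), (T, H, W)  # tokens (B, N, C), grid

class ConditionalPosEnc(nn.Module):
    """Positional encoding produced by a depth-wise 3D convolution over the token grid."""
    def __init__(self, dim):
        super().__init__()
        self.dwconv = nn.Conv3d(dim, dim, 3, padding=1, groups=dim)

    def forward(self, tokens, grid):                    # tokens: (B, N, C)
        B, N, C = tokens.shape
        T, H, W = grid
        x = tokens.transpose(1, 2).reshape(B, C, T, H, W)
        x = self.dwconv(x) + x                          # position-aware residual
        return x.flatten(2).transpose(1, 2)

class PhaseFormerBlock(nn.Module):
    """Conditional positional encoding followed by self-attention and an MLP."""
    def __init__(self, dim=96, heads=4):
        super().__init__()
        self.cpe = ConditionalPosEnc(dim)
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, tokens, grid):
        tokens = self.cpe(tokens, grid)                 # inject positions conditionally
        y = self.norm1(tokens)
        tokens = tokens + self.attn(y, y, y)[0]         # spatio-temporal self-attention
        return tokens + self.mlp(self.norm2(tokens))

# Toy forward pass: a 32-frame, 112x112 grayscale echo clip.
clip = torch.randn(1, 1, 32, 112, 112)
tokens, grid = Conv3DStem()(clip)
out = PhaseFormerBlock()(tokens, grid)
print(out.shape)                                        # torch.Size([1, 784, 96])
```

Because every token in this sketch carries both a temporal and a spatial index from the 3D patching, the single attention layer already mixes information across frames and across positions within a frame, which is the kind of spatial-temporal interaction the abstract emphasizes. Downstream heads for ED/ES frame classification and LVEF regression would operate on these tokens; they are omitted here.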