Automated interpretation of congenital heart disease from multi-view echocardiograms
Published in: | Medical Image Analysis, 2021-04, Vol. 69, Article 101942 |
---|---|
Main authors: | , , , , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | • A video-based framework for analyzing multi-view two-dimensional echocardiograms.
• Automatically organizes the five views and diagnoses congenital heart disease.
• Powerful baselines for exploring key-frame-based and video-based multi-view diagnosis.
• A depthwise separable convolution-based, efficient multi-channel CNN architecture.
• Four video aggregation schemes are developed to process video frames.
Congenital heart disease (CHD) is the most common birth defect and the leading cause of neonatal death in China. Clinical diagnosis can be based on selected 2D key-frames from five views. Because multi-view data are rarely available, most methods rely on single-view analysis, which is insufficient. This study proposes a practical end-to-end framework for automatically analyzing multi-view echocardiograms. We collect five-view echocardiogram video records of 1308 subjects (including normal controls, ventricular septal defect (VSD) patients, and atrial septal defect (ASD) patients) with both disease labels and standard-view key-frame labels. Depthwise separable convolution-based multi-channel networks are adopted to substantially reduce the number of network parameters. We also address the class imbalance problem by augmenting the positive training samples. Our 2D key-frame model distinguishes CHD from negative samples with an accuracy of 95.4%, and classifies samples as negative, VSD, or ASD with an accuracy of 92.3%. To further reduce the work of key-frame selection in real-world use, we propose an adaptive soft attention scheme that operates directly on the raw video data. Four kinds of neural aggregation methods are systematically investigated to fuse the information from an arbitrary number of frames in a video. Moreover, with a view detection module, the system can work without view records. Our video-based model achieves an accuracy of 93.9% (binary classification) and 92.1% (3-class classification) on a collected 2D video test set, without requiring key-frame selection or view annotation at test time. A detailed ablation study and an interpretability analysis are provided.
The presented model achieves high diagnostic rates for VSD and ASD and could potentially be applied in clinical practice in the future. The short-term automated machine learning process can partially replace and promote the long-term professional training of primary doctors, improving the primary diagnosis rate of CHD in China, and laying the foundatio |
---|---|
ISSN: | 1361-8415 1361-8423 |
DOI: | 10.1016/j.media.2020.101942 |
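
The summary mentions fusing an arbitrary number of video frames with a soft attention scheme. As a rough illustration only, the sketch below shows generic softmax-weighted pooling of per-frame feature vectors. The function name `soft_attention_pool`, the scoring vector `w`, and the toy features are all hypothetical; this record does not describe the paper's four aggregation schemes, so this should not be read as the authors' method.

```python
import math


def soft_attention_pool(frames, w):
    """Fuse a variable number of frame feature vectors into one vector.

    frames: list of equal-length feature vectors, one per video frame
            (hypothetical stand-ins for CNN features).
    w:      scoring vector of the same length (hypothetical; in a real
            model this would be learned).
    Returns (pooled_vector, attention_weights).
    """
    if not frames:
        raise ValueError("need at least one frame")

    # Score each frame with a dot product against the scoring vector.
    scores = [sum(f_i * w_i for f_i, w_i in zip(f, w)) for f in frames]

    # Numerically stable softmax turns scores into weights that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # The weighted sum gives a fixed-size vector whatever the frame count.
    dim = len(frames[0])
    pooled = [sum(a * f[d] for a, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


# Toy usage: three frames with 2-dimensional features.
pooled, weights = soft_attention_pool(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], w=[0.0, 0.0]
)
```

Because the output size depends only on the feature dimension, the same pooling step accepts videos of any length; with an all-zero scoring vector it reduces to plain averaging.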