Cross-modal interaction and multi-source visual fusion for video generation in fetal cardiac screening

Bibliographic details
Published in: Information fusion 2024-11, Vol. 111, p. 102510, Article 102510
Main authors: Zhu, Guosong; Deng, Erqiang; Qin, Zhen; Khan, Fazlullah; Wei, Wei; Srivastava, Gautam; Xiong, Hu; Kumari, Saru
Format: Article
Language: English
Description
Summary: To address the difficulty of preserving data for dynamic visualization in fetal ultrasound screening, a novel framework is proposed for generating fetal four-chamber echocardiogram videos, incorporating multi-source visual fusion and understanding. The framework uses a spectrogram-ultrasound synchronizer to align the ultrasound images in time, ensuring that the generated video matches the actual heartbeat rhythm. It further employs frame interpolation with nonlinear bidirectional motion prediction to synthesize the video. By integrating a Transformer model for autoregressive generation of the visual semantic sequence, the proposed framework is able to generate high-resolution frames. Experimental results show a CLIP-Similarity of 96.23% and a DINOv2-Similarity of 99.77%. Furthermore, a multimodal dataset of fetal echocardiogram examinations has been constructed.
• Frame interpolation is used to synthesize videos via inter-frame pixel movement.
• A synchronizer is developed to regulate videos with actual cardiac rhythm fusion.
• The synchronizer aligns images with blood flow to synchronize the cardiac rhythm.
• A novel framework is proposed for ECG signals with multi-source visual fusion.
• A multimodal dataset of four-chamber ECG is constructed with over 2000 cases.
ISSN: 1566-2535, 1872-6305
DOI: 10.1016/j.inffus.2024.102510