MuSiC-ViT: A multi-task Siamese convolutional vision transformer for differentiating change from no-change in follow-up chest radiographs
Published in: Medical Image Analysis, 2023-10, Vol. 89, p. 102894, Article 102894
Main authors: , , , , , , , , , ,
Format: Article
Language: English
Online access: Full text
Abstract:
• MuSiC-ViT mimics a radiologist's unconscious clinical screening process.
• Integrating the AMM with MuSiC-ViT forces the transformer to pay attention to similar regions in the baseline and follow-up CXR pairs.
• The training of MuSiC-ViT used a large-scale, high-resolution CXR dataset (88 K change and 115 K no-change pairs of 512 × 512 pixels), and the model was validated on one internal and two additional validation datasets.
• MuSiC-ViT can classify change/no-change for lung diseases.
• A subset of the CheXpert dataset labeled by a radiologist for change/no-change is provided.
A major responsibility of radiologists in routine clinical practice is reading follow-up chest radiographs (CXRs) to identify changes in a patient's condition. Diagnosing meaningful changes in follow-up CXRs is challenging because radiologists must differentiate disease changes from natural or benign variations. Here, we propose a multi-task Siamese convolutional vision transformer (MuSiC-ViT) with an anatomy-matching module (AMM) that mimics the radiologist's cognitive process for differentiating change from no-change between baseline and follow-up CXRs. MuSiC-ViT adopts the "CNNs meet vision transformers" design, which combines CNN and transformer architectures, and has three major components: a Siamese network, an AMM, and multi-task learning. Because the input is a pair of CXRs, a Siamese network was adopted for the encoder. The AMM is an attention module that focuses on related regions in the CXR pairs. To mimic a radiologist's cognitive process, MuSiC-ViT was trained with multi-task learning on three objectives: normal/abnormal classification, change/no-change classification, and anatomy matching. From 406 K CXRs studied, 88 K change and 115 K no-change pairs were acquired for the training dataset; the internal validation dataset consisted of 1,620 pairs. To demonstrate the robustness of MuSiC-ViT, we verified the results on two additional validation datasets. MuSiC-ViT achieved accuracies and areas under the receiver operating characteristic curve of 0.728 and 0.797 on the internal validation dataset, 0.614 and 0.784 on an external validation dataset, and 0.745 and 0.858 on a temporally separated validation dataset, respectively. All code is available at https://github.com/chokyungjin/MuSiC-ViT.
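The overall recipe in the abstract, one shared encoder applied to both images, task-specific heads, and a loss that also rewards anatomical correspondence, can be sketched minimally. This is an illustrative reconstruction, not the authors' code: the tiny linear "encoder", head shapes, loss weight `lam`, and the cosine-similarity stand-in for the AMM are all assumptions; the real model uses a CNN-transformer backbone and a learned attention module.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared "encoder" weights applied to BOTH images (Siamese weight sharing).
# A stand-in for the paper's CNN-transformer backbone.
W_enc = rng.normal(size=(64, 32)) / 8.0

def encode(x):
    """Map a flattened image (64,) to a 32-d embedding with the shared weights."""
    return np.tanh(x @ W_enc)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Task heads (hypothetical shapes): change/no-change reads the embedding
# difference; normal/abnormal reads each embedding separately.
w_change = rng.normal(size=32) / 6.0
w_abn = rng.normal(size=32) / 6.0

def forward(x_base, x_follow):
    z_b, z_f = encode(x_base), encode(x_follow)
    return {
        "p_change": sigmoid((z_b - z_f) @ w_change),
        "p_abn_base": sigmoid(z_b @ w_abn),
        "p_abn_follow": sigmoid(z_f @ w_abn),
        # Anatomy-matching proxy: embeddings of a matched pair should agree.
        "match_score": cosine(z_b, z_f),
    }

def bce(p, y):
    """Binary cross-entropy for one probability/label pair."""
    return -(y * np.log(p + 1e-8) + (1 - y) * np.log(1 - p + 1e-8))

def multitask_loss(out, y_change, y_abn_base, y_abn_follow, lam=0.1):
    """Sum the three classification losses plus an anatomy-matching penalty
    that pushes paired embeddings toward high similarity (lam is a guess)."""
    return (bce(out["p_change"], y_change)
            + bce(out["p_abn_base"], y_abn_base)
            + bce(out["p_abn_follow"], y_abn_follow)
            + lam * (1.0 - out["match_score"]))

# Toy pair: the follow-up is the baseline plus small benign variation.
x1 = rng.normal(size=64)
x2 = x1 + 0.05 * rng.normal(size=64)
out = forward(x1, x2)
loss = multitask_loss(out, y_change=0, y_abn_base=0, y_abn_follow=0)
```

Because the two inputs share one encoder, a nearly identical pair yields a high `match_score`, which is the intuition behind coupling the change classifier with an anatomy-matching objective.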
ISSN: 1361-8415, 1361-8423
DOI: 10.1016/j.media.2023.102894