Dual-Input Transformer: An End-to-End Model for Preoperative Assessment of Pathological Complete Response to Neoadjuvant Chemotherapy in Breast Cancer Ultrasonography

Neoadjuvant chemotherapy (NAC) is the primary method to reduce the burden of tumor and metastasis; in the treatment of breast cancer, it may provide additional opportunities for breast-conserving surgery. Preoperative assessment of pathological complete response (PCR) to NAC is important for develop...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE journal of biomedical and health informatics 2023-01, Vol.27 (1), p.251-262
Hauptverfasser: Tong, Tong, Li, Dongyang, Gu, Jionghui, Chen, Guo, Bai, Guotao, Yang, Xin, Wang, Kun, Jiang, Tianan, Tian, Jie
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Neoadjuvant chemotherapy (NAC) is the primary method to reduce the burden of tumor and metastasis; in the treatment of breast cancer, it may provide additional opportunities for breast-conserving surgery. Preoperative assessment of pathological complete response (PCR) to NAC is important for developing individualized treatment approaches and predicting patient prognosis. Compared to magnetic resonance imaging (MRI) and mammography, ultrasonography (US) has the advantages of simplicity, flexibility, and real-time imaging. Moreover, it does not require radiation and can provide multi-time acquisition of the tumor during NAC treatment. Recently, deep learning radiomics models based on multi-time-point US images for the prediction of NAC effectiveness have been proposed. To further improve the prediction performance, we carefully designed four supporting modules for our proposed dual-input transformer (DiT): isolated tokens-to-token patch embedding module, shared position embedding, time embedding, and weighted average pooling feature representation modules. The design of each module considers the characteristics of the US images at multiple time points. We validated our model on our retrospective US dataset composed of 484 cases from two centers whose consistency is not sufficiently high. Patients were allocated to training (n = 297), validation (n = 99), and external test (n = 88) sets. The results show that our model can achieve better performance than the Siamese CNN and the standard tokens-to-token vision transformer without using multi-time-point images. The ablation study also proved the effectiveness of each module designed for DiT.
ISSN:2168-2194
2168-2208
DOI:10.1109/JBHI.2022.3216031