Transforming Medical Imaging: A VQA Model for Microscopic Blood Cell Classification

Detailed description

Bibliographic details
Published in:	IEEE Access, 2024, Vol. 12, pp. 168547-168556
Main authors:	Fatima, Izzah; Hussain Shah, Jamal; Saleem, Rabia; Riaz, Samia; Rafiq, Muhammad; Khokhar, Fahad Ahmed
Format:	Article
Language:	English
Subjects:	Accuracy; Biomedical imaging; Blood; Blood cell images; Cells (biology); Computer architecture; Computer vision; Feature extraction; medical; Medical diagnostic imaging; Microscopy; natural language processing; Question answering (information retrieval); Transformers; visual question answering; Visualization
Online access:	Full text

Description
Summary:	Visual Question Answering (VQA) is a promising technology that could transform medical imaging by enabling computers to answer questions about medical images. Several obstacles stand in the way: the development of effective medical VQA models is hampered by the scarcity of easily accessible medical datasets, complicated medical scenarios, and the complexity of medical images. To contribute to the advancement of VQA models in the medical field, our research undertakes two key initiatives. First, we introduce a novel dataset for medical VQA, derived from an existing dataset of blood cell images. Second, we propose a VQA model specifically designed to classify images of microscopic blood cells. We use pre-trained transformers (Electra, BERT, and DistilBERT) to extract textual features and a Vision Transformer (ViT) to extract visual features from images; the textual and visual features are then combined and passed to classifiers such as Linear SVM and Quadratic SVM. Experimental results show that the Electra & ViT model surpasses the other models, achieving high scores across multiple evaluation metrics, including a WUPS score of 90.09%, accuracy of 89.63%, F1-score of 64.03%, precision of 63.42%, and recall of 65.23%.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2024.3496655
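
Illustrative sketch (not part of the catalog record or the article): the summary describes combining textual and visual features and then applying Linear and Quadratic SVM classifiers. The Python below is one possible reading of that fusion step, assuming simple concatenation and scikit-learn's `SVC`, with a degree-2 polynomial kernel standing in for "Quadratic SVM". The feature arrays are synthetic stand-ins for Electra and ViT embeddings, and `fuse_features`, `make_svm`, and all dimensions are hypothetical names and values chosen for illustration; the authors' actual implementation may differ.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def fuse_features(text_feats, image_feats):
    """Concatenate per-sample text and image embeddings along the feature axis."""
    text_feats = np.asarray(text_feats, dtype=np.float64)
    image_feats = np.asarray(image_feats, dtype=np.float64)
    if text_feats.shape[0] != image_feats.shape[0]:
        raise ValueError("text and image features need the same number of samples")
    return np.concatenate([text_feats, image_feats], axis=1)


def make_svm(kind):
    """Build a scaled SVM; 'quadratic' is assumed to mean a degree-2 polynomial kernel."""
    if kind == "linear":
        clf = SVC(kernel="linear")
    elif kind == "quadratic":
        clf = SVC(kernel="poly", degree=2, coef0=1.0)
    else:
        raise ValueError(f"unknown SVM kind: {kind!r}")
    return make_pipeline(StandardScaler(), clf)


# Synthetic stand-ins for encoder outputs (real embeddings would come from
# a text transformer and a vision transformer; sizes here are arbitrary).
rng = np.random.default_rng(0)
n_samples, text_dim, image_dim, n_answers = 200, 16, 24, 4
answers = rng.integers(0, n_answers, size=n_samples)

# Shift each class by its label so the toy problem is learnable.
text_feats = rng.normal(size=(n_samples, text_dim)) + answers[:, None]
image_feats = rng.normal(size=(n_samples, image_dim)) + 2 * answers[:, None]

fused = fuse_features(text_feats, image_feats)

# Simple split: first 150 samples for training, the rest held out.
train, held_out = slice(0, 150), slice(150, None)
scores = {}
for kind in ("linear", "quadratic"):
    model = make_svm(kind).fit(fused[train], answers[train])
    scores[kind] = model.score(fused[held_out], answers[held_out])
    print(f"{kind} SVM held-out accuracy: {scores[kind]:.2f}")
```

The scores printed here reflect only the synthetic data and say nothing about the results reported in the article.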