MMDS: A Multimodal Medical Diagnosis System Integrating Image Analysis and Knowledge-based Departmental Consultation
Format: | Article |
Language: | English |
Abstract: | We present MMDS, a system that recognizes medical images and patients'
facial details and provides professional medical diagnoses. The system
consists of two core components. The first component analyzes medical images
and videos. We trained a specialized multimodal medical model capable of
interpreting medical images and accurately analyzing patients' facial emotions
and facial paralysis conditions. The model achieved an accuracy of 72.59% on
the FER2013 facial emotion recognition dataset, with 91.1% accuracy in
recognizing the "happy" emotion. In facial paralysis recognition, the model
reached an accuracy of 92%, which is 30% higher than that of GPT-4o. Based on
this model, we developed a parser that analyzes facial movement videos of
patients with facial paralysis, achieving precise grading of paralysis
severity. In tests on 30 videos of facial paralysis patients, the system
achieved a grading accuracy of 83.3%. The second component generates
professional medical responses. We employed a large language model, integrated
with a medical knowledge base, to generate professional diagnoses based on the
analysis of medical images or videos. The core innovation is a
department-specific knowledge base routing mechanism: the large language model
categorizes data by medical department and, during retrieval, determines which
knowledge base to query. This significantly improves retrieval accuracy in the
retrieval-augmented generation (RAG) process. |
DOI: | 10.48550/arxiv.2410.15403 |
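The department-specific knowledge base routing described in the abstract (a classifier assigns each document to a medical department at indexing time, and the same classifier picks one department's knowledge base at query time) can be illustrated with a minimal sketch. This is not the authors' implementation. Everything here is a hypothetical stand-in: the keyword table `DEPARTMENT_KEYWORDS` replaces the large language model that makes the department decision in MMDS, the department names and documents are invented, word-overlap ranking replaces vector search, and the `"general"` fallback bucket is an assumption of this sketch.

```python
import re
from collections.abc import Callable
from dataclasses import dataclass, field


def tokenize(text: str) -> set[str]:
    """Lower-case word set; punctuation is dropped."""
    return set(re.findall(r"[a-z]+", text.lower()))


@dataclass
class KnowledgeBase:
    """One department's document store (hypothetical structure)."""
    department: str
    documents: list[str] = field(default_factory=list)

    def search(self, query: str, k: int = 2) -> list[str]:
        # Word-overlap ranking stands in for embedding-based vector search.
        # sorted() is stable, so ties keep insertion order.
        q = tokenize(query)
        ranked = sorted(self.documents, key=lambda d: len(q & tokenize(d)), reverse=True)
        return ranked[:k]


# Hypothetical stand-in for the LLM call that assigns a department.
DEPARTMENT_KEYWORDS = {
    "neurology": {"facial", "paralysis", "nerve", "stroke"},
    "dermatology": {"skin", "rash", "lesion"},
    "psychiatry": {"emotion", "anxiety", "depression"},
}


def classify_department(text: str) -> str:
    words = tokenize(text)
    best = max(DEPARTMENT_KEYWORDS, key=lambda d: len(words & DEPARTMENT_KEYWORDS[d]))
    # Fall back to a shared "general" bucket when no department keyword matches.
    return best if words & DEPARTMENT_KEYWORDS[best] else "general"


class DepartmentRouter:
    """Files documents into per-department knowledge bases and routes queries."""

    def __init__(self, classify: Callable[[str], str]):
        self.classify = classify
        self.bases: dict[str, KnowledgeBase] = {}

    def ingest(self, document: str) -> str:
        # Indexing time: the classifier decides which knowledge base stores the document.
        dept = self.classify(document)
        self.bases.setdefault(dept, KnowledgeBase(dept)).documents.append(document)
        return dept

    def retrieve(self, query: str, k: int = 2) -> tuple[str, list[str]]:
        # Query time: the same classifier picks the single knowledge base to search.
        dept = self.classify(query)
        base = self.bases.get(dept)
        return dept, (base.search(query, k) if base else [])


docs = [
    "Facial paralysis severity is graded by comparing movement on both sides of the face.",
    "Acute facial nerve paralysis often improves within weeks.",
    "An itchy skin rash may respond to topical treatment.",
]
router = DepartmentRouter(classify_department)
for doc in docs:
    router.ingest(doc)

dept, hits = router.retrieve("How is facial paralysis severity graded?")
print(dept, hits[0])
```

The point of the design is that a query about facial paralysis is only ever compared against the neurology store, so the dermatology document can never be retrieved for it, however many words it happens to share with the query. In a real system the keyword classifier would be replaced by a prompt to the language model, and a misclassified query would miss its knowledge base entirely, which is why a fallback bucket (or querying the top few departments) is worth considering.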