PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology
Pathological diagnosis remains the definitive standard for identifying tumors. The rise of multimodal large models has simplified the process of integrating image analysis with textual descriptions. Despite this advancement, the substantial costs associated with training and deploying these complex...
Gespeichert in:
Hauptverfasser: | , , , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Pathological diagnosis remains the definitive standard for identifying
tumors. The rise of multimodal large models has simplified the process of
integrating image analysis with textual descriptions. Despite this advancement,
the substantial costs associated with training and deploying these complex
multimodal models, together with a scarcity of high-quality training datasets,
create a significant divide between cutting-edge technology and its application
in the clinical setting. We had meticulously compiled a dataset of
approximately 45,000 cases, covering over 6 different tasks, including the
classification of organ tissues, generating pathology report descriptions, and
addressing pathology-related questions and answers. We have fine-tuned
multimodal large models, specifically LLaVA, Qwen-VL, InternLM, with this
dataset to enhance instruction-based performance. We conducted a qualitative
assessment of the capabilities of the base model and the fine-tuned model in
performing image captioning and classification tasks on the specific dataset.
The evaluation results demonstrate that the fine-tuned model exhibits
proficiency in addressing typical pathological questions. We hope that by
making both our models and datasets publicly available, they can be valuable to
the medical and research communities. |
---|---|
DOI: | 10.48550/arxiv.2408.07037 |