Evaluating the Accuracy of Artificial Intelligence (AI)-Generated Illustrations for Laser-Assisted In Situ Keratomileusis (LASIK), Photorefractive Keratectomy (PRK), and Small Incision Lenticule Extraction (SMILE)
Published in: Cureus (Palo Alto, CA), 2024-08, Vol. 16 (8), p. e67747
Main authors: , , , , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: To utilize artificial intelligence (AI) platforms to generate medical illustrations for refractive surgeries, aiding patients in visualizing and comprehending procedures like laser-assisted in situ keratomileusis (LASIK), photorefractive keratectomy (PRK), and small incision lenticule extraction (SMILE). This study assesses the current performance of two OpenAI programs in terms of their accuracy in illustrating common corneal refractive procedures.
We selected AI image generators based on their popularity, choosing Decoder-Only Autoregressive Language and Image Synthesis 3 (DALL-E 3) for its leading position and Medical Illustration Master (MiM) for its high engagement. We developed six non-AI-generated prompts targeting specific outcomes related to LASIK, PRK, and SMILE procedures to assess medical accuracy. We generated images using these prompts (18 total images per AI platform) and used the final images produced after the sixth prompt for this study (three final images per AI platform). Human-created procedural images were also gathered for comparison. Four experts independently graded the images, and their scores were averaged. Each image was evaluated with our grading system on "Legibility," "Detail & Clarity," "Anatomical Realism & Accuracy," "Procedural Step Accuracy," and "Lack of Fictitious Anatomy," with scores from 0 to 3 per category, for a maximum of 15 points. A score of 15 signifies excellent performance, indicating a highly accurate medical illustration, whereas a low score indicates a poor-quality illustration. Additionally, we submitted the same AI-generated images back into Chat Generative Pre-Trained Transformer-4o (ChatGPT-4o) along with our grading system, allowing ChatGPT-4o to evaluate both the AI-generated images and the human-created images (HCIs).
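A minimal sketch of the rubric arithmetic described above: five categories scored 0 to 3 each (15 points maximum), with each image's scores averaged across the four expert graders. The category names come from the abstract; the function name, variable names, and example scores are illustrative assumptions, not the study's data.

```python
# Sketch of the 15-point grading rubric, assuming per-grader integer scores.
CATEGORIES = [
    "Legibility",
    "Detail & Clarity",
    "Anatomical Realism & Accuracy",
    "Procedural Step Accuracy",
    "Lack of Fictitious Anatomy",
]
MAX_PER_CATEGORY = 3  # 5 categories x 3 points = 15 points total


def average_image_score(expert_scores):
    """Average one image's scores across graders, per category and in total.

    expert_scores: list of dicts (one per grader) mapping category -> score in [0, 3].
    Returns (per-category averages, averaged total out of 15).
    """
    per_category = {}
    for category in CATEGORIES:
        values = [scores[category] for scores in expert_scores]
        if any(not 0 <= v <= MAX_PER_CATEGORY for v in values):
            raise ValueError(f"Score out of range for {category}")
        per_category[category] = sum(values) / len(values)
    total = sum(per_category.values())  # out of 15
    return per_category, total


# Example: four hypothetical graders scoring a single illustration.
graders = [
    {c: 3 for c in CATEGORIES},
    {c: 3 for c in CATEGORIES},
    {c: 2 for c in CATEGORIES},
    {c: 3 for c in CATEGORIES},
]
averages, total = average_image_score(graders)
print(f"Averaged total: {total:.2f} / 15 ({total / 15:.1%})")
```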
In individual category scoring, HCIs significantly outperformed AI images in legibility, anatomical realism, procedural step accuracy, and lack of fictitious anatomy. There were no significant differences between DALL-E 3 and MiM in these categories (p>0.05). In procedure-specific comparisons, HCIs consistently scored higher than AI-generated images for LASIK, PRK, and SMILE. For LASIK, HCIs scored 14 ± 0.82 (93.3%), while DALL-E 3 scored 4.5 ± 0.58 (30%) and MiM scored 4.5 ± 1.91 (30%) (p
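For illustration only, the kind of group comparison reported above could be computed as below. The abstract does not name the statistical test used; an independent-samples t-test via SciPy is assumed here purely as a sketch, and the score lists are hypothetical, not the study's data.

```python
# Hedged sketch: comparing grader totals (out of 15) for a human-created image
# versus a DALL-E 3 image. Test choice and all numbers are assumptions.
from scipy import stats

hci_scores = [14, 15, 13, 14]   # hypothetical per-grader totals, human-created image
dalle3_scores = [4, 5, 4, 5]    # hypothetical per-grader totals, DALL-E 3 image

t_stat, p_value = stats.ttest_ind(hci_scores, dalle3_scores)
print(f"HCI mean: {sum(hci_scores) / len(hci_scores):.2f} / 15")
print(f"DALL-E 3 mean: {sum(dalle3_scores) / len(dalle3_scores):.2f} / 15")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```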
ISSN: 2168-8184
DOI: 10.7759/cureus.67747