BT15 Development of a validated image grading scale to assess the quality of skin lesion images
Published in: British Journal of Dermatology, 2023-06, Vol. 188 (Supplement_4)
Authors: , , ,
Format: Article
Language: English
Online access: Full text
Abstract: Image quality is important for diagnostic confidence. For teledermatologists, low-quality images cause greater uncertainty, necessitating more cautious decision-making. Likewise, artificial intelligence systems are trained to recognize low-quality images and need to adjust their margins of error to account for increased uncertainty, to avoid costly mistakes such as missed cancer diagnoses. There is, at present, no publicly available, validated tool for skin lesion image quality. We describe our steps in developing a grading scale to assess image quality, based on real-world images from an existing teledermatology service. We conducted a literature search for evidence-based data in the field. We formulated items to evaluate our measure of interest directly, through multiple focus group meetings with dermatologists and clinical photographers. Three key variables were identified: ‘focus’, ‘lighting’ and ‘composition’. All variables were interdependent, with ‘focus’ being the most essential. We devised a 4-point grading scale, whereby images are visually assessed and assigned a score between grade 1 (low) and grade 4 (very high). Next, we tested the scale on two cohorts: four dermatologists (clinical experts) and two clinical photographers (technical experts). A sample of 35 anonymized images from teledermatology and clinical photography (CP), evenly distributed across the scale, was used. To reduce variability in user perception, participants received 10 min of presurvey training, followed by online testing under standardized settings. Inter-rater reliability (IRR), or user agreement, was analysed using Fleiss’ kappa. The all-user IRR showed moderate agreement (IRR 0.50), with modest improvement on in-group analysis [clinical photographers (IRR 0.58) and dermatologists (IRR 0.52)]. To estimate the IRR for each grade, we calculated the percentage agreement against a provisional benchmark. All-user percentage agreement was 67%, with the highest agreement for grade 1 (low, 83%) and the lowest for grade 3 (high, 53%). Feedback on content relevance, usability and form was obtained, guiding further scale modification. We appraised scale discrimination through testing on a representative sample of real-world images from 200 consecutive referrals. Seventy-eight per cent of referrals had primary care-acquired images, and 21% of referrals had CP. On average, primary care-acquired images were graded lower [2.17, 95% confidence interval (CI) 2.04–2.30] than CP (3.89, 95% CI 3.77–4).
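The abstract reports inter-rater reliability as Fleiss' kappa, which measures agreement among a fixed number of raters assigning items to categories, corrected for chance. The sketch below is illustrative only (not the authors' analysis code), with hypothetical ratings: rows are images, columns are the four grades, and each cell counts how many raters assigned that grade.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects x categories count matrix.

    Every row must sum to the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    # Per-subject observed agreement P_i
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects
    # Category proportions p_j and chance agreement P_e
    total = n_subjects * n_raters
    p_j = [sum(row[j] for row in counts) / total
           for j in range(len(counts[0]))]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical example: 6 raters grading 4 images on the 4-point scale.
ratings = [
    [6, 0, 0, 0],  # unanimous grade 1
    [0, 4, 2, 0],  # mostly grade 2
    [0, 1, 4, 1],  # mostly grade 3
    [0, 0, 2, 4],  # mostly grade 4
]
print(round(fleiss_kappa(ratings), 3))  # → 0.437, moderate agreement
```

Kappa values around 0.4–0.6 are conventionally read as "moderate" agreement, which matches the all-user IRR of 0.50 reported above.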
ISSN: 0007-0963, 1365-2133
DOI: 10.1093/bjd/ljad113.381