Reliability of trachoma clinical grading--assessing grading of marginal cases

Clinical examination of trachoma is used to justify intervention in trachoma-endemic regions. Currently, field graders are certified by determining their concordance with experienced graders using the kappa statistic. Unfortunately, trachoma grading can be highly variable and there are cases where e...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:PLoS neglected tropical diseases 2014-05, Vol.8 (5), p.e2840-e2840
Hauptverfasser: Rahman, Salman A, Yu, Sun N, Amza, Abdou, Gebreselassie, Sintayehu, Kadri, Boubacar, Baido, Nassirou, Stoller, Nicole E, Sheehan, Joseph P, Porco, Travis C, Gaynor, Bruce D, Keenan, Jeremy D, Lietman, Thomas M
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Clinical examination of trachoma is used to justify intervention in trachoma-endemic regions. Currently, field graders are certified by determining their concordance with experienced graders using the kappa statistic. Unfortunately, trachoma grading can be highly variable and there are cases where even expert graders disagree (borderline/marginal cases). Prior work has shown that inclusion of borderline cases tends to reduce apparent agreement, as measured by kappa. Here, we confirm those results and assess performance of trainees on these borderline cases by calculating their reliability error, a measure derived from the decomposition of the Brier score. We trained 18 field graders using 200 conjunctival photographs from a community-randomized trial in Niger and assessed inter-grader agreement using kappa as well as reliability error. Three experienced graders scored each case for the presence or absence of trachomatous inflammation-follicular (TF) and trachomatous inflammation-intense (TI). A consensus grade for each case was defined as the one given by a majority of experienced graders. We classified cases into a unanimous subset if all 3 experienced graders gave the same grade. For both TF and TI grades, the mean kappa for trainees was higher on the unanimous subset; inclusion of borderline cases reduced apparent agreement by 15.7% for TF and 12.4% for TI. When we assessed the breakdown of the reliability error, we found that our trainees tended to over-call TF grades and under-call TI grades, especially in borderline cases. The kappa statistic is widely used for certifying trachoma field graders. Exclusion of borderline cases, which even experienced graders disagree on, increases apparent agreement with the kappa statistic. Graders may agree less when exposed to the full spectrum of disease. Reliability error allows for the assessment of these borderline cases and can be used to refine an individual trainee's grading.
ISSN:1935-2735
1935-2727
1935-2735
DOI:10.1371/journal.pntd.0002840