The additional value of ONEST (Observers Needed to Evaluate Subjective Tests) in assessing reproducibility of oestrogen receptor, progesterone receptor, and Ki67 classification in breast cancer

Bibliographic Details
Published in: Virchows Archiv : an international journal of pathology 2021-12, Vol.479 (6), p.1101-1109
Main authors: Cserni, Bálint, Bori, Rita, Csörgő, Erika, Oláh-Németh, Orsolya, Pancsa, Tamás, Sejben, Anita, Sejben, István, Vörös, András, Zombori, Tamás, Nyári, Tibor, Cserni, Gábor
Format: Article
Language: English
Subjects:
Online access: Full text
Description
Summary: The reproducibility of assessing potential biomarkers is crucial for their implementation. ONEST (Observers Needed to Evaluate Subjective Tests) has recently been introduced as a new additive evaluation method for the assessment of reliability, demonstrating how the number of observers impacts interobserver agreement. Oestrogen receptor (ER), progesterone receptor (PR), and Ki67 proliferation marker immunohistochemical stainings were assessed on 50 core needle biopsy and 50 excision samples from breast cancers by 9 pathologists according to daily practice. ER and PR statuses based on the percentages of stained nuclei were the most consistently assessed parameters (intraclass correlation coefficients, ICC 0.918–0.996), whereas Ki67 with 5 different theoretical or St Gallen Consensus Conference–proposed cut-off values demonstrated moderate to good reproducibility (ICC 0.625–0.760). ONEST highlighted that consistently assessed tests such as ER and PR required only 2 or 3 observers for optimal evaluation of reproducibility, and the band between the plots of the best and worst overall percent agreement values for 100 randomly selected permutations of observers was narrow. In contrast, for the less consistently evaluated Ki67 categorizations, ONEST suggested that at least 5 observers were required for a more trustworthy assessment of reliability, and the bandwidth between the best and worst plots was wider (up to a 34% difference between two observers). ONEST adds value to traditional calculations of interobserver agreement by highlighting not only the number of observers needed to evaluate reproducibility reliably, but also the rate of agreement with an increasing number of observers and the disagreement between the better and worse ratings.
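The summary describes the ONEST procedure as plotting overall percent agreement (OPA) against the number of observers for randomly selected permutations of the observer set. A minimal sketch of that idea in Python, assuming categorical per-case ratings (the function names `opa` and `onest_curves` are hypothetical, not the authors' implementation):

```python
import random

def opa(ratings):
    """Overall percent agreement: fraction of cases on which ALL observers agree.
    ratings: list of per-observer rating lists, one categorical call per case."""
    n_cases = len(ratings[0])
    agree = sum(1 for i in range(n_cases)
                if len({obs[i] for obs in ratings}) == 1)
    return agree / n_cases

def onest_curves(ratings, n_perm=100, seed=0):
    """For each random permutation of observers, compute the OPA of the first
    k observers for k = 2..n, giving one agreement-vs-observers curve per
    permutation (the paper uses 100 permutations and reads off the band
    between the best and worst curves)."""
    rng = random.Random(seed)
    n_obs = len(ratings)
    curves = []
    for _ in range(n_perm):
        order = rng.sample(range(n_obs), n_obs)
        curve = [opa([ratings[j] for j in order[:k]])
                 for k in range(2, n_obs + 1)]
        curves.append(curve)
    return curves
```

Because agreement is required across the whole growing observer set, each curve is non-increasing in k; where the curves plateau indicates how many observers suffice to characterize reproducibility.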
ISSN: 0945-6317
1432-2307
DOI:10.1007/s00428-021-03172-9