Imperfect gold standard gene sets yield inaccurate evaluation of causal gene identification methods

Bibliographic details
Published in: Communications biology 2024-07, Vol.7 (1), p.873-5, Article 873
Main authors: Wang, Lijia; Wen, Xiaoquan; Morrison, Jean
Format: Article
Language: English
Description
Summary: Causal gene discovery methods are often evaluated using reference sets of causal genes, which are treated as gold standards (GS) for the purposes of evaluation. However, evaluation methods typically treat genes not in the GS positive set as known negatives rather than unknowns. This leads to inaccurate estimates of sensitivity, specificity, and AUC. Labeling biases in GS gene sets can also lead to inaccurate ordering of alternative causal gene discovery methods. We argue that the evaluation of causal gene discovery methods should rely on statistical techniques like those used for variant discovery rather than on comparison with GS gene sets. This perspective highlights the limitations of empirically evaluating causal gene discovery methods in the absence of completely labeled reference gene sets. It shows that sensitivity, specificity, and AUC may be critically biased, and advocates for increased reliance on probabilistic modeling.
ISSN:2399-3642
DOI:10.1038/s42003-024-06482-1
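
Illustration: The summary's point that treating unlabeled genes as known negatives distorts sensitivity and specificity can be seen in a small worked example. The sketch below is not taken from the article; the gene names, the true causal set, the GS subset, and the method's predictions are all hypothetical toy values chosen only to make the arithmetic visible.

```python
def sensitivity_specificity(predicted, positives, all_genes):
    """Return (sensitivity, specificity), treating every gene
    outside `positives` as a known negative."""
    negatives = all_genes - positives
    true_pos = len(predicted & positives)
    true_neg = len(negatives - predicted)
    return true_pos / len(positives), true_neg / len(negatives)


# Hypothetical toy data: 10 genes, 4 truly causal, only 2 of them labeled in the GS.
all_genes = {f"g{i}" for i in range(1, 11)}
true_causal = {"g1", "g2", "g3", "g4"}   # assumed ground truth (unknown in practice)
gs_positive = {"g1", "g2"}               # incomplete gold standard
predicted = {"g1", "g3", "g4", "g5"}     # calls from a hypothetical method

# Evaluation against the (normally unobservable) truth.
true_sens, true_spec = sensitivity_specificity(predicted, true_causal, all_genes)

# Evaluation against the GS, with unlabeled genes counted as negatives.
gs_sens, gs_spec = sensitivity_specificity(predicted, gs_positive, all_genes)

print(f"vs. truth: sensitivity={true_sens:.3f}, specificity={true_spec:.3f}")
print(f"vs. GS:    sensitivity={gs_sens:.3f}, specificity={gs_spec:.3f}")
```

In this toy case the correct calls g3 and g4 are counted as false positives because they are missing from the GS, so the GS-based specificity (5/8) falls below the true value (5/6), and the GS-based sensitivity (1/2) also differs from the true value (3/4). The direction and size of such discrepancies depend on which genes happen to be labeled.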