Detecting Role Errors in the Gene Hierarchy of the NCI Thesaurus

Detailed description

Bibliographic details
Published in: Cancer informatics 2008-01, Vol.2008 (1), p.293-313
Main authors: Min, Hua, Cohen, Barry, Halper, Michael, Oren, Marc, Perl, Yehoshua
Format: Article
Language: English
Online access: Full text
Description
Summary: Hua Min (1), Barry Cohen (2), Michael Halper (3), Marc Oren (2) and Yehoshua Perl (2). (1) Fox Chase Cancer Center, Philadelphia, PA 19111-2497, U.S.A.; (2) Computer Science Dept., NJIT, Newark, NJ 07102-1982, U.S.A.; (3) Computer Science Dept., Kean University, Union, NJ 07083-0411, U.S.A. Abstract: Gene terminologies are playing an increasingly important role in the ever-growing field of genomic research. While errors in large, complex terminologies are inevitable, gene terminologies are even more susceptible to them due to the rapid growth of genomic knowledge and the nature of its discovery. It is therefore very important to establish quality-assurance protocols for such genomic-knowledge repositories. Different kinds of terminologies often require auditing methodologies adapted to their particular structures. In light of this, an auditing methodology tailored to the characteristics of the Gene hierarchy of the NCI Thesaurus (NCIT) is presented. The Gene hierarchy is of particular interest to the NCIT's designers due to the primary role of genomics in current cancer research. This multiphase methodology focuses on detecting role errors occurring within that hierarchy, such as missing roles or roles with incorrect or incomplete target structures. The methodology is based on two kinds of abstraction networks, called taxonomies, that highlight the role distribution among concepts within the IS-A (subsumption) hierarchy. These abstract views tend to highlight portions of the hierarchy having a higher concentration of errors. The errors found during an application of the methodology are reported. Hypotheses pertaining to the efficacy of the methodology are investigated.
ISSN:1176-9351
DOI:10.4137/CIN.S440