Contrasting lexical similarity and formal definitions in SNOMED CT: Consistency and implications

Bibliographic details
Published in: Journal of Biomedical Informatics, 2014-02, Vol. 47, pp. 192–198
Main authors: Agrawal, Ankur; Elhanan, Gai
Format: Article
Language: English
Description
Abstract:
• SNOMED CT is a large and complex terminology, and imperfections are inevitable.
• We propose an algorithmic quality assurance method to find inconsistent concepts.
• We formulate similarity sets: groups of concepts with similar lexical structure.
• Five different set types are formed, and a sample of each is evaluated.
• Evaluation of the samples revealed 38–70% of the sets having inconsistencies.

To quantify the presence of inconsistencies in the formal definitions of SNOMED CT (SCT) concepts and to evaluate a lexical approach for detecting them.

Using SCT's Procedure hierarchy, we algorithmically formulated similarity sets: groups of concepts with a similar lexical structure in their fully specified names. We formed five random samples of 50 similarity sets each. Four samples were selected by a parameter (number of parents, number of attributes, number of groups, or all of these), and the fifth was a randomly selected control sample. The sets in all samples were reviewed for five types of formal definition inconsistencies: hierarchical, attribute assignment, attribute target values, groups, and definitional.

For the Procedure hierarchy, 2111 similarity sets were formulated, covering 18.1% of eligible concepts. The evaluation revealed that 38% (Control) to 70% (Different relationships) of the similarity sets within the samples exhibited significant inconsistencies. The rate of inconsistencies in the Different relationships sample was significantly higher than in Control; the numbers of attribute assignment and hierarchical inconsistencies within their respective samples also differed significantly from Control.

While, at this stage of the HITECH initiative, the formal definitions of SCT are only a minor consideration, they are essential in the grand scheme of sophisticated, meaningful use of captured clinical data. However, a significant portion of the concepts in the most semantically complex hierarchy of SCT, the Procedure hierarchy, are modeled inconsistently in a manner that affects their computability.
Lexical methods can efficiently identify such inconsistencies and possibly allow for their algorithmic resolution.
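The similarity-set idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the one-word-substitution rule, the `Concept` record, the functions `similarity_sets` and `flag_for_review`, and the toy concepts are all hypothetical. It groups Procedure concepts whose fully specified names share a lexical template, then flags sets whose members differ in number of parents, attributes, or groups, the parameters the abstract uses to form its samples.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Hypothetical, simplified view of one concept's formal definition."""
    concept_id: str    # made-up identifier, not a real SCT concept ID
    fsn: str           # fully specified name, e.g. "Biopsy of liver (procedure)"
    n_parents: int     # number of IS-A parents
    n_attributes: int  # number of defining attribute relationships
    n_groups: int      # number of relationship groups


def similarity_sets(concepts):
    """Group concepts whose FSNs differ in exactly one word position.

    Assumption: this one-word-substitution rule is an illustrative stand-in,
    not the lexical criterion published in the paper.
    """
    buckets = defaultdict(set)
    for c in concepts:
        words = c.fsn.lower().split()
        for i in range(len(words)):
            # Replace word i with a wildcard to form a lexical template.
            template = tuple(words[:i] + ["*"] + words[i + 1:])
            buckets[template].add(c)
    # A similarity set needs at least two concepts sharing the template.
    return {t: frozenset(m) for t, m in buckets.items() if len(m) >= 2}


def flag_for_review(sets):
    """Map each template to the structural features its members disagree on.

    A disagreement is only a candidate inconsistency; in the source study,
    sampled sets were reviewed to classify the type of inconsistency.
    """
    features = ("n_parents", "n_attributes", "n_groups")
    flagged = {}
    for template, members in sets.items():
        differing = [f for f in features
                     if len({getattr(c, f) for c in members}) > 1]
        if differing:
            flagged[" ".join(template)] = differing
    return flagged


# Toy, made-up data for illustration only.
concepts = [
    Concept("P1", "Biopsy of liver (procedure)", 1, 2, 1),
    Concept("P2", "Biopsy of kidney (procedure)", 1, 2, 1),
    Concept("P3", "Biopsy of lung (procedure)", 2, 2, 1),
    Concept("P4", "Excision of liver (procedure)", 1, 2, 1),
]
sets = similarity_sets(concepts)
flagged = flag_for_review(sets)
print(flagged)
```

In this toy run, two similarity sets form (`biopsy of * (procedure)` and `* of liver (procedure)`), and only the first is flagged because one member has two parents while the others have one. Whether such a difference is a genuine modeling error still requires review of the concepts' definitions.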
ISSN: 1532-0464; 1532-0480
DOI: 10.1016/j.jbi.2013.11.003