Analysis of eligibility criteria representation in industry-standard clinical trial protocols

Bibliographic details
Published in: Journal of biomedical informatics 2013-10, Vol. 46 (5), p. 805-813
Main authors: Bhattacharya, Sanmitra; Cantor, Michael N.
Format: Article
Language: English
Description
Summary:
•We compare textual complexity of full-text and ClinicalTrials.gov (CT) protocols.
•We use cosine-similarity measures to identify clusters for standardization.
•We find that CT protocols are very condensed and convey less information.
•Developing a template set is feasible and could lead to efficient criteria design.

Previous research on the standardization of eligibility criteria and its feasibility has traditionally been conducted on clinical trial protocols from ClinicalTrials.gov (CT). The portability and use of such standardization for full-text industry-standard protocols has not been studied in depth. To this end, we first compare the representation characteristics and textual complexity of a set of Pfizer's internal full-text protocols to their corresponding entries in CT. Next, we identify clusters of similar criteria sentences from both full-text and CT protocols and outline methods for standardized representation of eligibility criteria. We also study the distribution of eligibility criteria in full-text and CT protocols with respect to pre-defined semantic classes used for eligibility criteria classification. We find that, in comparison to full-text protocols, CT protocols are not only more condensed but also convey less information. We also find no correlation between the variations in word counts of the CT and full-text protocols. While we identify 65 and 103 clusters of inclusion and exclusion criteria, respectively, from full-text protocols, our methods find only 36 and 63 corresponding clusters from CT protocols. For both the full-text and CT protocols we are able to identify 'templates' for standardized representations, with full-text standardization being the more challenging of the two. In our exploration of the semantic class distributions we find that the majority of the inclusion criteria from both full-text and CT protocols belong to the semantic class "Diagnostic and Lab Results", while "Disease, Sign or Symptom" forms the majority for exclusion criteria. Overall, we show that developing a template set of eligibility criteria for clinical trials, specifically in their full-text form, is feasible and could lead to more efficient clinical trial protocol design.
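
The summary mentions cosine-similarity measures for grouping similar criteria sentences but does not spell out the procedure. The following is only an illustrative sketch of the general idea, not the authors' method: it assumes plain bag-of-words term counts, a greedy single-pass grouping, and a hypothetical similarity threshold of 0.6. The function names and the example criteria sentences are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_criteria(sentences, threshold=0.6):
    """Greedy grouping (an assumption, not the paper's algorithm).

    Each sentence joins the first cluster whose seed sentence is at
    least `threshold` similar; otherwise it seeds a new cluster.
    """
    clusters = []  # list of (seed_vector, member_sentences)
    for sentence in sentences:
        vec = Counter(tokenize(sentence))
        for seed_vec, members in clusters:
            if cosine_similarity(vec, seed_vec) >= threshold:
                members.append(sentence)
                break
        else:
            clusters.append((vec, [sentence]))
    return [members for _, members in clusters]


# Hypothetical criteria sentences, invented for this example.
criteria = [
    "Age 18 years or older",
    "Age 21 years or older",
    "Pregnant or breastfeeding women",
    "Women who are pregnant or breastfeeding",
]
clusters = cluster_criteria(criteria)
for members in clusters:
    print(members)
```

In a sketch like this, each resulting cluster is a candidate for a single template (for example, an age criterion with a numeric slot). The threshold controls how aggressively sentences are merged and would need tuning on real protocol text.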
ISSN: 1532-0464, 1532-0480
DOI: 10.1016/j.jbi.2013.06.001