Empirical validation of an automated approach to data use oversight


Bibliographic details
Published in: Cell genomics 2021-11, Vol. 1 (2), p. 100031-100031, Article 100031
Main authors: Cabili, Moran N., Lawson, Jonathan, Saltzman, Andrea, Rushton, Greg, O’Rourke, Pearl, Wilbanks, John, Rodriguez, Laura Lyman, Nyronen, Tommi, Courtot, Mélanie, Donnelly, Stacey, Philippakis, Anthony A.
Format: Article
Language: English
Online access: Full text
Description
Summary: The current paradigm for data use oversight of biomedical datasets is onerous, extending the timescale and resources needed to obtain access for secondary analyses, thus hindering scientific discovery. For a researcher to utilize a controlled-access dataset, a data access committee (DAC) must review her research plans to determine whether they are consistent with the data use limitations (DULs) specified by the informed consent form. The newly created GA4GH data use ontology (DUO) holds the potential to streamline this process by making data use oversight computable. Here, we describe an open-source software platform, the Data Use Oversight System (DUOS), that connects with DUO terminology to enable automated data use oversight. We analyze dbGaP data acquired since 2006, finding an exponential increase in data access requests, which will not be sustainable with current manual oversight review. We perform an empirical evaluation of DUOS and DUO on selected datasets from the Broad Institute’s data repository. We were able to structure 118/123 of the evaluated DULs (96%) and 52/52 (100%) of research proposals using DUO terminology, and we find that DUOS’ automated data access adjudication in all cases agreed with the DAC manual review. This first empirical evaluation of the feasibility of automated data use oversight demonstrates comparable accuracy to human-based data access oversight in real-world data governance.

Genomic data sharing is overly complex, resulting in significant delays that slow science.
DUO is a new standardized vocabulary that structures data use restrictions.
The DUOS platform uses DUO to expedite data access requests for researchers and reviewers.
We provide empirical evidence that data governance can be automated in most cases.

The GA4GH Data Use Ontology (DUO) is a structured approach to describing the secondary uses of data. Cabili et al. report the Data Use Oversight System (DUOS), an open-source software platform that leverages DUO and other data-sharing policy advancements as a first step toward automating data governance. They provide empirical evidence that DUOS performs comparably to a human data access committee in adjudication of data access requests, streamlining the process of gaining access to human biomedical data.
ISSN: 2666-979X
DOI: 10.1016/j.xgen.2021.100031
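
Illustrative note: the summary describes making data use oversight "computable" by coding both a dataset's data use limitations and a researcher's proposal in a shared vocabulary, then comparing the two. The Python sketch below shows what such a comparison might look like in its simplest form. It is not the DUOS implementation. The abbreviations GRU, HMB, and DS are borrowed from DUO, but the class names, the `non_commercial_only` flag, the `is_consistent` function, and the matching rules themselves are hypothetical simplifications. In particular, diseases are compared by exact string equality here, whereas an ontology-based system could also reason over broader and narrower disease terms.

```python
from dataclasses import dataclass
from typing import Optional

# Permission abbreviations borrowed from DUO; the rules below are assumptions.
GRU = "GRU"  # general research use
HMB = "HMB"  # health/medical/biomedical research
DS = "DS"    # disease-specific research


@dataclass(frozen=True)
class DataUseLimitation:
    """Hypothetical structured form of a dataset's consent-derived limitation."""
    permission: str
    disease: Optional[str] = None       # only meaningful when permission is DS
    non_commercial_only: bool = False   # hypothetical modifier flag


@dataclass(frozen=True)
class ResearchPurpose:
    """Hypothetical structured form of a researcher's access request."""
    is_health_research: bool
    disease: Optional[str] = None
    is_commercial: bool = False


def is_consistent(dul: DataUseLimitation, purpose: ResearchPurpose) -> bool:
    """Return True if the purpose fits the limitation under these toy rules."""
    # A modifier can rule out a request regardless of the base permission.
    if dul.non_commercial_only and purpose.is_commercial:
        return False
    if dul.permission == GRU:
        return True
    if dul.permission == HMB:
        return purpose.is_health_research
    if dul.permission == DS:
        # Exact match only; no disease hierarchy is consulted in this sketch.
        return purpose.disease is not None and purpose.disease == dul.disease
    # Unrecognized codes are not approved automatically.
    return False


if __name__ == "__main__":
    dataset = DataUseLimitation(DS, disease="diabetes")
    request = ResearchPurpose(is_health_research=True, disease="diabetes")
    print(is_consistent(dataset, request))
```

Returning False for an unrecognized code is a deliberate choice in this sketch: anything the rules cannot interpret would fall back to manual committee review rather than being approved by default.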