The research data management platform (RDMP): A novel, process driven, open-source tool for the management of longitudinal cohorts of clinical data

The Health Informatics Centre at the University of Dundee provides a service to securely host clinical datasets and extract relevant data for anonymized cohorts to researchers to enable them to answer key research questions. As is common in research using routine healthcare data, the service was his...

Bibliographic details
Published in: Gigascience 2018-07, Vol.7 (7)
Main authors: Nind, Thomas, Galloway, James, McAllister, Gordon, Scobbie, Donald, Bonney, Wilfred, Hall, Christopher, Tramma, Leandro, Reel, Parminder, Groves, Martin, Appleby, Philip, Doney, Alex, Guthrie, Bruce, Jefferson, Emily
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue 7
container_start_page
container_title Gigascience
container_volume 7
creator Nind, Thomas
Galloway, James
McAllister, Gordon
Scobbie, Donald
Bonney, Wilfred
Hall, Christopher
Tramma, Leandro
Reel, Parminder
Groves, Martin
Appleby, Philip
Doney, Alex
Guthrie, Bruce
Jefferson, Emily
description The Health Informatics Centre at the University of Dundee provides a service to securely host clinical datasets and extract relevant data for anonymized cohorts to researchers to enable them to answer key research questions. As is common in research using routine healthcare data, the service was historically delivered using ad-hoc processes resulting in the slow provision of data whose provenance was often hidden to the researchers using it. This paper describes the development and evaluation of the Research Data Management Platform (RDMP): an open source tool to load, manage, clean, and curate longitudinal healthcare data for research and provide reproducible and updateable datasets for defined cohorts to researchers. Between 2013 and 2017, RDMP tool implementation tripled the productivity of data analysts producing data releases for researchers from 7.1 to 25.3 per month and reduced the error rate from 12.7% to 3.1%. The effort on data management reduced from a mean of 24.6 to 3.0 hours per data release. The waiting time for researchers to receive data after agreeing a specification reduced from approximately 6 months to less than 1 week. The software is scalable and currently manages 163 datasets. A total 1,321 data extracts for research have been produced, with the largest extract linking data from 70 different datasets. The tools and processes that encompass the RDMP not only fulfil the research data management requirements of researchers but also support the seamless collaboration of data cleaning, data transformation, data summarization and data quality assessment activities by different research groups.
doi_str_mv 10.1093/gigascience/giy060
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_6041881</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2043173510</sourcerecordid><originalsourceid>FETCH-LOGICAL-c458t-584e069ca96269e773b4cb75a79e95f3b3ba70d92709b4414b1d061c3c57890e3</originalsourceid><addsrcrecordid>eNqNUk1rFTEUDaLYUvsHXEjATYWOJpPMJHEhlPoJFUUquAuZzJ15KZlkTGYe9Hf4h83j1fJ0ZTa53JxzOPfmIPSUkpeUKPZqdKPJ1kGwUOpb0pIH6LgmXFQ1FT8eHtRH6DTnG1KOEFIK9hgd1UooohpyjH5dbwAnyGCS3eDeLAZPJpgRJggLnr1ZhpgmfPbt7eevL17jCxziFvw5nlO0kDPuk9tCOMdxhlDluCYLeInR40LDS9E-UIsD9jGMbll7F4zHNm5iWvKub70LzpbezsET9GgwPsPp3X2Cvr9_d335sbr68uHT5cVVZXkjl6qRHEirrFFt3SoQgnXcdqIxQoFqBtaxzgjSq1oQ1XFOeUd70lLLbCOkIsBO0Ju97rx2E_S2eEzG6zm5yaRbHY3Tf78Et9Fj3OqWcColLQJndwIp_lwhL3py2YL3JkBcsy5fwKhgDSUF-vwf6E1ZVtlCQQnaSCIU_x9ULWVB1XuUTTHnBMO9ZUr0Lhz6IBx6H45CenY47D3lTxTYb2wyulc</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2715807288</pqid></control><display><type>article</type><title>The research data management platform (RDMP): A novel, process driven, open-source tool for the management of longitudinal cohorts of clinical data</title><source>MEDLINE</source><source>Access via Oxford University Press (Open Access Collection)</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><creator>Nind, Thomas ; Galloway, James ; McAllister, Gordon ; Scobbie, Donald ; Bonney, Wilfred ; Hall, Christopher ; Tramma, Leandro ; Reel, Parminder ; Groves, Martin ; Appleby, Philip ; Doney, Alex ; Guthrie, Bruce ; Jefferson, Emily</creator><creatorcontrib>Nind, Thomas ; Galloway, James ; McAllister, Gordon ; Scobbie, Donald ; Bonney, Wilfred ; Hall, Christopher ; Tramma, Leandro ; Reel, Parminder ; Groves, Martin ; Appleby, Philip ; Doney, Alex ; Guthrie, Bruce ; Jefferson, Emily</creatorcontrib><description>The Health Informatics 
Centre at the University of Dundee provides a service to securely host clinical datasets and extract relevant data for anonymized cohorts to researchers to enable them to answer key research questions. As is common in research using routine healthcare data, the service was historically delivered using ad-hoc processes resulting in the slow provision of data whose provenance was often hidden to the researchers using it. This paper describes the development and evaluation of the Research Data Management Platform (RDMP): an open source tool to load, manage, clean, and curate longitudinal healthcare data for research and provide reproducible and updateable datasets for defined cohorts to researchers. Between 2013 and 2017, RDMP tool implementation tripled the productivity of data analysts producing data releases for researchers from 7.1 to 25.3 per month and reduced the error rate from 12.7% to 3.1%. The effort on data management reduced from a mean of 24.6 to 3.0 hours per data release. The waiting time for researchers to receive data after agreeing a specification reduced from approximately 6 months to less than 1 week. The software is scalable and currently manages 163 datasets. A total 1,321 data extracts for research have been produced, with the largest extract linking data from 70 different datasets. 
The tools and processes that encompass the RDMP not only fulfil the research data management requirements of researchers but also support the seamless collaboration of data cleaning, data transformation, data summarization and data quality assessment activities by different research groups.</description><identifier>ISSN: 2047-217X</identifier><identifier>EISSN: 2047-217X</identifier><identifier>DOI: 10.1093/gigascience/giy060</identifier><identifier>PMID: 29790950</identifier><language>eng</language><publisher>United States: Oxford University Press</publisher><subject>Cleaning ; Computer Systems ; Data management ; Databases, Factual ; Datasets ; Error reduction ; Health care ; Humans ; Informatics ; Internet ; Longitudinal Studies ; Medical Informatics - methods ; Open source software ; Programming Languages ; Quality assessment ; Quality Control ; Reproducibility of Results ; Research data management ; Researchers ; Scotland ; Software ; Technical Note ; Universities</subject><ispartof>Gigascience, 2018-07, Vol.7 (7)</ispartof><rights>The Author(s) 2018. Published by Oxford University Press.</rights><rights>The Author(s) 2018. Published by Oxford University Press. 
2018</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c458t-584e069ca96269e773b4cb75a79e95f3b3ba70d92709b4414b1d061c3c57890e3</citedby><cites>FETCH-LOGICAL-c458t-584e069ca96269e773b4cb75a79e95f3b3ba70d92709b4414b1d061c3c57890e3</cites><orcidid>0000-0003-4191-4880 ; 0000-0003-2992-7582</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC6041881/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC6041881/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,27924,27925,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/29790950$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Nind, Thomas</creatorcontrib><creatorcontrib>Galloway, James</creatorcontrib><creatorcontrib>McAllister, Gordon</creatorcontrib><creatorcontrib>Scobbie, Donald</creatorcontrib><creatorcontrib>Bonney, Wilfred</creatorcontrib><creatorcontrib>Hall, Christopher</creatorcontrib><creatorcontrib>Tramma, Leandro</creatorcontrib><creatorcontrib>Reel, Parminder</creatorcontrib><creatorcontrib>Groves, Martin</creatorcontrib><creatorcontrib>Appleby, Philip</creatorcontrib><creatorcontrib>Doney, Alex</creatorcontrib><creatorcontrib>Guthrie, Bruce</creatorcontrib><creatorcontrib>Jefferson, Emily</creatorcontrib><title>The research data management platform (RDMP): A novel, process driven, open-source tool for the management of longitudinal cohorts of clinical data</title><title>Gigascience</title><addtitle>Gigascience</addtitle><description>The Health Informatics Centre at the University of Dundee provides a service to securely host clinical datasets and extract 
relevant data for anonymized cohorts to researchers to enable them to answer key research questions. As is common in research using routine healthcare data, the service was historically delivered using ad-hoc processes resulting in the slow provision of data whose provenance was often hidden to the researchers using it. This paper describes the development and evaluation of the Research Data Management Platform (RDMP): an open source tool to load, manage, clean, and curate longitudinal healthcare data for research and provide reproducible and updateable datasets for defined cohorts to researchers. Between 2013 and 2017, RDMP tool implementation tripled the productivity of data analysts producing data releases for researchers from 7.1 to 25.3 per month and reduced the error rate from 12.7% to 3.1%. The effort on data management reduced from a mean of 24.6 to 3.0 hours per data release. The waiting time for researchers to receive data after agreeing a specification reduced from approximately 6 months to less than 1 week. The software is scalable and currently manages 163 datasets. A total 1,321 data extracts for research have been produced, with the largest extract linking data from 70 different datasets. 
The tools and processes that encompass the RDMP not only fulfil the research data management requirements of researchers but also support the seamless collaboration of data cleaning, data transformation, data summarization and data quality assessment activities by different research groups.</description><subject>Cleaning</subject><subject>Computer Systems</subject><subject>Data management</subject><subject>Databases, Factual</subject><subject>Datasets</subject><subject>Error reduction</subject><subject>Health care</subject><subject>Humans</subject><subject>Informatics</subject><subject>Internet</subject><subject>Longitudinal Studies</subject><subject>Medical Informatics - methods</subject><subject>Open source software</subject><subject>Programming Languages</subject><subject>Quality assessment</subject><subject>Quality Control</subject><subject>Reproducibility of Results</subject><subject>Research data management</subject><subject>Researchers</subject><subject>Scotland</subject><subject>Software</subject><subject>Technical Note</subject><subject>Universities</subject><issn>2047-217X</issn><issn>2047-217X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqNUk1rFTEUDaLYUvsHXEjATYWOJpPMJHEhlPoJFUUquAuZzJ15KZlkTGYe9Hf4h83j1fJ0ZTa53JxzOPfmIPSUkpeUKPZqdKPJ1kGwUOpb0pIH6LgmXFQ1FT8eHtRH6DTnG1KOEFIK9hgd1UooohpyjH5dbwAnyGCS3eDeLAZPJpgRJggLnr1ZhpgmfPbt7eevL17jCxziFvw5nlO0kDPuk9tCOMdxhlDluCYLeInR40LDS9E-UIsD9jGMbll7F4zHNm5iWvKub70LzpbezsET9GgwPsPp3X2Cvr9_d335sbr68uHT5cVVZXkjl6qRHEirrFFt3SoQgnXcdqIxQoFqBtaxzgjSq1oQ1XFOeUd70lLLbCOkIsBO0Ju97rx2E_S2eEzG6zm5yaRbHY3Tf78Et9Fj3OqWcColLQJndwIp_lwhL3py2YL3JkBcsy5fwKhgDSUF-vwf6E1ZVtlCQQnaSCIU_x9ULWVB1XuUTTHnBMO9ZUr0Lhz6IBx6H45CenY47D3lTxTYb2wyulc</recordid><startdate>20180701</startdate><enddate>20180701</enddate><creator>Nind, Thomas</creator><creator>Galloway, James</creator><creator>McAllister, Gordon</creator><creator>Scobbie, 
Donald</creator><creator>Bonney, Wilfred</creator><creator>Hall, Christopher</creator><creator>Tramma, Leandro</creator><creator>Reel, Parminder</creator><creator>Groves, Martin</creator><creator>Appleby, Philip</creator><creator>Doney, Alex</creator><creator>Guthrie, Bruce</creator><creator>Jefferson, Emily</creator><general>Oxford University Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>JQ2</scope><scope>K9.</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0003-4191-4880</orcidid><orcidid>https://orcid.org/0000-0003-2992-7582</orcidid></search><sort><creationdate>20180701</creationdate><title>The research data management platform (RDMP): A novel, process driven, open-source tool for the management of longitudinal cohorts of clinical data</title><author>Nind, Thomas ; Galloway, James ; McAllister, Gordon ; Scobbie, Donald ; Bonney, Wilfred ; Hall, Christopher ; Tramma, Leandro ; Reel, Parminder ; Groves, Martin ; Appleby, Philip ; Doney, Alex ; Guthrie, Bruce ; Jefferson, Emily</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c458t-584e069ca96269e773b4cb75a79e95f3b3ba70d92709b4414b1d061c3c57890e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Cleaning</topic><topic>Computer Systems</topic><topic>Data management</topic><topic>Databases, Factual</topic><topic>Datasets</topic><topic>Error reduction</topic><topic>Health care</topic><topic>Humans</topic><topic>Informatics</topic><topic>Internet</topic><topic>Longitudinal Studies</topic><topic>Medical Informatics - methods</topic><topic>Open source software</topic><topic>Programming Languages</topic><topic>Quality assessment</topic><topic>Quality Control</topic><topic>Reproducibility of Results</topic><topic>Research data 
management</topic><topic>Researchers</topic><topic>Scotland</topic><topic>Software</topic><topic>Technical Note</topic><topic>Universities</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Nind, Thomas</creatorcontrib><creatorcontrib>Galloway, James</creatorcontrib><creatorcontrib>McAllister, Gordon</creatorcontrib><creatorcontrib>Scobbie, Donald</creatorcontrib><creatorcontrib>Bonney, Wilfred</creatorcontrib><creatorcontrib>Hall, Christopher</creatorcontrib><creatorcontrib>Tramma, Leandro</creatorcontrib><creatorcontrib>Reel, Parminder</creatorcontrib><creatorcontrib>Groves, Martin</creatorcontrib><creatorcontrib>Appleby, Philip</creatorcontrib><creatorcontrib>Doney, Alex</creatorcontrib><creatorcontrib>Guthrie, Bruce</creatorcontrib><creatorcontrib>Jefferson, Emily</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Gigascience</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Nind, Thomas</au><au>Galloway, James</au><au>McAllister, Gordon</au><au>Scobbie, Donald</au><au>Bonney, Wilfred</au><au>Hall, Christopher</au><au>Tramma, Leandro</au><au>Reel, Parminder</au><au>Groves, Martin</au><au>Appleby, Philip</au><au>Doney, Alex</au><au>Guthrie, Bruce</au><au>Jefferson, Emily</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>The research data management platform (RDMP): A novel, process driven, open-source tool for the management of longitudinal cohorts of clinical 
data</atitle><jtitle>Gigascience</jtitle><addtitle>Gigascience</addtitle><date>2018-07-01</date><risdate>2018</risdate><volume>7</volume><issue>7</issue><issn>2047-217X</issn><eissn>2047-217X</eissn><abstract>The Health Informatics Centre at the University of Dundee provides a service to securely host clinical datasets and extract relevant data for anonymized cohorts to researchers to enable them to answer key research questions. As is common in research using routine healthcare data, the service was historically delivered using ad-hoc processes resulting in the slow provision of data whose provenance was often hidden to the researchers using it. This paper describes the development and evaluation of the Research Data Management Platform (RDMP): an open source tool to load, manage, clean, and curate longitudinal healthcare data for research and provide reproducible and updateable datasets for defined cohorts to researchers. Between 2013 and 2017, RDMP tool implementation tripled the productivity of data analysts producing data releases for researchers from 7.1 to 25.3 per month and reduced the error rate from 12.7% to 3.1%. The effort on data management reduced from a mean of 24.6 to 3.0 hours per data release. The waiting time for researchers to receive data after agreeing a specification reduced from approximately 6 months to less than 1 week. The software is scalable and currently manages 163 datasets. A total 1,321 data extracts for research have been produced, with the largest extract linking data from 70 different datasets. 
The tools and processes that encompass the RDMP not only fulfil the research data management requirements of researchers but also support the seamless collaboration of data cleaning, data transformation, data summarization and data quality assessment activities by different research groups.</abstract><cop>United States</cop><pub>Oxford University Press</pub><pmid>29790950</pmid><doi>10.1093/gigascience/giy060</doi><orcidid>https://orcid.org/0000-0003-4191-4880</orcidid><orcidid>https://orcid.org/0000-0003-2992-7582</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2047-217X
ispartof Gigascience, 2018-07, Vol.7 (7)
issn 2047-217X
2047-217X
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_6041881
source MEDLINE; Access via Oxford University Press (Open Access Collection); EZB-FREE-00999 freely available EZB journals; PubMed Central
subjects Cleaning
Computer Systems
Data management
Databases, Factual
Datasets
Error reduction
Health care
Humans
Informatics
Internet
Longitudinal Studies
Medical Informatics - methods
Open source software
Programming Languages
Quality assessment
Quality Control
Reproducibility of Results
Research data management
Researchers
Scotland
Software
Technical Note
Universities
title The research data management platform (RDMP): A novel, process driven, open-source tool for the management of longitudinal cohorts of clinical data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T03%3A29%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20research%20data%20management%20platform%20(RDMP):%20A%20novel,%20process%20driven,%20open-source%20tool%20for%20the%20management%20of%20longitudinal%20cohorts%20of%20clinical%20data&rft.jtitle=Gigascience&rft.au=Nind,%20Thomas&rft.date=2018-07-01&rft.volume=7&rft.issue=7&rft.issn=2047-217X&rft.eissn=2047-217X&rft_id=info:doi/10.1093/gigascience/giy060&rft_dat=%3Cproquest_pubme%3E2043173510%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2715807288&rft_id=info:pmid/29790950&rfr_iscdi=true