Chia, a large annotated corpus of clinical trial eligibility criteria
We present Chia, a novel, large annotated corpus of patient eligibility criteria extracted from 1,000 interventional, Phase IV clinical trials registered in ClinicalTrials.gov. This dataset includes 12,409 annotated eligibility criteria, represented by 41,487 distinctive entities of 15 entity types...
Published in: | Scientific data 2020-08, Vol.7 (1), p.281-281, Article 281 |
---|---|
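Main authors: | Kury, Fabrício; Butler, Alex; Yuan, Chi; Fu, Li-heng; Sun, Yingcheng; Liu, Hao; Sim, Ida; Carini, Simona; Weng, Chunhua |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |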
container_end_page | 281 |
---|---|
container_issue | 1 |
container_start_page | 281 |
container_title | Scientific data |
container_volume | 7 |
creator | Kury, Fabrício; Butler, Alex; Yuan, Chi; Fu, Li-heng; Sun, Yingcheng; Liu, Hao; Sim, Ida; Carini, Simona; Weng, Chunhua |
description | We present Chia, a novel, large annotated corpus of patient eligibility criteria extracted from 1,000 interventional, Phase IV clinical trials registered in ClinicalTrials.gov. This dataset includes 12,409 annotated eligibility criteria, represented by 41,487 distinctive entities of 15 entity types and 25,017 relationships of 12 relationship types. Each criterion is represented as a directed acyclic graph, which can be easily transformed into Boolean logic to form a database query. Chia can serve as a shared benchmark to develop and test future machine learning, rule-based, or hybrid methods for information extraction from free-text clinical trial eligibility criteria.
Measurement(s)
Clinical Trial Eligibility Criteria • Analytical Procedure Accuracy
Technology Type(s)
digital curation • computational modeling technique
Sample Characteristic - Organism
Homo sapiens
Machine-accessible metadata file describing the reported data:
https://doi.org/10.6084/m9.figshare.12765602 |
doi_str_mv | 10.1038/s41597-020-00620-0 |
format | Article |
fullrecord | <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7452886</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2437639048</sourcerecordid><originalsourceid>FETCH-LOGICAL-c474t-b2e05d5d227184fe80435d617d6d5ef6d9dcdedb9ab343a9ba907aefab0179143</originalsourceid><addsrcrecordid>eNp9kU1r3DAQhkVpaUKaP9BDMfTSQ52Mviz5UihLviCQS3sWsjTeKGitrWQX8u-rzSZp2kMuM0Lz6B3NvIR8pHBCgevTIqjsVQsMWoBuF9-QQwaStUJ0_O2L8wE5LuUOACgXIBW8JwecaSkF6ENytroN9mtjm2jzGhs7TWm2M_rGpbxdSpPGxsUwBWdjM-dQI8awDkOIYb5vXA4z1tsP5N1oY8Hjx3xEfp6f_Vhdttc3F1er79etE0rM7cAQpJeeMUW1GFGD4NJ3VPnOSxw733vn0Q-9Hbjgth9sD8riaAegqqeCH5Fve93tMmzQO5zmbKPZ5rCx-d4kG8y_lSncmnX6bZSQTOuuCnx5FMjp14JlNptQHMZoJ0xLMUxw3WlRl1XRz_-hd2nJUx1vR6mO9yB0pdiecjmVknF8_gwFszPK7I0y1SjzYJTZSX96OcbzkydbKsD3QKmlaY35b-9XZP8ArYeeWA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2437639048</pqid></control><display><type>article</type><title>Chia, a large annotated corpus of clinical trial eligibility criteria</title><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central Open Access</source><source>Springer Nature OA Free Journals</source><source>Nature Free</source><source>PubMed Central</source><creator>Kury, Fabrício ; Butler, Alex ; Yuan, Chi ; Fu, Li-heng ; Sun, Yingcheng ; Liu, Hao ; Sim, Ida ; Carini, Simona ; Weng, Chunhua</creator><creatorcontrib>Kury, Fabrício ; Butler, Alex ; Yuan, Chi ; Fu, Li-heng ; Sun, Yingcheng ; Liu, Hao ; Sim, Ida ; Carini, Simona ; Weng, Chunhua</creatorcontrib><description>We present Chia, a novel, large annotated corpus of patient eligibility criteria extracted from 1,000 interventional, Phase IV clinical trials registered in ClinicalTrials.gov. 
This dataset includes 12,409 annotated eligibility criteria, represented by 41,487 distinctive entities of 15 entity types and 25,017 relationships of 12 relationship types. Each criterion is represented as a directed acyclic graph, which can be easily transformed into Boolean logic to form a database query. Chia can serve as a shared benchmark to develop and test future machine learning, rule-based, or hybrid methods for information extraction from free-text clinical trial eligibility criteria.
Measurement(s)
Clinical Trial Eligibility Criteria • Analytical Procedure Accuracy
Technology Type(s)
digital curation • computational modeling technique
Sample Characteristic - Organism
Homo sapiens
Machine-accessible metadata file describing the reported data:
https://doi.org/10.6084/m9.figshare.12765602</description><identifier>ISSN: 2052-4463</identifier><identifier>EISSN: 2052-4463</identifier><identifier>DOI: 10.1038/s41597-020-00620-0</identifier><identifier>PMID: 32855408</identifier><language>eng</language><publisher>London: Nature Publishing Group UK</publisher><subject>692/308/2779/109 ; Clinical trials ; Clinical Trials, Phase IV as Topic ; Computer applications ; Data Descriptor ; Digital curation ; Humanities and Social Sciences ; Humans ; Learning algorithms ; Machine learning ; multidisciplinary ; Science ; Science (multidisciplinary)</subject><ispartof>Scientific data, 2020-08, Vol.7 (1), p.281-281, Article 281</ispartof><rights>The Author(s) 2020</rights><rights>The Author(s) 2020. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c474t-b2e05d5d227184fe80435d617d6d5ef6d9dcdedb9ab343a9ba907aefab0179143</citedby><cites>FETCH-LOGICAL-c474t-b2e05d5d227184fe80435d617d6d5ef6d9dcdedb9ab343a9ba907aefab0179143</cites><orcidid>0000-0002-1045-8459</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7452886/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7452886/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,864,885,27923,27924,41119,42188,51575,53790,53792</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/32855408$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Kury, 
Fabrício</creatorcontrib><creatorcontrib>Butler, Alex</creatorcontrib><creatorcontrib>Yuan, Chi</creatorcontrib><creatorcontrib>Fu, Li-heng</creatorcontrib><creatorcontrib>Sun, Yingcheng</creatorcontrib><creatorcontrib>Liu, Hao</creatorcontrib><creatorcontrib>Sim, Ida</creatorcontrib><creatorcontrib>Carini, Simona</creatorcontrib><creatorcontrib>Weng, Chunhua</creatorcontrib><title>Chia, a large annotated corpus of clinical trial eligibility criteria</title><title>Scientific data</title><addtitle>Sci Data</addtitle><addtitle>Sci Data</addtitle><description>We present Chia, a novel, large annotated corpus of patient eligibility criteria extracted from 1,000 interventional, Phase IV clinical trials registered in ClinicalTrials.gov. This dataset includes 12,409 annotated eligibility criteria, represented by 41,487 distinctive entities of 15 entity types and 25,017 relationships of 12 relationship types. Each criterion is represented as a directed acyclic graph, which can be easily transformed into Boolean logic to form a database query. Chia can serve as a shared benchmark to develop and test future machine learning, rule-based, or hybrid methods for information extraction from free-text clinical trial eligibility criteria.
Measurement(s)
Clinical Trial Eligibility Criteria • Analytical Procedure Accuracy
Technology Type(s)
digital curation • computational modeling technique
Sample Characteristic - Organism
Homo sapiens
Machine-accessible metadata file describing the reported data:
https://doi.org/10.6084/m9.figshare.12765602</description><subject>692/308/2779/109</subject><subject>Clinical trials</subject><subject>Clinical Trials, Phase IV as Topic</subject><subject>Computer applications</subject><subject>Data Descriptor</subject><subject>Digital curation</subject><subject>Humanities and Social Sciences</subject><subject>Humans</subject><subject>Learning algorithms</subject><subject>Machine learning</subject><subject>multidisciplinary</subject><subject>Science</subject><subject>Science (multidisciplinary)</subject><issn>2052-4463</issn><issn>2052-4463</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>C6C</sourceid><sourceid>EIF</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp9kU1r3DAQhkVpaUKaP9BDMfTSQ52Mviz5UihLviCQS3sWsjTeKGitrWQX8u-rzSZp2kMuM0Lz6B3NvIR8pHBCgevTIqjsVQsMWoBuF9-QQwaStUJ0_O2L8wE5LuUOACgXIBW8JwecaSkF6ENytroN9mtjm2jzGhs7TWm2M_rGpbxdSpPGxsUwBWdjM-dQI8awDkOIYb5vXA4z1tsP5N1oY8Hjx3xEfp6f_Vhdttc3F1er79etE0rM7cAQpJeeMUW1GFGD4NJ3VPnOSxw733vn0Q-9Hbjgth9sD8riaAegqqeCH5Fve93tMmzQO5zmbKPZ5rCx-d4kG8y_lSncmnX6bZSQTOuuCnx5FMjp14JlNptQHMZoJ0xLMUxw3WlRl1XRz_-hd2nJUx1vR6mO9yB0pdiecjmVknF8_gwFszPK7I0y1SjzYJTZSX96OcbzkydbKsD3QKmlaY35b-9XZP8ArYeeWA</recordid><startdate>20200827</startdate><enddate>20200827</enddate><creator>Kury, Fabrício</creator><creator>Butler, Alex</creator><creator>Yuan, Chi</creator><creator>Fu, Li-heng</creator><creator>Sun, Yingcheng</creator><creator>Liu, Hao</creator><creator>Sim, Ida</creator><creator>Carini, Simona</creator><creator>Weng, Chunhua</creator><general>Nature Publishing Group UK</general><general>Nature Publishing 
Group</general><scope>C6C</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8FE</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>K9.</scope><scope>LK8</scope><scope>M0S</scope><scope>M1P</scope><scope>M7P</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-1045-8459</orcidid></search><sort><creationdate>20200827</creationdate><title>Chia, a large annotated corpus of clinical trial eligibility criteria</title><author>Kury, Fabrício ; Butler, Alex ; Yuan, Chi ; Fu, Li-heng ; Sun, Yingcheng ; Liu, Hao ; Sim, Ida ; Carini, Simona ; Weng, Chunhua</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c474t-b2e05d5d227184fe80435d617d6d5ef6d9dcdedb9ab343a9ba907aefab0179143</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>692/308/2779/109</topic><topic>Clinical trials</topic><topic>Clinical Trials, Phase IV as Topic</topic><topic>Computer applications</topic><topic>Data Descriptor</topic><topic>Digital curation</topic><topic>Humanities and Social Sciences</topic><topic>Humans</topic><topic>Learning algorithms</topic><topic>Machine learning</topic><topic>multidisciplinary</topic><topic>Science</topic><topic>Science (multidisciplinary)</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kury, Fabrício</creatorcontrib><creatorcontrib>Butler, 
Alex</creatorcontrib><creatorcontrib>Yuan, Chi</creatorcontrib><creatorcontrib>Fu, Li-heng</creatorcontrib><creatorcontrib>Sun, Yingcheng</creatorcontrib><creatorcontrib>Liu, Hao</creatorcontrib><creatorcontrib>Sim, Ida</creatorcontrib><creatorcontrib>Carini, Simona</creatorcontrib><creatorcontrib>Weng, Chunhua</creatorcontrib><collection>Springer Nature OA Free Journals</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>ProQuest Biological Science Collection</collection><collection>Health &amp; Medical Collection (Alumni 
Edition)</collection><collection>Medical Database</collection><collection>Biological Science Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Scientific data</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kury, Fabrício</au><au>Butler, Alex</au><au>Yuan, Chi</au><au>Fu, Li-heng</au><au>Sun, Yingcheng</au><au>Liu, Hao</au><au>Sim, Ida</au><au>Carini, Simona</au><au>Weng, Chunhua</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Chia, a large annotated corpus of clinical trial eligibility criteria</atitle><jtitle>Scientific data</jtitle><stitle>Sci Data</stitle><addtitle>Sci Data</addtitle><date>2020-08-27</date><risdate>2020</risdate><volume>7</volume><issue>1</issue><spage>281</spage><epage>281</epage><pages>281-281</pages><artnum>281</artnum><issn>2052-4463</issn><eissn>2052-4463</eissn><abstract>We present Chia, a novel, large annotated corpus of patient eligibility criteria extracted from 1,000 interventional, Phase IV clinical trials registered in ClinicalTrials.gov. This dataset includes 12,409 annotated eligibility criteria, represented by 41,487 distinctive entities of 15 entity types and 25,017 relationships of 12 relationship types. Each criterion is represented as a directed acyclic graph, which can be easily transformed into Boolean logic to form a database query. Chia can serve as a shared benchmark to develop and test future machine learning, rule-based, or hybrid methods for information extraction from free-text clinical trial eligibility criteria.
Measurement(s)
Clinical Trial Eligibility Criteria • Analytical Procedure Accuracy
Technology Type(s)
digital curation • computational modeling technique
Sample Characteristic - Organism
Homo sapiens
Machine-accessible metadata file describing the reported data:
https://doi.org/10.6084/m9.figshare.12765602</abstract><cop>London</cop><pub>Nature Publishing Group UK</pub><pmid>32855408</pmid><doi>10.1038/s41597-020-00620-0</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0002-1045-8459</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2052-4463 |
ispartof | Scientific data, 2020-08, Vol.7 (1), p.281-281, Article 281 |
issn | 2052-4463 2052-4463 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7452886 |
source | MEDLINE; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central Open Access; Springer Nature OA Free Journals; Nature Free; PubMed Central |
subjects | 692/308/2779/109; Clinical trials; Clinical Trials, Phase IV as Topic; Computer applications; Data Descriptor; Digital curation; Humanities and Social Sciences; Humans; Learning algorithms; Machine learning; multidisciplinary; Science; Science (multidisciplinary) |
title | Chia, a large annotated corpus of clinical trial eligibility criteria |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-13T04%3A57%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Chia,%20a%20large%20annotated%20corpus%20of%20clinical%20trial%20eligibility%20criteria&rft.jtitle=Scientific%20data&rft.au=Kury,%20Fabr%C3%ADcio&rft.date=2020-08-27&rft.volume=7&rft.issue=1&rft.spage=281&rft.epage=281&rft.pages=281-281&rft.artnum=281&rft.issn=2052-4463&rft.eissn=2052-4463&rft_id=info:doi/10.1038/s41597-020-00620-0&rft_dat=%3Cproquest_pubme%3E2437639048%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2437639048&rft_id=info:pmid/32855408&rfr_iscdi=true |
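
The description in this record states that each Chia criterion is represented as a directed acyclic graph that can be transformed into Boolean logic to form a database query. The sketch below illustrates that general idea only; it is not the authors' implementation and does not use Chia's actual annotation format. The `Node` class, the `to_boolean` function, and the example criterion (including its entity labels) are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a toy criterion graph (hypothetical structure, not Chia's format)."""
    label: str  # entity text for a leaf, or "AND" / "OR" / "NOT" for an operator
    children: List["Node"] = field(default_factory=list)


def to_boolean(node: Node) -> str:
    """Recursively render a criterion graph as a Boolean expression string."""
    if not node.children:
        # Leaf node: an atomic condition such as "drug = 'metformin'".
        return node.label
    parts = [to_boolean(child) for child in node.children]
    if node.label == "NOT":
        # Negation is treated as unary: only the first child is used.
        return f"NOT ({parts[0]})"
    # AND / OR: join the rendered children with the operator.
    return "(" + f" {node.label} ".join(parts) + ")"


# Invented example criterion, assumed for illustration only.
criterion = Node("AND", [
    Node("condition = 'type 2 diabetes'"),
    Node("OR", [Node("drug = 'metformin'"), Node("drug = 'insulin'")]),
    Node("NOT", [Node("condition = 'pregnancy'")]),
])

expression = to_boolean(criterion)
print(expression)
```

Because the traversal is a plain recursion, a node shared by two parents in a true DAG would simply be rendered once under each parent. The resulting string could serve as the skeleton of a `WHERE` clause, although mapping entity labels to real database columns is outside the scope of this sketch.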