PheMap: a multi-resource knowledge base for high-throughput phenotyping within electronic health records

Abstract Objective Developing algorithms to extract phenotypes from electronic health records (EHRs) can be challenging and time-consuming. We developed PheMap, a high-throughput phenotyping approach that leverages multiple independent, online resources to streamline the phenotyping process within E...
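
The full abstract in this record's description field reports accuracy at a 50% probability threshold, with eMERGE case-control status as the reference standard. A minimal sketch of that evaluation step follows. It is not the authors' code: the function name, the toy probabilities, and the choice of `>=` rather than `>` at the threshold are all assumptions made for illustration.

```python
from typing import Sequence


def accuracy_at_threshold(
    probabilities: Sequence[float],
    reference: Sequence[bool],
    threshold: float = 0.5,
) -> float:
    """Fraction of individuals whose thresholded probability matches the reference label.

    Hypothetical helper: treats probability >= threshold as a predicted case.
    """
    if len(probabilities) != len(reference):
        raise ValueError("probabilities and reference must have the same length")
    if not probabilities:
        raise ValueError("at least one individual is required")
    predicted = [p >= threshold for p in probabilities]
    correct = sum(pred == ref for pred, ref in zip(predicted, reference))
    return correct / len(reference)


# Toy data, invented for illustration only (not taken from the study).
toy_probabilities = [0.92, 0.10, 0.55, 0.48, 0.03]
toy_reference = [True, False, True, True, False]  # True = case, False = control
toy_accuracy = accuracy_at_threshold(toy_probabilities, toy_reference)
```

Only individuals with a reference case or control label can be scored this way, so anyone the reference algorithm leaves unclassified would have to be excluded before calling the function.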

Bibliographic details
Published in:	Journal of the American Medical Informatics Association : JAMIA 2020-11, Vol.27 (11), p.1675-1687
Main authors:	Zheng, Neil S; Feng, QiPing; Kerchberger, V Eric; Zhao, Juan; Edwards, Todd L; Cox, Nancy J; Stein, C Michael; Roden, Dan M; Denny, Joshua C; Wei, Wei-Qi
Format:	Article
Language:	English
Subjects:	Adult; Algorithms; Dementia - genetics; Diabetes Mellitus, Type 2 - genetics; Electronic Health Records; Genome-Wide Association Study; Humans; Hypothyroidism - genetics; Information Storage and Retrieval - methods; Knowledge Bases; Natural Language Processing; Phenotype; Polymorphism, Single Nucleotide; Research and Applications; Terminology as Topic
Online access:	Full text

container_end_page	1687
container_issue	11
container_start_page	1675
container_title	Journal of the American Medical Informatics Association : JAMIA
container_volume	27
creator	Zheng, Neil S; Feng, QiPing; Kerchberger, V Eric; Zhao, Juan; Edwards, Todd L; Cox, Nancy J; Stein, C Michael; Roden, Dan M; Denny, Joshua C; Wei, Wei-Qi
description	Abstract. Objective: Developing algorithms to extract phenotypes from electronic health records (EHRs) can be challenging and time-consuming. We developed PheMap, a high-throughput phenotyping approach that leverages multiple independent, online resources to streamline the phenotyping process within EHRs. Materials and Methods: PheMap is a knowledge base of medical concepts with quantified relationships to phenotypes that have been extracted by natural language processing from publicly available resources. PheMap searches EHRs for each phenotype’s quantified concepts and uses them to calculate an individual’s probability of having this phenotype. We compared PheMap to clinician-validated phenotyping algorithms from the Electronic Medical Records and Genomics (eMERGE) network for type 2 diabetes mellitus (T2DM), dementia, and hypothyroidism using 84 821 individuals from Vanderbilt University Medical Center's BioVU DNA Biobank. We implemented PheMap-based phenotypes for genome-wide association studies (GWAS) for T2DM, dementia, and hypothyroidism, and phenome-wide association studies (PheWAS) for variants in FTO, HLA-DRB1, and TCF7L2. Results: In this initial iteration, the PheMap knowledge base contains quantified concepts for 841 disease phenotypes. For T2DM, dementia, and hypothyroidism, the accuracy of the PheMap phenotypes was >97% using a 50% threshold and eMERGE case-control status as a reference standard. In the GWAS analyses, PheMap-derived phenotype probabilities replicated 43 of 51 previously reported disease-associated variants for the 3 phenotypes. For 9 of the 11 top associations, PheMap provided an equivalent or more significant P value than eMERGE-based phenotypes. The PheMap-based PheWAS showed performance comparable to or better than a traditional phecode-based PheWAS. PheMap is publicly available online.
Conclusions: PheMap significantly streamlines the process of extracting research-quality phenotype information from EHRs, with performance comparable to or better than current phenotyping approaches.
doi_str_mv	10.1093/jamia/ocaa104
format	Article
fullrecord	<record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7751140</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/jamia/ocaa104</oup_id><sourcerecordid>2446664143</sourcerecordid><originalsourceid>FETCH-LOGICAL-c420t-da0d4ee0b7141a715f92ec68fbb7e29e1a87bc81fdbc6c0452d9c7abc92047a83</originalsourceid><addsrcrecordid>eNqFkTFP3jAQhq2qCCgwdq08dgnYjhMnDEgIAa0EgqFIbNbFucSmSRxsB8S_5yt8BTp1utPdo-dOegn5ytk-Z3V-cAejgwNvADiTn8g2L4TKaiVvP3_ot8iXGO8Y46XIi02ylYvVtMyrbWKvLV7CfEiBjsuQXBYw-iUYpL8n_zhg2yNtICLtfKDW9TZLNvilt_OS6Gxx8ulpdlNPH12ybqI4oEnBT85QizAkSwMaH9q4SzY6GCLuresOuTk7_XXyI7u4Ov95cnyRGSlYylpgrURkjeKSg-JFVws0ZdU1jUJRI4dKNabiXduY0jBZiLY2ChpTCyYVVPkOOXr1zkszYmtwSgEGPQc3QnjSHpz-dzM5q3v_oJUqOJdsJfi-FgR_v2BMenTR4DDAhH6JWkhZlqXkMl-h2Stqgo8xYPd2hjP9Jx39ko5ep7Piv3387Y3-G8f7bb_M_3E9Ax8knyU</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2446664143</pqid></control><display><type>article</type><title>PheMap: a multi-resource knowledge base for high-throughput phenotyping within electronic health records</title><source>MEDLINE</source><source>Oxford University Press Journals All Titles (1996-Current)</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><creator>Zheng, Neil S ; Feng, QiPing ; Kerchberger, V Eric ; Zhao, Juan ; Edwards, Todd L ; Cox, Nancy J ; Stein, C Michael ; Roden, Dan M ; Denny, Joshua C ; Wei, Wei-Qi</creator><creatorcontrib>Zheng, Neil S ; Feng, QiPing ; Kerchberger, V Eric ; Zhao, Juan ; Edwards, Todd L ; Cox, Nancy J ; Stein, C Michael ; Roden, Dan M ; Denny, Joshua C ; Wei, Wei-Qi</creatorcontrib><description>Abstract Objective Developing algorithms to extract phenotypes from electronic health records (EHRs) can be challenging and time-consuming. 
We developed PheMap, a high-throughput phenotyping approach that leverages multiple independent, online resources to streamline the phenotyping process within EHRs. Materials and Methods PheMap is a knowledge base of medical concepts with quantified relationships to phenotypes that have been extracted by natural language processing from publicly available resources. PheMap searches EHRs for each phenotype’s quantified concepts and uses them to calculate an individual’s probability of having this phenotype. We compared PheMap to clinician-validated phenotyping algorithms from the Electronic Medical Records and Genomics (eMERGE) network for type 2 diabetes mellitus (T2DM), dementia, and hypothyroidism using 84 821 individuals from Vanderbilt University Medical Center's BioVU DNA Biobank. We implemented PheMap-based phenotypes for genome-wide association studies (GWAS) for T2DM, dementia, and hypothyroidism, and phenome-wide association studies (PheWAS) for variants in FTO, HLA-DRB1, and TCF7L2. Results In this initial iteration, the PheMap knowledge base contains quantified concepts for 841 disease phenotypes. For T2DM, dementia, and hypothyroidism, the accuracy of the PheMap phenotypes was >97% using a 50% threshold and eMERGE case-control status as a reference standard. In the GWAS analyses, PheMap-derived phenotype probabilities replicated 43 of 51 previously reported disease-associated variants for the 3 phenotypes. For 9 of the 11 top associations, PheMap provided an equivalent or more significant P value than eMERGE-based phenotypes. The PheMap-based PheWAS showed comparable or better performance to a traditional phecode-based PheWAS. PheMap is publicly available online. 
Conclusions PheMap significantly streamlines the process of extracting research-quality phenotype information from EHRs, with comparable or better performance to current phenotyping approaches.</description><identifier>ISSN: 1527-974X</identifier><identifier>ISSN: 1067-5027</identifier><identifier>EISSN: 1527-974X</identifier><identifier>DOI: 10.1093/jamia/ocaa104</identifier><identifier>PMID: 32974638</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Adult ; Algorithms ; Dementia - genetics ; Diabetes Mellitus, Type 2 - genetics ; Electronic Health Records ; Genome-Wide Association Study ; Humans ; Hypothyroidism - genetics ; Information Storage and Retrieval - methods ; Knowledge Bases ; Natural Language Processing ; Phenotype ; Polymorphism, Single Nucleotide ; Research and Applications ; Terminology as Topic</subject><ispartof>Journal of the American Medical Informatics Association : JAMIA, 2020-11, Vol.27 (11), p.1675-1687</ispartof><rights>The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. 2020</rights><rights>The Author(s) 2020. 
Published by Oxford University Press on behalf of the American Medical Informatics Association.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c420t-da0d4ee0b7141a715f92ec68fbb7e29e1a87bc81fdbc6c0452d9c7abc92047a83</citedby><cites>FETCH-LOGICAL-c420t-da0d4ee0b7141a715f92ec68fbb7e29e1a87bc81fdbc6c0452d9c7abc92047a83</cites><orcidid>0000-0001-8737-0773 ; 0000-0002-0342-1965</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7751140/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7751140/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,1584,27924,27925,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/32974638$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Zheng, Neil S</creatorcontrib><creatorcontrib>Feng, QiPing</creatorcontrib><creatorcontrib>Kerchberger, V Eric</creatorcontrib><creatorcontrib>Zhao, Juan</creatorcontrib><creatorcontrib>Edwards, Todd L</creatorcontrib><creatorcontrib>Cox, Nancy J</creatorcontrib><creatorcontrib>Stein, C Michael</creatorcontrib><creatorcontrib>Roden, Dan M</creatorcontrib><creatorcontrib>Denny, Joshua C</creatorcontrib><creatorcontrib>Wei, Wei-Qi</creatorcontrib><title>PheMap: a multi-resource knowledge base for high-throughput phenotyping within electronic health records</title><title>Journal of the American Medical Informatics Association : JAMIA</title><addtitle>J Am Med Inform Assoc</addtitle><description>Abstract Objective Developing algorithms to extract phenotypes from electronic health records (EHRs) can be challenging and time-consuming. 
We developed PheMap, a high-throughput phenotyping approach that leverages multiple independent, online resources to streamline the phenotyping process within EHRs. Materials and Methods PheMap is a knowledge base of medical concepts with quantified relationships to phenotypes that have been extracted by natural language processing from publicly available resources. PheMap searches EHRs for each phenotype’s quantified concepts and uses them to calculate an individual’s probability of having this phenotype. We compared PheMap to clinician-validated phenotyping algorithms from the Electronic Medical Records and Genomics (eMERGE) network for type 2 diabetes mellitus (T2DM), dementia, and hypothyroidism using 84 821 individuals from Vanderbilt University Medical Center's BioVU DNA Biobank. We implemented PheMap-based phenotypes for genome-wide association studies (GWAS) for T2DM, dementia, and hypothyroidism, and phenome-wide association studies (PheWAS) for variants in FTO, HLA-DRB1, and TCF7L2. Results In this initial iteration, the PheMap knowledge base contains quantified concepts for 841 disease phenotypes. For T2DM, dementia, and hypothyroidism, the accuracy of the PheMap phenotypes was >97% using a 50% threshold and eMERGE case-control status as a reference standard. In the GWAS analyses, PheMap-derived phenotype probabilities replicated 43 of 51 previously reported disease-associated variants for the 3 phenotypes. For 9 of the 11 top associations, PheMap provided an equivalent or more significant P value than eMERGE-based phenotypes. The PheMap-based PheWAS showed comparable or better performance to a traditional phecode-based PheWAS. PheMap is publicly available online. 
Conclusions PheMap significantly streamlines the process of extracting research-quality phenotype information from EHRs, with comparable or better performance to current phenotyping approaches.</description><subject>Adult</subject><subject>Algorithms</subject><subject>Dementia - genetics</subject><subject>Diabetes Mellitus, Type 2 - genetics</subject><subject>Electronic Health Records</subject><subject>Genome-Wide Association Study</subject><subject>Humans</subject><subject>Hypothyroidism - genetics</subject><subject>Information Storage and Retrieval - methods</subject><subject>Knowledge Bases</subject><subject>Natural Language Processing</subject><subject>Phenotype</subject><subject>Polymorphism, Single Nucleotide</subject><subject>Research and Applications</subject><subject>Terminology as Topic</subject><issn>1527-974X</issn><issn>1067-5027</issn><issn>1527-974X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>TOX</sourceid><sourceid>EIF</sourceid><recordid>eNqFkTFP3jAQhq2qCCgwdq08dgnYjhMnDEgIAa0EgqFIbNbFucSmSRxsB8S_5yt8BTp1utPdo-dOegn5ytk-Z3V-cAejgwNvADiTn8g2L4TKaiVvP3_ot8iXGO8Y46XIi02ylYvVtMyrbWKvLV7CfEiBjsuQXBYw-iUYpL8n_zhg2yNtICLtfKDW9TZLNvilt_OS6Gxx8ulpdlNPH12ybqI4oEnBT85QizAkSwMaH9q4SzY6GCLuresOuTk7_XXyI7u4Ov95cnyRGSlYylpgrURkjeKSg-JFVws0ZdU1jUJRI4dKNabiXduY0jBZiLY2ChpTCyYVVPkOOXr1zkszYmtwSgEGPQc3QnjSHpz-dzM5q3v_oJUqOJdsJfi-FgR_v2BMenTR4DDAhH6JWkhZlqXkMl-h2Stqgo8xYPd2hjP9Jx39ko5ep7Piv3387Y3-G8f7bb_M_3E9Ax8knyU</recordid><startdate>20201101</startdate><enddate>20201101</enddate><creator>Zheng, Neil S</creator><creator>Feng, QiPing</creator><creator>Kerchberger, V Eric</creator><creator>Zhao, Juan</creator><creator>Edwards, Todd L</creator><creator>Cox, Nancy J</creator><creator>Stein, C Michael</creator><creator>Roden, Dan M</creator><creator>Denny, Joshua C</creator><creator>Wei, Wei-Qi</creator><general>Oxford University 
Press</general><scope>TOX</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0001-8737-0773</orcidid><orcidid>https://orcid.org/0000-0002-0342-1965</orcidid></search><sort><creationdate>20201101</creationdate><title>PheMap: a multi-resource knowledge base for high-throughput phenotyping within electronic health records</title><author>Zheng, Neil S ; Feng, QiPing ; Kerchberger, V Eric ; Zhao, Juan ; Edwards, Todd L ; Cox, Nancy J ; Stein, C Michael ; Roden, Dan M ; Denny, Joshua C ; Wei, Wei-Qi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c420t-da0d4ee0b7141a715f92ec68fbb7e29e1a87bc81fdbc6c0452d9c7abc92047a83</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Adult</topic><topic>Algorithms</topic><topic>Dementia - genetics</topic><topic>Diabetes Mellitus, Type 2 - genetics</topic><topic>Electronic Health Records</topic><topic>Genome-Wide Association Study</topic><topic>Humans</topic><topic>Hypothyroidism - genetics</topic><topic>Information Storage and Retrieval - methods</topic><topic>Knowledge Bases</topic><topic>Natural Language Processing</topic><topic>Phenotype</topic><topic>Polymorphism, Single Nucleotide</topic><topic>Research and Applications</topic><topic>Terminology as Topic</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zheng, Neil S</creatorcontrib><creatorcontrib>Feng, QiPing</creatorcontrib><creatorcontrib>Kerchberger, V Eric</creatorcontrib><creatorcontrib>Zhao, Juan</creatorcontrib><creatorcontrib>Edwards, Todd L</creatorcontrib><creatorcontrib>Cox, Nancy J</creatorcontrib><creatorcontrib>Stein, C Michael</creatorcontrib><creatorcontrib>Roden, Dan M</creatorcontrib><creatorcontrib>Denny, Joshua 
C</creatorcontrib><creatorcontrib>Wei, Wei-Qi</creatorcontrib><collection>Oxford Journals Open Access Collection</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Journal of the American Medical Informatics Association : JAMIA</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zheng, Neil S</au><au>Feng, QiPing</au><au>Kerchberger, V Eric</au><au>Zhao, Juan</au><au>Edwards, Todd L</au><au>Cox, Nancy J</au><au>Stein, C Michael</au><au>Roden, Dan M</au><au>Denny, Joshua C</au><au>Wei, Wei-Qi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>PheMap: a multi-resource knowledge base for high-throughput phenotyping within electronic health records</atitle><jtitle>Journal of the American Medical Informatics Association : JAMIA</jtitle><addtitle>J Am Med Inform Assoc</addtitle><date>2020-11-01</date><risdate>2020</risdate><volume>27</volume><issue>11</issue><spage>1675</spage><epage>1687</epage><pages>1675-1687</pages><issn>1527-974X</issn><issn>1067-5027</issn><eissn>1527-974X</eissn><abstract>Abstract Objective Developing algorithms to extract phenotypes from electronic health records (EHRs) can be challenging and time-consuming. We developed PheMap, a high-throughput phenotyping approach that leverages multiple independent, online resources to streamline the phenotyping process within EHRs. Materials and Methods PheMap is a knowledge base of medical concepts with quantified relationships to phenotypes that have been extracted by natural language processing from publicly available resources. 
PheMap searches EHRs for each phenotype’s quantified concepts and uses them to calculate an individual’s probability of having this phenotype. We compared PheMap to clinician-validated phenotyping algorithms from the Electronic Medical Records and Genomics (eMERGE) network for type 2 diabetes mellitus (T2DM), dementia, and hypothyroidism using 84 821 individuals from Vanderbilt University Medical Center's BioVU DNA Biobank. We implemented PheMap-based phenotypes for genome-wide association studies (GWAS) for T2DM, dementia, and hypothyroidism, and phenome-wide association studies (PheWAS) for variants in FTO, HLA-DRB1, and TCF7L2. Results In this initial iteration, the PheMap knowledge base contains quantified concepts for 841 disease phenotypes. For T2DM, dementia, and hypothyroidism, the accuracy of the PheMap phenotypes was >97% using a 50% threshold and eMERGE case-control status as a reference standard. In the GWAS analyses, PheMap-derived phenotype probabilities replicated 43 of 51 previously reported disease-associated variants for the 3 phenotypes. For 9 of the 11 top associations, PheMap provided an equivalent or more significant P value than eMERGE-based phenotypes. The PheMap-based PheWAS showed comparable or better performance to a traditional phecode-based PheWAS. PheMap is publicly available online. Conclusions PheMap significantly streamlines the process of extracting research-quality phenotype information from EHRs, with comparable or better performance to current phenotyping approaches.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>32974638</pmid><doi>10.1093/jamia/ocaa104</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0001-8737-0773</orcidid><orcidid>https://orcid.org/0000-0002-0342-1965</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 1527-974X
ispartof	Journal of the American Medical Informatics Association : JAMIA, 2020-11, Vol.27 (11), p.1675-1687
issn	1527-974X 1067-5027 1527-974X
language	eng
recordid	cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7751140
source	MEDLINE; Oxford University Press Journals All Titles (1996-Current); EZB-FREE-00999 freely available EZB journals; PubMed Central
subjects	Adult; Algorithms; Dementia - genetics; Diabetes Mellitus, Type 2 - genetics; Electronic Health Records; Genome-Wide Association Study; Humans; Hypothyroidism - genetics; Information Storage and Retrieval - methods; Knowledge Bases; Natural Language Processing; Phenotype; Polymorphism, Single Nucleotide; Research and Applications; Terminology as Topic
title	PheMap: a multi-resource knowledge base for high-throughput phenotyping within electronic health records
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-30T10%3A35%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=PheMap:%20a%20multi-resource%20knowledge%20base%20for%20high-throughput%20phenotyping%20within%20electronic%20health%20records&rft.jtitle=Journal%20of%20the%20American%20Medical%20Informatics%20Association%20:%20JAMIA&rft.au=Zheng,%20Neil%20S&rft.date=2020-11-01&rft.volume=27&rft.issue=11&rft.spage=1675&rft.epage=1687&rft.pages=1675-1687&rft.issn=1527-974X&rft.eissn=1527-974X&rft_id=info:doi/10.1093/jamia/ocaa104&rft_dat=%3Cproquest_pubme%3E2446664143%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2446664143&rft_id=info:pmid/32974638&rft_oup_id=10.1093/jamia/ocaa104&rfr_iscdi=true
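
The url field above is an OpenURL query string in the Z39.88-2004 key/encoded-value style: citation fields carry an `rft.` prefix and identifiers arrive as repeated `rft_id` parameters. A minimal sketch of decoding such a link with Python's standard library follows. The helper names are hypothetical, and the sample URL is a shortened copy of the one in this record, trimmed for readability.

```python
from typing import Dict, List, Optional
from urllib.parse import parse_qs, urlsplit


def parse_openurl(url: str) -> Dict[str, List[str]]:
    """Return every query parameter of the link, percent-decoded, as lists of values."""
    return parse_qs(urlsplit(url).query, keep_blank_values=True)


def citation_fields(params: Dict[str, List[str]]) -> Dict[str, str]:
    """Keep only the 'rft.' citation keys, dropping the prefix and taking the first value."""
    prefix = "rft."
    return {k[len(prefix):]: v[0] for k, v in params.items() if k.startswith(prefix)}


def find_identifier(params: Dict[str, List[str]], scheme: str) -> Optional[str]:
    """Look up an identifier such as 'info:doi/' among the repeated rft_id values."""
    for value in params.get("rft_id", []):
        if value.startswith(scheme):
            return value[len(scheme):]
    return None


# Shortened copy of the record's resolver link (only some parameters kept).
sample_url = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=article"
    "&rft.jtitle=Journal%20of%20the%20American%20Medical%20Informatics%20Association%20:%20JAMIA"
    "&rft.date=2020-11-01&rft.volume=27&rft.issue=11"
    "&rft.spage=1675&rft.epage=1687"
    "&rft_id=info:doi/10.1093/jamia/ocaa104"
    "&rft_id=info:pmid/32974638"
)

params = parse_openurl(sample_url)
citation = citation_fields(params)
doi = find_identifier(params, "info:doi/")
pmid = find_identifier(params, "info:pmid/")
```

Because `rft_id` repeats, `parse_qs` is used rather than building a plain dictionary from the pairs, which would silently keep only the last identifier.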