Landmark Models for Optimizing the Use of Repeated Measurements of Risk Factors in Electronic Health Records to Predict Future Disease Risk

Abstract The benefits of using electronic health records (EHRs) for disease risk screening and personalized health-care decisions are being increasingly recognized. Here we present a computationally feasible statistical approach with which to address the methodological challenges involved in utilizi...

Bibliographic details
Published in: American journal of epidemiology 2018-07, Vol.187 (7), p.1530-1538
Main authors: Paige, Ellie, Barrett, Jessica, Stevens, David, Keogh, Ruth H, Sweeting, Michael J, Nazareth, Irwin, Petersen, Irene, Wood, Angela M
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 1538
container_issue 7
container_start_page 1530
container_title American journal of epidemiology
container_volume 187
creator Paige, Ellie
Barrett, Jessica
Stevens, David
Keogh, Ruth H
Sweeting, Michael J
Nazareth, Irwin
Petersen, Irene
Wood, Angela M
description Abstract The benefits of using electronic health records (EHRs) for disease risk screening and personalized health-care decisions are being increasingly recognized. Here we present a computationally feasible statistical approach with which to address the methodological challenges involved in utilizing historical repeat measures of multiple risk factors recorded in EHRs to systematically identify patients at high risk of future disease. The approach is principally based on a 2-stage dynamic landmark model. The first stage estimates current risk factor values from all available historical repeat risk factor measurements via landmark-age–specific multivariate linear mixed-effects models with correlated random intercepts, which account for sporadically recorded repeat measures, unobserved data, and measurement errors. The second stage predicts future disease risk from a sex-stratified Cox proportional hazards model, with estimated current risk factor values from the first stage. We exemplify these methods by developing and validating a dynamic 10-year cardiovascular disease risk prediction model using primary-care EHRs for age, diabetes status, hypertension treatment, smoking status, systolic blood pressure, total cholesterol, and high-density lipoprotein cholesterol in 41,373 persons from 10 primary-care practices in England and Wales contributing to The Health Improvement Network (1997–2016). Using cross-validation, the model was well-calibrated (Brier score = 0.041, 95% confidence interval: 0.039, 0.042) and had good discrimination (C-index = 0.768, 95% confidence interval: 0.759, 0.777).
doi_str_mv 10.1093/aje/kwy018
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_6030927</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/aje/kwy018</oup_id><sourcerecordid>2306242708</sourcerecordid><originalsourceid>FETCH-LOGICAL-c436t-ee233db663529fce7ab728d84cf62c9d9cb658744e4a890fb7b1909ab5480163</originalsourceid><addsrcrecordid>eNp9kcFu1DAQhi1ERZfChQdAllAlhBRqO45jX5BQ6bZIWxWhcrYcZ9L1bhKntlNUXoGXxu2WCjhwmsN8_mbGP0KvKHlPiSqPzAaOtt9vCZVP0ILyWhSCVeIpWhBCWKGYYPvoeYwbQihVFXmG9pmqJJeULdDPlRnbwYQtPvct9BF3PuCLKbnB_XDjFU5rwN8iYN_hrzCBSdDiczBxDjDAmOJ9w8UtXhqbfIjYjfikB5uCH53FZ2D6tM5PrQ9txMnjLwFaZxNezik78CcXsw3uHS_QXmf6CC8f6gG6XJ5cHp8Vq4vTz8cfV4XlpUgFACvLthGirJjqLNSmqZlsJbedYFa1yjaikjXnwI1UpGvqhiqiTFNxSagoD9CHnXaamwFam88IptdTcPkfbrU3Tv_dGd1aX_kbLUhJFKuz4O2DIPjrGWLSg4sW-t6M4OeoGaGKi8zezXrzD7rxcxjzdZqVRDDOaiIz9W5H2eBjDNA9LkOJvotY54j1LuIMv_5z_Uf0d6YZONwBfp7-J_oF1VCxWQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2306242708</pqid></control><display><type>article</type><title>Landmark Models for Optimizing the Use of Repeated Measurements of Risk Factors in Electronic Health Records to Predict Future Disease Risk</title><source>Oxford University Press Journals All Titles (1996-Current)</source><source>MEDLINE</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>Alma/SFX Local Collection</source><creator>Paige, Ellie ; Barrett, Jessica ; Stevens, David ; Keogh, Ruth H ; Sweeting, Michael J ; Nazareth, Irwin ; Petersen, Irene ; Wood, Angela M</creator><creatorcontrib>Paige, Ellie ; Barrett, Jessica ; Stevens, David ; Keogh, Ruth H ; Sweeting, Michael J ; Nazareth, Irwin ; Petersen, Irene ; Wood, Angela M</creatorcontrib><description>Abstract The benefits of using electronic health records (EHRs) for disease risk screening and personalized health-care decisions are being 
increasingly recognized. Here we present a computationally feasible statistical approach with which to address the methodological challenges involved in utilizing historical repeat measures of multiple risk factors recorded in EHRs to systematically identify patients at high risk of future disease. The approach is principally based on a 2-stage dynamic landmark model. The first stage estimates current risk factor values from all available historical repeat risk factor measurements via landmark-age–specific multivariate linear mixed-effects models with correlated random intercepts, which account for sporadically recorded repeat measures, unobserved data, and measurement errors. The second stage predicts future disease risk from a sex-stratified Cox proportional hazards model, with estimated current risk factor values from the first stage. We exemplify these methods by developing and validating a dynamic 10-year cardiovascular disease risk prediction model using primary-care EHRs for age, diabetes status, hypertension treatment, smoking status, systolic blood pressure, total cholesterol, and high-density lipoprotein cholesterol in 41,373 persons from 10 primary-care practices in England and Wales contributing to The Health Improvement Network (1997–2016). 
Using cross-validation, the model was well-calibrated (Brier score = 0.041, 95% confidence interval: 0.039, 0.042) and had good discrimination (C-index = 0.768, 95% confidence interval: 0.759, 0.777).</description><identifier>ISSN: 0002-9262</identifier><identifier>ISSN: 1476-6256</identifier><identifier>EISSN: 1476-6256</identifier><identifier>DOI: 10.1093/aje/kwy018</identifier><identifier>PMID: 29584812</identifier><language>eng</language><publisher>United States: Oxford University Press</publisher><subject>Adult ; Blood pressure ; Calibration ; Cardiovascular diseases ; Cardiovascular Diseases - epidemiology ; Cardiovascular Diseases - etiology ; Cholesterol ; Confidence intervals ; Diabetes mellitus ; Disease Susceptibility - epidemiology ; Electronic health records ; Electronic Health Records - statistics &amp; numerical data ; Electronic medical records ; England - epidemiology ; Feasibility Studies ; Female ; Forecasting - methods ; Health risk assessment ; Health risks ; Humans ; Hypertension ; Linear Models ; Male ; Middle Aged ; Multivariate Analysis ; Patient-Specific Modeling ; Practice of Epidemiology ; Prediction models ; Primary Health Care - statistics &amp; numerical data ; Proportional Hazards Models ; Risk analysis ; Risk Assessment - methods ; Risk Factors ; Smoking ; Statistical analysis ; Statistical models ; Wales - epidemiology</subject><ispartof>American journal of epidemiology, 2018-07, Vol.187 (7), p.1530-1538</ispartof><rights>The Author(s) 2018. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. 2018</rights><rights>The Author(s) 2018. 
Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c436t-ee233db663529fce7ab728d84cf62c9d9cb658744e4a890fb7b1909ab5480163</citedby><cites>FETCH-LOGICAL-c436t-ee233db663529fce7ab728d84cf62c9d9cb658744e4a890fb7b1909ab5480163</cites><orcidid>0000-0003-0855-9872</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,314,776,780,881,1578,27901,27902</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/29584812$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Paige, Ellie</creatorcontrib><creatorcontrib>Barrett, Jessica</creatorcontrib><creatorcontrib>Stevens, David</creatorcontrib><creatorcontrib>Keogh, Ruth H</creatorcontrib><creatorcontrib>Sweeting, Michael J</creatorcontrib><creatorcontrib>Nazareth, Irwin</creatorcontrib><creatorcontrib>Petersen, Irene</creatorcontrib><creatorcontrib>Wood, Angela M</creatorcontrib><title>Landmark Models for Optimizing the Use of Repeated Measurements of Risk Factors in Electronic Health Records to Predict Future Disease Risk</title><title>American journal of epidemiology</title><addtitle>Am J Epidemiol</addtitle><description>Abstract The benefits of using electronic health records (EHRs) for disease risk screening and personalized health-care decisions are being increasingly recognized. Here we present a computationally feasible statistical approach with which to address the methodological challenges involved in utilizing historical repeat measures of multiple risk factors recorded in EHRs to systematically identify patients at high risk of future disease. The approach is principally based on a 2-stage dynamic landmark model. 
The first stage estimates current risk factor values from all available historical repeat risk factor measurements via landmark-age–specific multivariate linear mixed-effects models with correlated random intercepts, which account for sporadically recorded repeat measures, unobserved data, and measurement errors. The second stage predicts future disease risk from a sex-stratified Cox proportional hazards model, with estimated current risk factor values from the first stage. We exemplify these methods by developing and validating a dynamic 10-year cardiovascular disease risk prediction model using primary-care EHRs for age, diabetes status, hypertension treatment, smoking status, systolic blood pressure, total cholesterol, and high-density lipoprotein cholesterol in 41,373 persons from 10 primary-care practices in England and Wales contributing to The Health Improvement Network (1997–2016). Using cross-validation, the model was well-calibrated (Brier score = 0.041, 95% confidence interval: 0.039, 0.042) and had good discrimination (C-index = 0.768, 95% confidence interval: 0.759, 0.777).</description><subject>Adult</subject><subject>Blood pressure</subject><subject>Calibration</subject><subject>Cardiovascular diseases</subject><subject>Cardiovascular Diseases - epidemiology</subject><subject>Cardiovascular Diseases - etiology</subject><subject>Cholesterol</subject><subject>Confidence intervals</subject><subject>Diabetes mellitus</subject><subject>Disease Susceptibility - epidemiology</subject><subject>Electronic health records</subject><subject>Electronic Health Records - statistics &amp; numerical data</subject><subject>Electronic medical records</subject><subject>England - epidemiology</subject><subject>Feasibility Studies</subject><subject>Female</subject><subject>Forecasting - methods</subject><subject>Health risk assessment</subject><subject>Health risks</subject><subject>Humans</subject><subject>Hypertension</subject><subject>Linear 
Models</subject><subject>Male</subject><subject>Middle Aged</subject><subject>Multivariate Analysis</subject><subject>Patient-Specific Modeling</subject><subject>Practice of Epidemiology</subject><subject>Prediction models</subject><subject>Primary Health Care - statistics &amp; numerical data</subject><subject>Proportional Hazards Models</subject><subject>Risk analysis</subject><subject>Risk Assessment - methods</subject><subject>Risk Factors</subject><subject>Smoking</subject><subject>Statistical analysis</subject><subject>Statistical models</subject><subject>Wales - epidemiology</subject><issn>0002-9262</issn><issn>1476-6256</issn><issn>1476-6256</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>TOX</sourceid><sourceid>EIF</sourceid><recordid>eNp9kcFu1DAQhi1ERZfChQdAllAlhBRqO45jX5BQ6bZIWxWhcrYcZ9L1bhKntlNUXoGXxu2WCjhwmsN8_mbGP0KvKHlPiSqPzAaOtt9vCZVP0ILyWhSCVeIpWhBCWKGYYPvoeYwbQihVFXmG9pmqJJeULdDPlRnbwYQtPvct9BF3PuCLKbnB_XDjFU5rwN8iYN_hrzCBSdDiczBxDjDAmOJ9w8UtXhqbfIjYjfikB5uCH53FZ2D6tM5PrQ9txMnjLwFaZxNezik78CcXsw3uHS_QXmf6CC8f6gG6XJ5cHp8Vq4vTz8cfV4XlpUgFACvLthGirJjqLNSmqZlsJbedYFa1yjaikjXnwI1UpGvqhiqiTFNxSagoD9CHnXaamwFam88IptdTcPkfbrU3Tv_dGd1aX_kbLUhJFKuz4O2DIPjrGWLSg4sW-t6M4OeoGaGKi8zezXrzD7rxcxjzdZqVRDDOaiIz9W5H2eBjDNA9LkOJvotY54j1LuIMv_5z_Uf0d6YZONwBfp7-J_oF1VCxWQ</recordid><startdate>20180701</startdate><enddate>20180701</enddate><creator>Paige, Ellie</creator><creator>Barrett, Jessica</creator><creator>Stevens, David</creator><creator>Keogh, Ruth H</creator><creator>Sweeting, Michael J</creator><creator>Nazareth, Irwin</creator><creator>Petersen, Irene</creator><creator>Wood, Angela M</creator><general>Oxford University Press</general><general>Oxford Publishing Limited 
(England)</general><scope>TOX</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QP</scope><scope>7T2</scope><scope>7TK</scope><scope>7U7</scope><scope>7U9</scope><scope>C1K</scope><scope>H94</scope><scope>K9.</scope><scope>NAPCQ</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0003-0855-9872</orcidid></search><sort><creationdate>20180701</creationdate><title>Landmark Models for Optimizing the Use of Repeated Measurements of Risk Factors in Electronic Health Records to Predict Future Disease Risk</title><author>Paige, Ellie ; Barrett, Jessica ; Stevens, David ; Keogh, Ruth H ; Sweeting, Michael J ; Nazareth, Irwin ; Petersen, Irene ; Wood, Angela M</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c436t-ee233db663529fce7ab728d84cf62c9d9cb658744e4a890fb7b1909ab5480163</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Adult</topic><topic>Blood pressure</topic><topic>Calibration</topic><topic>Cardiovascular diseases</topic><topic>Cardiovascular Diseases - epidemiology</topic><topic>Cardiovascular Diseases - etiology</topic><topic>Cholesterol</topic><topic>Confidence intervals</topic><topic>Diabetes mellitus</topic><topic>Disease Susceptibility - epidemiology</topic><topic>Electronic health records</topic><topic>Electronic Health Records - statistics &amp; numerical data</topic><topic>Electronic medical records</topic><topic>England - epidemiology</topic><topic>Feasibility Studies</topic><topic>Female</topic><topic>Forecasting - methods</topic><topic>Health risk assessment</topic><topic>Health risks</topic><topic>Humans</topic><topic>Hypertension</topic><topic>Linear Models</topic><topic>Male</topic><topic>Middle Aged</topic><topic>Multivariate Analysis</topic><topic>Patient-Specific Modeling</topic><topic>Practice of 
Epidemiology</topic><topic>Prediction models</topic><topic>Primary Health Care - statistics &amp; numerical data</topic><topic>Proportional Hazards Models</topic><topic>Risk analysis</topic><topic>Risk Assessment - methods</topic><topic>Risk Factors</topic><topic>Smoking</topic><topic>Statistical analysis</topic><topic>Statistical models</topic><topic>Wales - epidemiology</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Paige, Ellie</creatorcontrib><creatorcontrib>Barrett, Jessica</creatorcontrib><creatorcontrib>Stevens, David</creatorcontrib><creatorcontrib>Keogh, Ruth H</creatorcontrib><creatorcontrib>Sweeting, Michael J</creatorcontrib><creatorcontrib>Nazareth, Irwin</creatorcontrib><creatorcontrib>Petersen, Irene</creatorcontrib><creatorcontrib>Wood, Angela M</creatorcontrib><collection>Oxford Journals Open Access Collection</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Calcium &amp; Calcified Tissue Abstracts</collection><collection>Health and Safety Science Abstracts (Full archive)</collection><collection>Neurosciences Abstracts</collection><collection>Toxicology Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Environmental Sciences and Pollution Management</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Nursing &amp; Allied Health Premium</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>American journal of epidemiology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Paige, Ellie</au><au>Barrett, 
Jessica</au><au>Stevens, David</au><au>Keogh, Ruth H</au><au>Sweeting, Michael J</au><au>Nazareth, Irwin</au><au>Petersen, Irene</au><au>Wood, Angela M</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Landmark Models for Optimizing the Use of Repeated Measurements of Risk Factors in Electronic Health Records to Predict Future Disease Risk</atitle><jtitle>American journal of epidemiology</jtitle><addtitle>Am J Epidemiol</addtitle><date>2018-07-01</date><risdate>2018</risdate><volume>187</volume><issue>7</issue><spage>1530</spage><epage>1538</epage><pages>1530-1538</pages><issn>0002-9262</issn><issn>1476-6256</issn><eissn>1476-6256</eissn><abstract>Abstract The benefits of using electronic health records (EHRs) for disease risk screening and personalized health-care decisions are being increasingly recognized. Here we present a computationally feasible statistical approach with which to address the methodological challenges involved in utilizing historical repeat measures of multiple risk factors recorded in EHRs to systematically identify patients at high risk of future disease. The approach is principally based on a 2-stage dynamic landmark model. The first stage estimates current risk factor values from all available historical repeat risk factor measurements via landmark-age–specific multivariate linear mixed-effects models with correlated random intercepts, which account for sporadically recorded repeat measures, unobserved data, and measurement errors. The second stage predicts future disease risk from a sex-stratified Cox proportional hazards model, with estimated current risk factor values from the first stage. 
We exemplify these methods by developing and validating a dynamic 10-year cardiovascular disease risk prediction model using primary-care EHRs for age, diabetes status, hypertension treatment, smoking status, systolic blood pressure, total cholesterol, and high-density lipoprotein cholesterol in 41,373 persons from 10 primary-care practices in England and Wales contributing to The Health Improvement Network (1997–2016). Using cross-validation, the model was well-calibrated (Brier score = 0.041, 95% confidence interval: 0.039, 0.042) and had good discrimination (C-index = 0.768, 95% confidence interval: 0.759, 0.777).</abstract><cop>United States</cop><pub>Oxford University Press</pub><pmid>29584812</pmid><doi>10.1093/aje/kwy018</doi><tpages>9</tpages><orcidid>https://orcid.org/0000-0003-0855-9872</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0002-9262
ispartof American journal of epidemiology, 2018-07, Vol.187 (7), p.1530-1538
issn 0002-9262
1476-6256
1476-6256
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_6030927
source Oxford University Press Journals All Titles (1996-Current); MEDLINE; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Alma/SFX Local Collection
subjects Adult
Blood pressure
Calibration
Cardiovascular diseases
Cardiovascular Diseases - epidemiology
Cardiovascular Diseases - etiology
Cholesterol
Confidence intervals
Diabetes mellitus
Disease Susceptibility - epidemiology
Electronic health records
Electronic Health Records - statistics & numerical data
Electronic medical records
England - epidemiology
Feasibility Studies
Female
Forecasting - methods
Health risk assessment
Health risks
Humans
Hypertension
Linear Models
Male
Middle Aged
Multivariate Analysis
Patient-Specific Modeling
Practice of Epidemiology
Prediction models
Primary Health Care - statistics & numerical data
Proportional Hazards Models
Risk analysis
Risk Assessment - methods
Risk Factors
Smoking
Statistical analysis
Statistical models
Wales - epidemiology
title Landmark Models for Optimizing the Use of Repeated Measurements of Risk Factors in Electronic Health Records to Predict Future Disease Risk
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T20%3A03%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Landmark%20Models%20for%20Optimizing%20the%20Use%20of%20Repeated%20Measurements%20of%20Risk%20Factors%20in%20Electronic%20Health%20Records%20to%20Predict%20Future%20Disease%20Risk&rft.jtitle=American%20journal%20of%20epidemiology&rft.au=Paige,%20Ellie&rft.date=2018-07-01&rft.volume=187&rft.issue=7&rft.spage=1530&rft.epage=1538&rft.pages=1530-1538&rft.issn=0002-9262&rft.eissn=1476-6256&rft_id=info:doi/10.1093/aje/kwy018&rft_dat=%3Cproquest_pubme%3E2306242708%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2306242708&rft_id=info:pmid/29584812&rft_oup_id=10.1093/aje/kwy018&rfr_iscdi=true
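
The description field reports two validation metrics for the risk model, a Brier score (calibration) and a C-index (discrimination). The following is a minimal sketch of how such metrics can be computed, not the authors' code: the function names and the toy data are hypothetical, the Brier score here ignores censoring for simplicity, and the C-index follows the standard Harrell pairwise definition.

```python
from itertools import combinations


def brier_score(predicted_risk, event_occurred):
    """Mean squared difference between predicted risk and the 0/1 outcome.

    Simplifying assumption: every person is followed for the full
    prediction horizon, so censoring is ignored.
    """
    n = len(predicted_risk)
    return sum((p - y) ** 2 for p, y in zip(predicted_risk, event_occurred)) / n


def harrell_c_index(predicted_risk, follow_up_time, event_occurred):
    """Share of comparable pairs in which the person with the earlier
    event also has the higher predicted risk (ties in risk count 0.5)."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(predicted_risk)), 2):
        # Let `a` be the person with the shorter follow-up time.
        a, b = (i, j) if follow_up_time[i] < follow_up_time[j] else (j, i)
        # A pair is comparable only if the times differ and the
        # shorter follow-up ended in an observed event.
        if follow_up_time[a] == follow_up_time[b] or not event_occurred[a]:
            continue
        comparable += 1
        if predicted_risk[a] > predicted_risk[b]:
            concordant += 1.0
        elif predicted_risk[a] == predicted_risk[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: predicted 10-year risk, years of follow-up,
# and whether a cardiovascular event was observed (1) or not (0).
risk = [0.9, 0.2, 0.6, 0.1]
time = [2.0, 10.0, 5.0, 10.0]
event = [1, 0, 1, 0]

print(brier_score(risk, event))
print(harrell_c_index(risk, time, event))
```

Lower Brier scores indicate predicted risks closer to the observed outcomes, and a C-index of 0.5 corresponds to ranking no better than chance. A full evaluation on censored data would need a censoring-adjusted Brier score, which this sketch does not attempt.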