A Simulation Study of Confounding in Generalized Linear Models for Air Pollution Epidemiology

Confounding between the model covariates and causal variables (which may or may not be included as model covariates) is a well-known problem in regression models used in air pollution epidemiology. This problem is usually acknowledged but hardly ever investigated, especially in the context of genera...

Bibliographic details
Published in: Environmental health perspectives, 1999-03, Vol. 107 (3), p. 217-222
Main authors: Chen, Colin; Chock, David P.; Winkler, Sandra L.
Format: Article
Language: eng
Online access: Full text
container_end_page 222
container_issue 3
container_start_page 217
container_title Environmental health perspectives
container_volume 107
creator Chen, Colin
Chock, David P.
Winkler, Sandra L.
description Confounding between the model covariates and causal variables (which may or may not be included as model covariates) is a well-known problem in regression models used in air pollution epidemiology. This problem is usually acknowledged but hardly ever investigated, especially in the context of generalized linear models. Using synthetic data sets, the present study shows how model overfit, underfit, and misfit in the presence of correlated causal variables in a Poisson regression model affect the estimated coefficients of the covariates and their confidence levels. The study also shows how this effect changes with the ranges of the covariates and the sample size. There is qualitative agreement between these study results and the corresponding expressions in the large-sample limit for the ordinary linear models. Confounding of covariates in an overfitted model (with covariates encompassing more than just the causal variables) does not bias the estimated coefficients but reduces their significance. The effect of model underfit (with some causal variables excluded as covariates) or misfit (with covariates encompassing only noncausal variables), on the other hand, leads to not only erroneous estimated coefficients, but a misguided confidence, represented by large t-values, that the estimated coefficients are significant. The results of this study indicate that models which use only one or two air quality variables, such as particulate matter ≤10 μm and sulfur dioxide, are probably unreliable, and that models containing several correlated and toxic or potentially toxic air quality variables should also be investigated in order to minimize the situation of model underfit or misfit.
doi_str_mv 10.1289/ehp.99107217
format Article
fullrecord <record><control><sourceid>jstor_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_1566403</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><jstor_id>3434512</jstor_id><sourcerecordid>3434512</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3867-bbd943f30b7b78809f1f625d946bcfcc71eb0255414e0539e3e8d0d52340ed9d3</originalsourceid><addsrcrecordid>eNqFkU2LFDEQhoMo7rh68yw5yJ7sNd_pXIRhWFdhRGH1KKG7Uz2bJZ2MSbcw_nqjs8p48lRQ9fBUFS9Czym5pKw1r-F2f2kMJZpR_QCtqJSsMYaJh2hFiKGN0kqeoSel3BFCaKvUY3RGCVGigiv0dY1v_LSEbvYp4pt5cQecRrxJcUxLdD7usI_4GiLkLvgf4PDWR-gy_pAchILHlPHaZ_wphbD8dlztvYPJp5B2h6fo0diFAs_u6zn68vbq8-Zds_14_X6z3jYDb5Vu-t4ZwUdOet3rtiVmpKNisjZVP4zDoCn0hEkpqAAiuQEOrSNOMi4IOOP4OXpz9O6XfgI3QJzruXaf_dTlg02dt_9Oor-1u_TdUqmUILwKLu4FOX1boMx28mWAELoIaSmWaq5bqsX_QSGp0JpV8NURHHIqJcP49xpK7K_gbA3O_gmu4i9OPziBj0lV4OURuCtzyqcyxom2XPC6mfGfZU6gbw</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>14514772</pqid></control><display><type>article</type><title>A Simulation Study of Confounding in Generalized Linear Models for Air Pollution Epidemiology</title><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central Open Access</source><source>JSTOR Archive Collection A-Z Listing</source><source>PubMed Central</source><creator>Chen, Colin ; Chock, David P. ; Winkler, Sandra L.</creator><creatorcontrib>Chen, Colin ; Chock, David P. ; Winkler, Sandra L.</creatorcontrib><description>Confounding between the model covariates and causal variables (which may or may not be included as model covariates) is a well-known problem in regression models used in air pollution epidemiology. This problem is usually acknowledged but hardly ever investigated, especially in the context of generalized linear models. 
Using synthetic data sets, the present study shows how model overfit, underfit, and misfit in the presence of correlated causal variables in a Poisson regression model affect the estimated coefficients of the covariates and their confidence levels. The study also shows how this effect changes with the ranges of the covariates and the sample size. There is qualitative agreement between these study results and the corresponding expressions in the large-sample limit for the ordinary linear models. Confounding of covariates in an overfitted model (with covariates encompassing more than just the causal variables) does not bias the estimated coefficients but reduces their significance. The effect of model underfit (with some causal variables excluded as covariates) or misfit (with covariates encompassing only noncausal variables), on the other hand, leads to not only erroneous estimated coefficients, but a misguided confidence, represented by large t-values, that the estimated coefficients are significant. The results of this study indicate that models which use only one or two air quality variables, such as particulate matter ≤10 μm and sulfur dioxide, are probably unreliable, and that models containing several correlated and toxic or potentially toxic air quality variables should also be investigated in order to minimize the situation of model underfit or misfit.</description><identifier>ISSN: 0091-6765</identifier><identifier>EISSN: 1552-9924</identifier><identifier>DOI: 10.1289/ehp.99107217</identifier><identifier>PMID: 10064552</identifier><language>eng</language><publisher>United States: National Institute of Environmental Health Sciences. National Institutes of Health. 
Department of Health, Education and Welfare</publisher><subject>Air pollution ; Air Pollution - statistics &amp; numerical data ; Air quality ; Bias ; Coefficients ; Computer Simulation ; Confidence Intervals ; Confounding Factors (Epidemiology) ; Correlation coefficients ; Data Interpretation, Statistical ; Datasets ; Environmental Exposure - statistics &amp; numerical data ; Linear Models ; Linear regression ; Mortality ; Particulate matter ; Regression analysis ; Research Design - standards ; Sample Size</subject><ispartof>Environmental health perspectives, 1999-03, Vol.107 (3), p.217-222</ispartof><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c3867-bbd943f30b7b78809f1f625d946bcfcc71eb0255414e0539e3e8d0d52340ed9d3</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.jstor.org/stable/pdf/3434512$$EPDF$$P50$$Gjstor$$H</linktopdf><linktohtml>$$Uhttps://www.jstor.org/stable/3434512$$EHTML$$P50$$Gjstor$$H</linktohtml><link.rule.ids>230,315,728,781,785,804,865,886,27929,27930,53796,53798,58022,58255</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/10064552$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Chen, Colin</creatorcontrib><creatorcontrib>Chock, David P.</creatorcontrib><creatorcontrib>Winkler, Sandra L.</creatorcontrib><title>A Simulation Study of Confounding in Generalized Linear Models for Air Pollution Epidemiology</title><title>Environmental health perspectives</title><addtitle>Environ Health Perspect</addtitle><description>Confounding between the model covariates and causal variables (which may or may not be included as model covariates) is a well-known problem in regression models used in air pollution epidemiology. 
This problem is usually acknowledged but hardly ever investigated, especially in the context of generalized linear models. Using synthetic data sets, the present study shows how model overfit, underfit, and misfit in the presence of correlated causal variables in a Poisson regression model affect the estimated coefficients of the covariates and their confidence levels. The study also shows how this effect changes with the ranges of the covariates and the sample size. There is qualitative agreement between these study results and the corresponding expressions in the large-sample limit for the ordinary linear models. Confounding of covariates in an overfitted model (with covariates encompassing more than just the causal variables) does not bias the estimated coefficients but reduces their significance. The effect of model underfit (with some causal variables excluded as covariates) or misfit (with covariates encompassing only noncausal variables), on the other hand, leads to not only erroneous estimated coefficients, but a misguided confidence, represented by large t-values, that the estimated coefficients are significant. 
The results of this study indicate that models which use only one or two air quality variables, such as particulate matter ≤10 μm and sulfur dioxide, are probably unreliable, and that models containing several correlated and toxic or potentially toxic air quality variables should also be investigated in order to minimize the situation of model underfit or misfit.</description><subject>Air pollution</subject><subject>Air Pollution - statistics &amp; numerical data</subject><subject>Air quality</subject><subject>Bias</subject><subject>Coefficients</subject><subject>Computer Simulation</subject><subject>Confidence Intervals</subject><subject>Confounding Factors (Epidemiology)</subject><subject>Correlation coefficients</subject><subject>Data Interpretation, Statistical</subject><subject>Datasets</subject><subject>Environmental Exposure - statistics &amp; numerical data</subject><subject>Linear Models</subject><subject>Linear regression</subject><subject>Mortality</subject><subject>Particulate matter</subject><subject>Regression analysis</subject><subject>Research Design - standards</subject><subject>Sample Size</subject><issn>0091-6765</issn><issn>1552-9924</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>1999</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFkU2LFDEQhoMo7rh68yw5yJ7sNd_pXIRhWFdhRGH1KKG7Uz2bJZ2MSbcw_nqjs8p48lRQ9fBUFS9Czym5pKw1r-F2f2kMJZpR_QCtqJSsMYaJh2hFiKGN0kqeoSel3BFCaKvUY3RGCVGigiv0dY1v_LSEbvYp4pt5cQecRrxJcUxLdD7usI_4GiLkLvgf4PDWR-gy_pAchILHlPHaZ_wphbD8dlztvYPJp5B2h6fo0diFAs_u6zn68vbq8-Zds_14_X6z3jYDb5Vu-t4ZwUdOet3rtiVmpKNisjZVP4zDoCn0hEkpqAAiuQEOrSNOMi4IOOP4OXpz9O6XfgI3QJzruXaf_dTlg02dt_9Oor-1u_TdUqmUILwKLu4FOX1boMx28mWAELoIaSmWaq5bqsX_QSGp0JpV8NURHHIqJcP49xpK7K_gbA3O_gmu4i9OPziBj0lV4OURuCtzyqcyxom2XPC6mfGfZU6gbw</recordid><startdate>19990301</startdate><enddate>19990301</enddate><creator>Chen, Colin</creator><creator>Chock, David P.</creator><creator>Winkler, Sandra 
L.</creator><general>National Institute of Environmental Health Sciences. National Institutes of Health. Department of Health, Education and Welfare</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7ST</scope><scope>C1K</scope><scope>SOI</scope><scope>7T2</scope><scope>7TV</scope><scope>7U2</scope><scope>7U7</scope><scope>5PM</scope></search><sort><creationdate>19990301</creationdate><title>A Simulation Study of Confounding in Generalized Linear Models for Air Pollution Epidemiology</title><author>Chen, Colin ; Chock, David P. ; Winkler, Sandra L.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3867-bbd943f30b7b78809f1f625d946bcfcc71eb0255414e0539e3e8d0d52340ed9d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>1999</creationdate><topic>Air pollution</topic><topic>Air Pollution - statistics &amp; numerical data</topic><topic>Air quality</topic><topic>Bias</topic><topic>Coefficients</topic><topic>Computer Simulation</topic><topic>Confidence Intervals</topic><topic>Confounding Factors (Epidemiology)</topic><topic>Correlation coefficients</topic><topic>Data Interpretation, Statistical</topic><topic>Datasets</topic><topic>Environmental Exposure - statistics &amp; numerical data</topic><topic>Linear Models</topic><topic>Linear regression</topic><topic>Mortality</topic><topic>Particulate matter</topic><topic>Regression analysis</topic><topic>Research Design - standards</topic><topic>Sample Size</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Chen, Colin</creatorcontrib><creatorcontrib>Chock, David P.</creatorcontrib><creatorcontrib>Winkler, Sandra L.</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE 
(Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Environment Abstracts</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Environment Abstracts</collection><collection>Health and Safety Science Abstracts (Full archive)</collection><collection>Pollution Abstracts</collection><collection>Safety Science and Risk</collection><collection>Toxicology Abstracts</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Environmental health perspectives</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Chen, Colin</au><au>Chock, David P.</au><au>Winkler, Sandra L.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Simulation Study of Confounding in Generalized Linear Models for Air Pollution Epidemiology</atitle><jtitle>Environmental health perspectives</jtitle><addtitle>Environ Health Perspect</addtitle><date>1999-03-01</date><risdate>1999</risdate><volume>107</volume><issue>3</issue><spage>217</spage><epage>222</epage><pages>217-222</pages><issn>0091-6765</issn><eissn>1552-9924</eissn><abstract>Confounding between the model covariates and causal variables (which may or may not be included as model covariates) is a well-known problem in regression models used in air pollution epidemiology. This problem is usually acknowledged but hardly ever investigated, especially in the context of generalized linear models. Using synthetic data sets, the present study shows how model overfit, underfit, and misfit in the presence of correlated causal variables in a Poisson regression model affect the estimated coefficients of the covariates and their confidence levels. The study also shows how this effect changes with the ranges of the covariates and the sample size. 
There is qualitative agreement between these study results and the corresponding expressions in the large-sample limit for the ordinary linear models. Confounding of covariates in an overfitted model (with covariates encompassing more than just the causal variables) does not bias the estimated coefficients but reduces their significance. The effect of model underfit (with some causal variables excluded as covariates) or misfit (with covariates encompassing only noncausal variables), on the other hand, leads to not only erroneous estimated coefficients, but a misguided confidence, represented by large t-values, that the estimated coefficients are significant. The results of this study indicate that models which use only one or two air quality variables, such as particulate matter ≤10 μm and sulfur dioxide, are probably unreliable, and that models containing several correlated and toxic or potentially toxic air quality variables should also be investigated in order to minimize the situation of model underfit or misfit.</abstract><cop>United States</cop><pub>National Institute of Environmental Health Sciences. National Institutes of Health. Department of Health, Education and Welfare</pub><pmid>10064552</pmid><doi>10.1289/ehp.99107217</doi><tpages>6</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0091-6765
ispartof Environmental health perspectives, 1999-03, Vol.107 (3), p.217-222
issn 0091-6765
1552-9924
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_1566403
source MEDLINE; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central Open Access; JSTOR Archive Collection A-Z Listing; PubMed Central
subjects Air pollution
Air Pollution - statistics & numerical data
Air quality
Bias
Coefficients
Computer Simulation
Confidence Intervals
Confounding Factors (Epidemiology)
Correlation coefficients
Data Interpretation, Statistical
Datasets
Environmental Exposure - statistics & numerical data
Linear Models
Linear regression
Mortality
Particulate matter
Regression analysis
Research Design - standards
Sample Size
title A Simulation Study of Confounding in Generalized Linear Models for Air Pollution Epidemiology
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-14T22%3A32%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Simulation%20Study%20of%20Confounding%20in%20Generalized%20Linear%20Models%20for%20Air%20Pollution%20Epidemiology&rft.jtitle=Environmental%20health%20perspectives&rft.au=Chen,%20Colin&rft.date=1999-03-01&rft.volume=107&rft.issue=3&rft.spage=217&rft.epage=222&rft.pages=217-222&rft.issn=0091-6765&rft.eissn=1552-9924&rft_id=info:doi/10.1289/ehp.99107217&rft_dat=%3Cjstor_pubme%3E3434512%3C/jstor_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=14514772&rft_id=info:pmid/10064552&rft_jstor_id=3434512&rfr_iscdi=true
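
The underfit effect described in the abstract can be illustrated with a small synthetic experiment. The sketch below is not the authors' code or design: the sample size, the correlation `rho`, the coefficients `b0`, `b1`, `b2`, and all function names are assumptions chosen for illustration. It draws two correlated covariates that both drive a Poisson outcome, then fits a log-link Poisson regression once with both covariates and once with the second one left out. For an ordinary linear model with unit-variance covariates, the large-sample limit of the underfitted slope is `b1 + rho * b2`; with the Gaussian covariates assumed here, the Poisson fit should land close to the same value.

```python
import math
import random


def poisson_draw(rng, mu):
    # Knuth's multiplication method; adequate for the small means used here.
    limit, k, p = math.exp(-mu), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def solve(a, b):
    # Solve a x = b by Gauss-Jordan elimination with partial pivoting.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_poisson(X, y, iters=12):
    # Newton-Raphson for a log-link Poisson GLM; returns the coefficient list.
    p = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)  # intercept at log mean
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = math.exp(sum(b * x for b, x in zip(beta, xi)))
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += mu * xi[j] * xi[k]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


def simulate(seed=0, n=4000, rho=0.8, b0=1.0, b1=0.1, b2=0.2):
    # All parameter values are illustrative assumptions, not taken from the paper.
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    # x2 has unit variance and correlation rho with x1.
    x2 = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in x1]
    y = [poisson_draw(rng, math.exp(b0 + b1 * a + b2 * b)) for a, b in zip(x1, x2)]
    full = fit_poisson([[1.0, a, b] for a, b in zip(x1, x2)], y)  # both causal variables
    under = fit_poisson([[1.0, a] for a in x1], y)                # x2 omitted (underfit)
    return full, under


full, under = simulate()
print("full model  slope on x1:", round(full[1], 3))
print("underfitted slope on x1:", round(under[1], 3))
```

The fit is written with the standard library only so the sketch runs without extra packages; a statistics library would normally be used instead and would also report standard errors, which is where the abstract's point about misleadingly large t-values shows up.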