Bias-Correction of Regression Models: A Case Study on hERG Inhibition

In the present work we develop a predictive QSAR model for the blockade of the hERG channel. Additionally, this specific end point is used as a test scenario to develop and evaluate several techniques for fusing predictions from multiple regression models. hERG inhibition models which are presented...

Bibliographic details
Published in: Journal of Chemical Information and Modeling 2009-06, Vol.49 (6), p.1486-1496
Main authors: Hansen, Katja; Rathke, Fabian; Schroeter, Timon; Rast, Georg; Fox, Thomas; Kriegl, Jan M; Mika, Sebastian
Format: Article
Language: English
Online access: Full text
container_end_page 1496
container_issue 6
container_start_page 1486
container_title Journal of Chemical Information and Modeling
container_volume 49
creator Hansen, Katja
Rathke, Fabian
Schroeter, Timon
Rast, Georg
Fox, Thomas
Kriegl, Jan M
Mika, Sebastian
description In the present work we develop a predictive QSAR model for the blockade of the hERG channel. Additionally, this specific end point is used as a test scenario to develop and evaluate several techniques for fusing predictions from multiple regression models. hERG inhibition models which are presented here are based on a combined data set of roughly 550 proprietary and 110 public domain compounds. Models are built using various statistical learning techniques and different sets of molecular descriptors. Single Support Vector Regression, Gaussian Process, or Random Forest models achieve root mean-squared errors of roughly 0.6 log units as determined from leave-group-out cross-validation. An analysis of the evaluation strategy on the performance estimates shows that standard leave-group-out cross-validation yields overly optimistic results. As an alternative, a clustered cross-validation scheme is introduced to obtain a more realistic estimate of the model performance. The evaluation of several techniques to combine multiple prediction models shows that the root mean squared error as determined from clustered cross-validation can be reduced from 0.73 ± 0.01 to 0.57 ± 0.01 using a local bias correction strategy.
doi_str_mv 10.1021/ci9000794
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_216221616</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1760243161</sourcerecordid><originalsourceid>FETCH-LOGICAL-a365t-39edc241afd91128f11d185c2e54635f2ba895a61999ad13e2831cc509bdf0e73</originalsourceid><addsrcrecordid>eNplkEtLAzEUhYMotlYX_gEJggsXo7nJJJ24q0OthYpQdT1k8rBT2qYmM4v-e2dosQsXl_vgu-fAQegayAMQCo-6koSQoUxPUB84JQmHlJ52cyoTyaXooYsYl4QwJgU9Rz2QKeOMij4aP1cqJrkPweq68hvsHZ7b72Bj7LY3b-wqPuERzlW0-KNuzA6398V4PsHTzaIqq-7rEp05tYr26tAH6Otl_Jm_JrP3yTQfzRLFBK8TJq3RNAXljASgmQMwkHFNLU8F446WKpNcCZBSKgPM0oyB1pzI0jhih2yAbve62-B_GhvrYumbsGktCwqCtgWihe73kA4-xmBdsQ3VWoVdAaTo8ir-8mrZm4NgU66tOZKHgFrgbg8oHY9m_4V-Abz0bkk</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>216221616</pqid></control><display><type>article</type><title>Bias-Correction of Regression Models: A Case Study on hERG Inhibition</title><source>ACS Publications</source><source>MEDLINE</source><creator>Hansen, Katja ; Rathke, Fabian ; Schroeter, Timon ; Rast, Georg ; Fox, Thomas ; Kriegl, Jan M ; Mika, Sebastian</creator><creatorcontrib>Hansen, Katja ; Rathke, Fabian ; Schroeter, Timon ; Rast, Georg ; Fox, Thomas ; Kriegl, Jan M ; Mika, Sebastian</creatorcontrib><description>In the present work we develop a predictive QSAR model for the blockade of the hERG channel. Additionally, this specific end point is used as a test scenario to develop and evaluate several techniques for fusing predictions from multiple regression models. hERG inhibition models which are presented here are based on a combined data set of roughly 550 proprietary and 110 public domain compounds. Models are built using various statistical learning techniques and different sets of molecular descriptors. 
Single Support Vector Regression, Gaussian Process, or Random Forest models achieve root mean-squared errors of roughly 0.6 log units as determined from leave-group-out cross-validation. An analysis of the evaluation strategy on the performance estimates shows that standard leave-group-out cross-validation yields overly optimistic results. As an alternative, a clustered cross-validation scheme is introduced to obtain a more realistic estimate of the model performance. The evaluation of several techniques to combine multiple prediction models shows that the root mean squared error as determined from clustered cross-validation can be reduced from 0.73 ± 0.01 to 0.57 ± 0.01 using a local bias correction strategy.</description><identifier>ISSN: 1549-9596</identifier><identifier>EISSN: 1520-5142</identifier><identifier>EISSN: 1549-960X</identifier><identifier>DOI: 10.1021/ci9000794</identifier><identifier>PMID: 19435326</identifier><language>eng</language><publisher>United States: American Chemical Society</publisher><subject>Analytical chemistry ; Computational Chemistry ; Drug Evaluation, Preclinical ; Ether-A-Go-Go Potassium Channels - antagonists &amp; inhibitors ; Humans ; Inhibitory Concentration 50 ; Mathematical models ; Mean square errors ; Molecular structure ; Neural Networks (Computer) ; Performance evaluation ; Potassium Channel Blockers - chemistry ; Potassium Channel Blockers - pharmacology ; Quantitative Structure-Activity Relationship ; Regression Analysis ; Reproducibility of Results</subject><ispartof>Journal of Chemical Information and Modeling, 2009-06, Vol.49 (6), p.1486-1496</ispartof><rights>Copyright © 2009 American Chemical Society</rights><rights>Copyright American Chemical Society Jun 22, 
2009</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-a365t-39edc241afd91128f11d185c2e54635f2ba895a61999ad13e2831cc509bdf0e73</citedby><cites>FETCH-LOGICAL-a365t-39edc241afd91128f11d185c2e54635f2ba895a61999ad13e2831cc509bdf0e73</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://pubs.acs.org/doi/pdf/10.1021/ci9000794$$EPDF$$P50$$Gacs$$H</linktopdf><linktohtml>$$Uhttps://pubs.acs.org/doi/10.1021/ci9000794$$EHTML$$P50$$Gacs$$H</linktohtml><link.rule.ids>314,776,780,2752,27053,27901,27902,56713,56763</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/19435326$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Hansen, Katja</creatorcontrib><creatorcontrib>Rathke, Fabian</creatorcontrib><creatorcontrib>Schroeter, Timon</creatorcontrib><creatorcontrib>Rast, Georg</creatorcontrib><creatorcontrib>Fox, Thomas</creatorcontrib><creatorcontrib>Kriegl, Jan M</creatorcontrib><creatorcontrib>Mika, Sebastian</creatorcontrib><title>Bias-Correction of Regression Models: A Case Study on hERG Inhibition</title><title>Journal of Chemical Information and Modeling</title><addtitle>J. Chem. Inf. Model</addtitle><description>In the present work we develop a predictive QSAR model for the blockade of the hERG channel. Additionally, this specific end point is used as a test scenario to develop and evaluate several techniques for fusing predictions from multiple regression models. hERG inhibition models which are presented here are based on a combined data set of roughly 550 proprietary and 110 public domain compounds. Models are built using various statistical learning techniques and different sets of molecular descriptors. 
Single Support Vector Regression, Gaussian Process, or Random Forest models achieve root mean-squared errors of roughly 0.6 log units as determined from leave-group-out cross-validation. An analysis of the evaluation strategy on the performance estimates shows that standard leave-group-out cross-validation yields overly optimistic results. As an alternative, a clustered cross-validation scheme is introduced to obtain a more realistic estimate of the model performance. The evaluation of several techniques to combine multiple prediction models shows that the root mean squared error as determined from clustered cross-validation can be reduced from 0.73 ± 0.01 to 0.57 ± 0.01 using a local bias correction strategy.</description><subject>Analytical chemistry</subject><subject>Computational Chemistry</subject><subject>Drug Evaluation, Preclinical</subject><subject>Ether-A-Go-Go Potassium Channels - antagonists &amp; inhibitors</subject><subject>Humans</subject><subject>Inhibitory Concentration 50</subject><subject>Mathematical models</subject><subject>Mean square errors</subject><subject>Molecular structure</subject><subject>Neural Networks (Computer)</subject><subject>Performance evaluation</subject><subject>Potassium Channel Blockers - chemistry</subject><subject>Potassium Channel Blockers - pharmacology</subject><subject>Quantitative Structure-Activity Relationship</subject><subject>Regression Analysis</subject><subject>Reproducibility of 
Results</subject><issn>1549-9596</issn><issn>1520-5142</issn><issn>1549-960X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2009</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNplkEtLAzEUhYMotlYX_gEJggsXo7nJJJ24q0OthYpQdT1k8rBT2qYmM4v-e2dosQsXl_vgu-fAQegayAMQCo-6koSQoUxPUB84JQmHlJ52cyoTyaXooYsYl4QwJgU9Rz2QKeOMij4aP1cqJrkPweq68hvsHZ7b72Bj7LY3b-wqPuERzlW0-KNuzA6398V4PsHTzaIqq-7rEp05tYr26tAH6Otl_Jm_JrP3yTQfzRLFBK8TJq3RNAXljASgmQMwkHFNLU8F446WKpNcCZBSKgPM0oyB1pzI0jhih2yAbve62-B_GhvrYumbsGktCwqCtgWihe73kA4-xmBdsQ3VWoVdAaTo8ir-8mrZm4NgU66tOZKHgFrgbg8oHY9m_4V-Abz0bkk</recordid><startdate>20090622</startdate><enddate>20090622</enddate><creator>Hansen, Katja</creator><creator>Rathke, Fabian</creator><creator>Schroeter, Timon</creator><creator>Rast, Georg</creator><creator>Fox, Thomas</creator><creator>Kriegl, Jan M</creator><creator>Mika, Sebastian</creator><general>American Chemical Society</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SR</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20090622</creationdate><title>Bias-Correction of Regression Models: A Case Study on hERG Inhibition</title><author>Hansen, Katja ; Rathke, Fabian ; Schroeter, Timon ; Rast, Georg ; Fox, Thomas ; Kriegl, Jan M ; Mika, Sebastian</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a365t-39edc241afd91128f11d185c2e54635f2ba895a61999ad13e2831cc509bdf0e73</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2009</creationdate><topic>Analytical chemistry</topic><topic>Computational Chemistry</topic><topic>Drug Evaluation, Preclinical</topic><topic>Ether-A-Go-Go Potassium Channels - antagonists 
&amp; inhibitors</topic><topic>Humans</topic><topic>Inhibitory Concentration 50</topic><topic>Mathematical models</topic><topic>Mean square errors</topic><topic>Molecular structure</topic><topic>Neural Networks (Computer)</topic><topic>Performance evaluation</topic><topic>Potassium Channel Blockers - chemistry</topic><topic>Potassium Channel Blockers - pharmacology</topic><topic>Quantitative Structure-Activity Relationship</topic><topic>Regression Analysis</topic><topic>Reproducibility of Results</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Hansen, Katja</creatorcontrib><creatorcontrib>Rathke, Fabian</creatorcontrib><creatorcontrib>Schroeter, Timon</creatorcontrib><creatorcontrib>Rast, Georg</creatorcontrib><creatorcontrib>Fox, Thomas</creatorcontrib><creatorcontrib>Kriegl, Jan M</creatorcontrib><creatorcontrib>Mika, Sebastian</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Journal of Chemical Information and Modeling</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Hansen, Katja</au><au>Rathke, Fabian</au><au>Schroeter, 
Timon</au><au>Rast, Georg</au><au>Fox, Thomas</au><au>Kriegl, Jan M</au><au>Mika, Sebastian</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Bias-Correction of Regression Models: A Case Study on hERG Inhibition</atitle><jtitle>Journal of Chemical Information and Modeling</jtitle><addtitle>J. Chem. Inf. Model</addtitle><date>2009-06-22</date><risdate>2009</risdate><volume>49</volume><issue>6</issue><spage>1486</spage><epage>1496</epage><pages>1486-1496</pages><issn>1549-9596</issn><eissn>1520-5142</eissn><eissn>1549-960X</eissn><abstract>In the present work we develop a predictive QSAR model for the blockade of the hERG channel. Additionally, this specific end point is used as a test scenario to develop and evaluate several techniques for fusing predictions from multiple regression models. hERG inhibition models which are presented here are based on a combined data set of roughly 550 proprietary and 110 public domain compounds. Models are built using various statistical learning techniques and different sets of molecular descriptors. Single Support Vector Regression, Gaussian Process, or Random Forest models achieve root mean-squared errors of roughly 0.6 log units as determined from leave-group-out cross-validation. An analysis of the evaluation strategy on the performance estimates shows that standard leave-group-out cross-validation yields overly optimistic results. As an alternative, a clustered cross-validation scheme is introduced to obtain a more realistic estimate of the model performance. The evaluation of several techniques to combine multiple prediction models shows that the root mean squared error as determined from clustered cross-validation can be reduced from 0.73 ± 0.01 to 0.57 ± 0.01 using a local bias correction strategy.</abstract><cop>United States</cop><pub>American Chemical Society</pub><pmid>19435326</pmid><doi>10.1021/ci9000794</doi><tpages>11</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1549-9596
ispartof Journal of Chemical Information and Modeling, 2009-06, Vol.49 (6), p.1486-1496
issn 1549-9596
1520-5142
1549-960X
language eng
recordid cdi_proquest_journals_216221616
source ACS Publications; MEDLINE
subjects Analytical chemistry
Computational Chemistry
Drug Evaluation, Preclinical
Ether-A-Go-Go Potassium Channels - antagonists & inhibitors
Humans
Inhibitory Concentration 50
Mathematical models
Mean square errors
Molecular structure
Neural Networks (Computer)
Performance evaluation
Potassium Channel Blockers - chemistry
Potassium Channel Blockers - pharmacology
Quantitative Structure-Activity Relationship
Regression Analysis
Reproducibility of Results
title Bias-Correction of Regression Models: A Case Study on hERG Inhibition
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T12%3A30%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Bias-Correction%20of%20Regression%20Models:%20A%20Case%20Study%20on%20hERG%20Inhibition&rft.jtitle=Journal%20of%20Chemical%20Information%20and%20Modeling&rft.au=Hansen,%20Katja&rft.date=2009-06-22&rft.volume=49&rft.issue=6&rft.spage=1486&rft.epage=1496&rft.pages=1486-1496&rft.issn=1549-9596&rft.eissn=1520-5142&rft_id=info:doi/10.1021/ci9000794&rft_dat=%3Cproquest_cross%3E1760243161%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=216221616&rft_id=info:pmid/19435326&rfr_iscdi=true
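
The description field contrasts standard leave-group-out cross-validation with a clustered cross-validation scheme. The record does not say how the authors build their clusters or folds, so the following is only a minimal sketch of the general idea, assuming that each compound already carries a cluster label: whole clusters are assigned to folds, so no cluster contributes compounds to both the training and the test side of a split. The function names `clustered_folds` and `train_test_splits` and the example labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
import random


def clustered_folds(cluster_ids, n_folds, seed=0):
    """Assign sample indices to folds so that each cluster lands in exactly one fold."""
    members = defaultdict(list)
    for index, cluster in enumerate(cluster_ids):
        members[cluster].append(index)

    # Shuffle the clusters, then place the largest ones first,
    # always into the fold that currently holds the fewest samples.
    clusters = sorted(members)
    random.Random(seed).shuffle(clusters)
    clusters.sort(key=lambda c: len(members[c]), reverse=True)

    folds = [[] for _ in range(n_folds)]
    for cluster in clusters:
        smallest = min(folds, key=len)
        smallest.extend(members[cluster])
    return [sorted(fold) for fold in folds]


def train_test_splits(folds):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield sorted(train), test


# Hypothetical cluster labels for ten compounds (assumed to come from
# some structural clustering that is not specified in this record).
cluster_ids = ["a", "a", "a", "b", "b", "c", "c", "c", "d", "e"]
folds = clustered_folds(cluster_ids, n_folds=3)
for train, test in train_test_splits(folds):
    train_clusters = {cluster_ids[i] for i in train}
    test_clusters = {cluster_ids[i] for i in test}
    overlap = bool(train_clusters & test_clusters)
    print("test fold", test, "shares a cluster with training data:", overlap)
```

Because test compounds never share a cluster with training compounds, an error estimate from such splits reflects prediction on structurally unfamiliar compounds, which is consistent with the abstract's remark that the standard scheme gives overly optimistic results.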