Does a More Precise Chemical Description of Protein–Ligand Complexes Lead to More Accurate Prediction of Binding Affinity?

Bibliographic details
Published in:	Journal of Chemical Information and Modeling, 2014-03, Vol. 54 (3), pp. 944-955
Main authors:	Ballester, Pedro J; Schreyer, Adrian; Blundell, Tom L
Format:	Article
Language:	English
Subjects:	Artificial Intelligence; Biochemistry; Computational Biology; Databases, Protein; Ligands; Models, Biological; Models, Molecular; Molecules; Protein Binding; Protein Conformation; Proteins; Proteins - chemistry; Proteins - metabolism
Online access:	Full text

description	Predicting the binding affinities of large sets of diverse molecules against a range of macromolecular targets is an extremely challenging task. The scoring functions that attempt such computational prediction are essential for exploiting and analyzing the outputs of docking, which is in turn an important tool in problems such as structure-based drug design. Classical scoring functions assume a predetermined, theory-inspired functional form for the relationship between the variables that describe an experimentally determined or modeled structure of a protein–ligand complex and its binding affinity. The inherent problem of this approach lies in the difficulty of explicitly modeling the various contributions of intermolecular interactions to binding affinity. New scoring functions based on machine-learning regression models, which can effectively exploit much larger amounts of experimental data and circumvent the need for a predetermined functional form, have already been shown to outperform a broad range of state-of-the-art scoring functions on a widely used benchmark. Here, we investigate the impact of the chemical description of the complex on the predictive power of the resulting scoring function using a systematic battery of numerical experiments. These experiments resulted in the most accurate scoring function to date on the benchmark. Strikingly, we also found that a more precise chemical description of the protein–ligand complex does not generally lead to a more accurate prediction of binding affinity. We discuss four factors that may contribute to this result: modeling assumptions, codependence of representation and regression, data restricted to the bound state, and conformational heterogeneity in data.
doi_str_mv	10.1021/ci500091r
publisher	American Chemical Society (United States)
date	2014-03-24
pmid	24528282
rights	Copyright © 2014 American Chemical Society
issn	1549-9596 (print); 1549-960X (electronic)
source	MEDLINE; ACS Publications
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T05%3A51%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Does%20a%20More%20Precise%20Chemical%20Description%20of%20Protein%E2%80%93Ligand%20Complexes%20Lead%20to%20More%20Accurate%20Prediction%20of%20Binding%20Affinity?&rft.jtitle=Journal%20of%20chemical%20information%20and%20modeling&rft.au=Ballester,%20Pedro%20J&rft.date=2014-03-24&rft.volume=54&rft.issue=3&rft.spage=944&rft.epage=955&rft.pages=944-955&rft.issn=1549-9596&rft.eissn=1549-960X&rft_id=info:doi/10.1021/ci500091r&rft_dat=%3Cproquest_pubme%3E3261004391%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1511436434&rft_id=info:pmid/24528282&rfr_iscdi=true
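The abstract compares descriptions of a protein–ligand complex that differ in how precise they are. The Python sketch below shows what such a difference can look like. It assumes an intermolecular atom-pair-count representation of the kind used by earlier machine-learning scoring functions, and it builds two descriptions. The coarse one has one count per protein–ligand element pair within a distance cutoff. The more precise one splits those same counts into distance shells. The element sets, the 12 Å cutoff, the shell edges, and the function name `pair_count_features` are hypothetical choices for illustration. They are not the settings or code reported in the paper. The regression step that would map these vectors to binding affinities is omitted.

```python
import math
from itertools import product

# Illustrative element sets (assumption: modeled on the RF-Score convention;
# not necessarily the description used in the paper). Hydrogens are ignored.
PROTEIN_ELEMENTS = ("C", "N", "O", "S")
LIGAND_ELEMENTS = ("C", "N", "O", "F", "P", "S", "Cl", "Br", "I")


def pair_count_features(protein_atoms, ligand_atoms, shells=(12.0,)):
    """Hypothetical featurizer: count protein-ligand element pairs by distance shell.

    protein_atoms, ligand_atoms: sequences of (element, x, y, z) tuples, in angstroms.
    shells: increasing upper edges of the distance shells; the last edge is the cutoff.
    A single shell gives the coarse description; several shells give a more precise one.
    Returns a flat list of counts ordered by (protein element, ligand element, shell).
    """
    keys = [
        (p, l, i)
        for p, l in product(PROTEIN_ELEMENTS, LIGAND_ELEMENTS)
        for i in range(len(shells))
    ]
    counts = dict.fromkeys(keys, 0)
    for p_elem, *p_xyz in protein_atoms:
        if p_elem not in PROTEIN_ELEMENTS:
            continue  # skip hydrogens and untyped atoms
        for l_elem, *l_xyz in ligand_atoms:
            if l_elem not in LIGAND_ELEMENTS:
                continue
            d = math.dist(p_xyz, l_xyz)
            # Assign the pair to the first shell whose upper edge contains it;
            # pairs beyond the last edge (the cutoff) are not counted.
            for i, edge in enumerate(shells):
                if d <= edge:
                    counts[(p_elem, l_elem, i)] += 1
                    break
    return [counts[k] for k in keys]


if __name__ == "__main__":
    protein = [("C", 0.0, 0.0, 0.0), ("N", 3.0, 0.0, 0.0), ("H", 1.0, 0.0, 0.0)]
    ligand = [("C", 0.0, 0.0, 4.0), ("O", 0.0, 0.0, 20.0)]
    coarse = pair_count_features(protein, ligand)
    fine = pair_count_features(protein, ligand, shells=(4.5, 12.0))
    print(len(coarse), sum(coarse))  # 36 2
    print(len(fine), sum(fine))      # 72 2
```

The two descriptions contain the same atom pairs. The finer one also records the distance at which each pair occurs. You could compute both vectors for a set of complexes, pair each vector with a measured affinity, and train the same regression model on each. This is one way to probe the question posed in the title. The paper's own descriptor variants and experimental protocol may differ.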