Interpretation of Compound Activity Predictions from Complex Machine Learning Models Using Local Approximations and Shapley Values

In qualitative or quantitative studies of structure–activity relationships (SARs), machine learning (ML) models are trained to recognize structural patterns that differentiate between active and inactive compounds. Understanding model decisions is challenging but of critical importance to guide comp...

Detailed description

Bibliographic details
Published in: Journal of medicinal chemistry 2020-08, Vol.63 (16), p.8761-8777
Main authors: Rodríguez-Pérez, Raquel; Bajorath, Jürgen
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 8777
container_issue 16
container_start_page 8761
container_title Journal of medicinal chemistry
container_volume 63
creator Rodríguez-Pérez, Raquel
Bajorath, Jürgen
description In qualitative or quantitative studies of structure–activity relationships (SARs), machine learning (ML) models are trained to recognize structural patterns that differentiate between active and inactive compounds. Understanding model decisions is challenging but of critical importance to guide compound design. Moreover, the interpretation of ML results provides an additional level of model validation based on expert knowledge. A number of complex ML approaches, especially deep learning (DL) architectures, have distinctive black-box character. Herein, a locally interpretable explanatory method termed Shapley additive explanations (SHAP) is introduced for rationalizing activity predictions of any ML algorithm, regardless of its complexity. Models resulting from random forest (RF), nonlinear support vector machine (SVM), and deep neural network (DNN) learning are interpreted, and structural patterns determining the predicted probability of activity are identified and mapped onto test compounds. The results indicate that SHAP has high potential for rationalizing predictions of complex ML models.
doi_str_mv 10.1021/acs.jmedchem.9b01101
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2289573699</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2289573699</sourcerecordid><originalsourceid>FETCH-LOGICAL-a460t-fb403d5983ecb520eb7bbec656c92b050f876004f3a53eba9e66aadfb9e2b693</originalsourceid><addsrcrecordid>eNp9kEtPwzAQhC0EgvL4Bwj5yCVlbSducqwqXlIRSBSuke1saFASBztB9Movx6UtR06Wtd_M7gwh5wzGDDi7UsaP3xsszBKbcaaBMWB7ZMQSDlGcQrxPRgCcR1xycUSOvX8HAMG4OCRHgiWMp3IyIt_3bY-uc9irvrIttSWd2aazQ1vQqemrz6pf0SeHRWXWc09LZ5tfpMYv-qDMsmqRzlG5tmrf6IMtsPb0xa8_c2tUTadd5-xX1aiNXgXj56UK8hV9VfWA_pQclKr2eLZ9T8ji5noxu4vmj7f3s-k8UrGEPip1DKJIslSg0SEk6onWaGQiTcY1JFCmEwkQl0IlArXKUEqlilJnyLXMxAm53NiGcz7C2j5vKm-wrlWLdvA552mWTITM1mi8QY2z3jss886FAG6VM8jX5eeh_HxXfr4tP8guthsGHWZ_ol3bAYAN8Cu3g2tD3v89fwDV6pcE</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2289573699</pqid></control><display><type>article</type><title>Interpretation of Compound Activity Predictions from Complex Machine Learning Models Using Local Approximations and Shapley Values</title><source>MEDLINE</source><source>American Chemical Society Journals</source><creator>Rodríguez-Pérez, Raquel ; Bajorath, Jürgen</creator><creatorcontrib>Rodríguez-Pérez, Raquel ; Bajorath, Jürgen</creatorcontrib><description>In qualitative or quantitative studies of structure–activity relationships (SARs), machine learning (ML) models are trained to recognize structural patterns that differentiate between active and inactive compounds. Understanding model decisions is challenging but of critical importance to guide compound design. Moreover, the interpretation of ML results provides an additional level of model validation based on expert knowledge. A number of complex ML approaches, especially deep learning (DL) architectures, have distinctive black-box character. 
Herein, a locally interpretable explanatory method termed Shapley additive explanations (SHAP) is introduced for rationalizing activity predictions of any ML algorithm, regardless of its complexity. Models resulting from random forest (RF), nonlinear support vector machine (SVM), and deep neural network (DNN) learning are interpreted, and structural patterns determining the predicted probability of activity are identified and mapped onto test compounds. The results indicate that SHAP has high potential for rationalizing predictions of complex ML models.</description><identifier>ISSN: 0022-2623</identifier><identifier>EISSN: 1520-4804</identifier><identifier>DOI: 10.1021/acs.jmedchem.9b01101</identifier><identifier>PMID: 31512867</identifier><language>eng</language><publisher>United States: American Chemical Society</publisher><subject>Deep Learning - statistics &amp; numerical data ; Organic Chemicals - chemistry ; Support Vector Machine - statistics &amp; numerical data</subject><ispartof>Journal of medicinal chemistry, 2020-08, Vol.63 (16), p.8761-8777</ispartof><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-a460t-fb403d5983ecb520eb7bbec656c92b050f876004f3a53eba9e66aadfb9e2b693</citedby><cites>FETCH-LOGICAL-a460t-fb403d5983ecb520eb7bbec656c92b050f876004f3a53eba9e66aadfb9e2b693</cites><orcidid>0000-0002-0557-5714</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://pubs.acs.org/doi/pdf/10.1021/acs.jmedchem.9b01101$$EPDF$$P50$$Gacs$$H</linktopdf><linktohtml>$$Uhttps://pubs.acs.org/doi/10.1021/acs.jmedchem.9b01101$$EHTML$$P50$$Gacs$$H</linktohtml><link.rule.ids>314,780,784,2765,27076,27924,27925,56738,56788</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/31512867$$D View this record in 
MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Rodríguez-Pérez, Raquel</creatorcontrib><creatorcontrib>Bajorath, Jürgen</creatorcontrib><title>Interpretation of Compound Activity Predictions from Complex Machine Learning Models Using Local Approximations and Shapley Values</title><title>Journal of medicinal chemistry</title><addtitle>J. Med. Chem</addtitle><description>In qualitative or quantitative studies of structure–activity relationships (SARs), machine learning (ML) models are trained to recognize structural patterns that differentiate between active and inactive compounds. Understanding model decisions is challenging but of critical importance to guide compound design. Moreover, the interpretation of ML results provides an additional level of model validation based on expert knowledge. A number of complex ML approaches, especially deep learning (DL) architectures, have distinctive black-box character. Herein, a locally interpretable explanatory method termed Shapley additive explanations (SHAP) is introduced for rationalizing activity predictions of any ML algorithm, regardless of its complexity. Models resulting from random forest (RF), nonlinear support vector machine (SVM), and deep neural network (DNN) learning are interpreted, and structural patterns determining the predicted probability of activity are identified and mapped onto test compounds. 
The results indicate that SHAP has high potential for rationalizing predictions of complex ML models.</description><subject>Deep Learning - statistics &amp; numerical data</subject><subject>Organic Chemicals - chemistry</subject><subject>Support Vector Machine - statistics &amp; numerical data</subject><issn>0022-2623</issn><issn>1520-4804</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp9kEtPwzAQhC0EgvL4Bwj5yCVlbSducqwqXlIRSBSuke1saFASBztB9Movx6UtR06Wtd_M7gwh5wzGDDi7UsaP3xsszBKbcaaBMWB7ZMQSDlGcQrxPRgCcR1xycUSOvX8HAMG4OCRHgiWMp3IyIt_3bY-uc9irvrIttSWd2aazQ1vQqemrz6pf0SeHRWXWc09LZ5tfpMYv-qDMsmqRzlG5tmrf6IMtsPb0xa8_c2tUTadd5-xX1aiNXgXj56UK8hV9VfWA_pQclKr2eLZ9T8ji5noxu4vmj7f3s-k8UrGEPip1DKJIslSg0SEk6onWaGQiTcY1JFCmEwkQl0IlArXKUEqlilJnyLXMxAm53NiGcz7C2j5vKm-wrlWLdvA552mWTITM1mi8QY2z3jss886FAG6VM8jX5eeh_HxXfr4tP8guthsGHWZ_ol3bAYAN8Cu3g2tD3v89fwDV6pcE</recordid><startdate>20200827</startdate><enddate>20200827</enddate><creator>Rodríguez-Pérez, Raquel</creator><creator>Bajorath, Jürgen</creator><general>American Chemical Society</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-0557-5714</orcidid></search><sort><creationdate>20200827</creationdate><title>Interpretation of Compound Activity Predictions from Complex Machine Learning Models Using Local Approximations and Shapley Values</title><author>Rodríguez-Pérez, Raquel ; Bajorath, Jürgen</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a460t-fb403d5983ecb520eb7bbec656c92b050f876004f3a53eba9e66aadfb9e2b693</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Deep Learning - statistics &amp; numerical data</topic><topic>Organic Chemicals - 
chemistry</topic><topic>Support Vector Machine - statistics &amp; numerical data</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Rodríguez-Pérez, Raquel</creatorcontrib><creatorcontrib>Bajorath, Jürgen</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Journal of medicinal chemistry</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Rodríguez-Pérez, Raquel</au><au>Bajorath, Jürgen</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Interpretation of Compound Activity Predictions from Complex Machine Learning Models Using Local Approximations and Shapley Values</atitle><jtitle>Journal of medicinal chemistry</jtitle><addtitle>J. Med. Chem</addtitle><date>2020-08-27</date><risdate>2020</risdate><volume>63</volume><issue>16</issue><spage>8761</spage><epage>8777</epage><pages>8761-8777</pages><issn>0022-2623</issn><eissn>1520-4804</eissn><abstract>In qualitative or quantitative studies of structure–activity relationships (SARs), machine learning (ML) models are trained to recognize structural patterns that differentiate between active and inactive compounds. Understanding model decisions is challenging but of critical importance to guide compound design. Moreover, the interpretation of ML results provides an additional level of model validation based on expert knowledge. A number of complex ML approaches, especially deep learning (DL) architectures, have distinctive black-box character. 
Herein, a locally interpretable explanatory method termed Shapley additive explanations (SHAP) is introduced for rationalizing activity predictions of any ML algorithm, regardless of its complexity. Models resulting from random forest (RF), nonlinear support vector machine (SVM), and deep neural network (DNN) learning are interpreted, and structural patterns determining the predicted probability of activity are identified and mapped onto test compounds. The results indicate that SHAP has high potential for rationalizing predictions of complex ML models.</abstract><cop>United States</cop><pub>American Chemical Society</pub><pmid>31512867</pmid><doi>10.1021/acs.jmedchem.9b01101</doi><tpages>17</tpages><orcidid>https://orcid.org/0000-0002-0557-5714</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0022-2623
ispartof Journal of medicinal chemistry, 2020-08, Vol.63 (16), p.8761-8777
issn 0022-2623
1520-4804
language eng
recordid cdi_proquest_miscellaneous_2289573699
source MEDLINE; American Chemical Society Journals
subjects Deep Learning - statistics & numerical data
Organic Chemicals - chemistry
Support Vector Machine - statistics & numerical data
title Interpretation of Compound Activity Predictions from Complex Machine Learning Models Using Local Approximations and Shapley Values
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T20%3A36%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Interpretation%20of%20Compound%20Activity%20Predictions%20from%20Complex%20Machine%20Learning%20Models%20Using%20Local%20Approximations%20and%20Shapley%20Values&rft.jtitle=Journal%20of%20medicinal%20chemistry&rft.au=Rodri%CC%81guez-Pe%CC%81rez,%20Raquel&rft.date=2020-08-27&rft.volume=63&rft.issue=16&rft.spage=8761&rft.epage=8777&rft.pages=8761-8777&rft.issn=0022-2623&rft.eissn=1520-4804&rft_id=info:doi/10.1021/acs.jmedchem.9b01101&rft_dat=%3Cproquest_cross%3E2289573699%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2289573699&rft_id=info:pmid/31512867&rfr_iscdi=true