Applying interpretable machine learning in computational biology—pitfalls, recommendations and opportunities for new developments

Recent advances in machine learning have enabled the development of next-generation predictive models for complex computational biology problems, thereby spurring the use of interpretable machine learning (IML) to unveil biological insights. However, guidelines for using IML in computational biology...

Detailed description

Saved in:
Bibliographic details
Published in: Nature methods 2024-08, Vol.21 (8), p.1454-1461
Main authors: Chen, Valerie, Yang, Muyu, Cui, Wenbo, Kim, Joon Sik, Talwalkar, Ameet, Ma, Jian
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 1461
container_issue 8
container_start_page 1454
container_title Nature methods
container_volume 21
creator Chen, Valerie
Yang, Muyu
Cui, Wenbo
Kim, Joon Sik
Talwalkar, Ameet
Ma, Jian
description Recent advances in machine learning have enabled the development of next-generation predictive models for complex computational biology problems, thereby spurring the use of interpretable machine learning (IML) to unveil biological insights. However, guidelines for using IML in computational biology are generally underdeveloped. We provide an overview of IML methods and evaluation techniques and discuss common pitfalls encountered when applying IML methods to computational biology problems. We also highlight open questions, especially in the era of large language models, and call for collaboration between IML and computational biology researchers. This Perspective discusses the methodologies, application and evaluation of interpretable machine learning (IML) approaches in computational biology, with particular focus on common pitfalls when using IML and how to avoid them.
doi_str_mv 10.1038/s41592-024-02359-7
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_11348280</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3091283395</sourcerecordid><originalsourceid>FETCH-LOGICAL-c356t-dd089f21038b58cc0b22f63f05e7d4fae91030c9e0932607508b29378738043b3</originalsourceid><addsrcrecordid>eNp9kctu1TAQhiNERUvhBVggS2xYkOJLfGKvUFVxkyp1U9aW40xOXTm2sZ1WZ4fEK_CEPAlu05bLgoVlS_83_3jmb5oXBB8RzMTb3BEuaYtpVw_jsu0fNQeEd6LtCeaP799Ykv3mac6XGDPWUf6k2WeSUCo7ctB8P47R7azfIusLpJig6MEBmrW5sB6QA538KiMT5rgUXWzw2qHBBhe2u5_ffkRbJu1cfoMSVGYGP95CGWk_ohBjSGXxtljIaAoJebhGI1yBC7GyJT9r9mp9hud392Hz5cP785NP7enZx88nx6etYXxT2nHEQk70ZvSBC2PwQOm0YRPm0I_dpEFWCRsJWDK6wT3HYqCS9aJnAndsYIfNu9U3LsMMo6m9k3YqJjvrtFNBW_W34u2F2oYrRQjrBBW4Ory-c0jh6wK5qNlmA85pD2HJitVdU8GY5BV99Q96GZZUF7dSmAgmWaXoSpkUck4wPfyGYHUzqVpDVjVkdRuy6mvRyz_neCi5T7UCbAVylfwW0u_e_7H9BenYtsc</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3091018393</pqid></control><display><type>article</type><title>Applying interpretable machine learning in computational biology—pitfalls, recommendations and opportunities for new developments</title><source>MEDLINE</source><source>Springer Nature - Complete Springer Journals</source><source>Nature Journals Online</source><creator>Chen, Valerie ; Yang, Muyu ; Cui, Wenbo ; Kim, Joon Sik ; Talwalkar, Ameet ; Ma, Jian</creator><creatorcontrib>Chen, Valerie ; Yang, Muyu ; Cui, Wenbo ; Kim, Joon Sik ; Talwalkar, Ameet ; Ma, Jian</creatorcontrib><description>Recent advances in machine learning have enabled the development of next-generation predictive models for complex computational biology problems, thereby spurring the use of interpretable machine learning (IML) to unveil biological insights. However, guidelines for using IML in computational biology are generally underdeveloped. 
We provide an overview of IML methods and evaluation techniques and discuss common pitfalls encountered when applying IML methods to computational biology problems. We also highlight open questions, especially in the era of large language models, and call for collaboration between IML and computational biology researchers. This Perspective discusses the methodologies, application and evaluation of interpretable machine learning (IML) approaches in computational biology, with particular focus on common pitfalls when using IML and how to avoid them.</description><identifier>ISSN: 1548-7091</identifier><identifier>ISSN: 1548-7105</identifier><identifier>EISSN: 1548-7105</identifier><identifier>DOI: 10.1038/s41592-024-02359-7</identifier><identifier>PMID: 39122941</identifier><language>eng</language><publisher>New York: Nature Publishing Group US</publisher><subject>631/114/1305 ; 631/114/2397 ; 631/1647/794 ; 631/208/212 ; Algorithms ; Bioinformatics ; Biological Microscopy ; Biological Techniques ; Biology ; Biomedical and Life Sciences ; Biomedical Engineering/Biotechnology ; Collaboration ; Computational Biology - methods ; Computer applications ; Computer science ; Design techniques ; Gene expression ; Humans ; Large language models ; Learning algorithms ; Life Sciences ; Machine Learning ; Neural networks ; Perspective ; Prediction models ; Proteins ; Proteomics</subject><ispartof>Nature methods, 2024-08, Vol.21 (8), p.1454-1461</ispartof><rights>Springer Nature America, Inc. 2024. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><rights>2024. 
Springer Nature America, Inc.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c356t-dd089f21038b58cc0b22f63f05e7d4fae91030c9e0932607508b29378738043b3</cites><orcidid>0000-0002-0142-0328 ; 0009-0006-9866-4439 ; 0000-0001-6650-1893 ; 0000-0002-4202-5834 ; 0009-0007-2783-0265 ; 0009-0006-0057-4735</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1038/s41592-024-02359-7$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1038/s41592-024-02359-7$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>230,314,776,780,881,27901,27902,41464,42533,51294</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/39122941$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Chen, Valerie</creatorcontrib><creatorcontrib>Yang, Muyu</creatorcontrib><creatorcontrib>Cui, Wenbo</creatorcontrib><creatorcontrib>Kim, Joon Sik</creatorcontrib><creatorcontrib>Talwalkar, Ameet</creatorcontrib><creatorcontrib>Ma, Jian</creatorcontrib><title>Applying interpretable machine learning in computational biology—pitfalls, recommendations and opportunities for new developments</title><title>Nature methods</title><addtitle>Nat Methods</addtitle><addtitle>Nat Methods</addtitle><description>Recent advances in machine learning have enabled the development of next-generation predictive models for complex computational biology problems, thereby spurring the use of interpretable machine learning (IML) to unveil biological insights. However, guidelines for using IML in computational biology are generally underdeveloped. 
We provide an overview of IML methods and evaluation techniques and discuss common pitfalls encountered when applying IML methods to computational biology problems. We also highlight open questions, especially in the era of large language models, and call for collaboration between IML and computational biology researchers. This Perspective discusses the methodologies, application and evaluation of interpretable machine learning (IML) approaches in computational biology, with particular focus on common pitfalls when using IML and how to avoid them.</description><subject>631/114/1305</subject><subject>631/114/2397</subject><subject>631/1647/794</subject><subject>631/208/212</subject><subject>Algorithms</subject><subject>Bioinformatics</subject><subject>Biological Microscopy</subject><subject>Biological Techniques</subject><subject>Biology</subject><subject>Biomedical and Life Sciences</subject><subject>Biomedical Engineering/Biotechnology</subject><subject>Collaboration</subject><subject>Computational Biology - methods</subject><subject>Computer applications</subject><subject>Computer science</subject><subject>Design techniques</subject><subject>Gene expression</subject><subject>Humans</subject><subject>Large language models</subject><subject>Learning algorithms</subject><subject>Life Sciences</subject><subject>Machine Learning</subject><subject>Neural networks</subject><subject>Perspective</subject><subject>Prediction 
models</subject><subject>Proteins</subject><subject>Proteomics</subject><issn>1548-7091</issn><issn>1548-7105</issn><issn>1548-7105</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp9kctu1TAQhiNERUvhBVggS2xYkOJLfGKvUFVxkyp1U9aW40xOXTm2sZ1WZ4fEK_CEPAlu05bLgoVlS_83_3jmb5oXBB8RzMTb3BEuaYtpVw_jsu0fNQeEd6LtCeaP799Ykv3mac6XGDPWUf6k2WeSUCo7ctB8P47R7azfIusLpJig6MEBmrW5sB6QA538KiMT5rgUXWzw2qHBBhe2u5_ffkRbJu1cfoMSVGYGP95CGWk_ohBjSGXxtljIaAoJebhGI1yBC7GyJT9r9mp9hud392Hz5cP785NP7enZx88nx6etYXxT2nHEQk70ZvSBC2PwQOm0YRPm0I_dpEFWCRsJWDK6wT3HYqCS9aJnAndsYIfNu9U3LsMMo6m9k3YqJjvrtFNBW_W34u2F2oYrRQjrBBW4Ory-c0jh6wK5qNlmA85pD2HJitVdU8GY5BV99Q96GZZUF7dSmAgmWaXoSpkUck4wPfyGYHUzqVpDVjVkdRuy6mvRyz_neCi5T7UCbAVylfwW0u_e_7H9BenYtsc</recordid><startdate>20240801</startdate><enddate>20240801</enddate><creator>Chen, Valerie</creator><creator>Yang, Muyu</creator><creator>Cui, Wenbo</creator><creator>Kim, Joon Sik</creator><creator>Talwalkar, Ameet</creator><creator>Ma, Jian</creator><general>Nature Publishing Group US</general><general>Nature Publishing Group</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QL</scope><scope>7QO</scope><scope>7SS</scope><scope>7TK</scope><scope>7U9</scope><scope>8FD</scope><scope>C1K</scope><scope>FR3</scope><scope>H94</scope><scope>K9.</scope><scope>M7N</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-0142-0328</orcidid><orcidid>https://orcid.org/0009-0006-9866-4439</orcidid><orcidid>https://orcid.org/0000-0001-6650-1893</orcidid><orcidid>https://orcid.org/0000-0002-4202-5834</orcidid><orcidid>https://orcid.org/0009-0007-2783-0265</orcidid><orcidid>https://orcid.org/0009-0006-0057-4735</orcidid></search><sort><creationdate>20240801</creationdate><title>Applying 
interpretable machine learning in computational biology—pitfalls, recommendations and opportunities for new developments</title><author>Chen, Valerie ; Yang, Muyu ; Cui, Wenbo ; Kim, Joon Sik ; Talwalkar, Ameet ; Ma, Jian</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c356t-dd089f21038b58cc0b22f63f05e7d4fae91030c9e0932607508b29378738043b3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>631/114/1305</topic><topic>631/114/2397</topic><topic>631/1647/794</topic><topic>631/208/212</topic><topic>Algorithms</topic><topic>Bioinformatics</topic><topic>Biological Microscopy</topic><topic>Biological Techniques</topic><topic>Biology</topic><topic>Biomedical and Life Sciences</topic><topic>Biomedical Engineering/Biotechnology</topic><topic>Collaboration</topic><topic>Computational Biology - methods</topic><topic>Computer applications</topic><topic>Computer science</topic><topic>Design techniques</topic><topic>Gene expression</topic><topic>Humans</topic><topic>Large language models</topic><topic>Learning algorithms</topic><topic>Life Sciences</topic><topic>Machine Learning</topic><topic>Neural networks</topic><topic>Perspective</topic><topic>Prediction models</topic><topic>Proteins</topic><topic>Proteomics</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Chen, Valerie</creatorcontrib><creatorcontrib>Yang, Muyu</creatorcontrib><creatorcontrib>Cui, Wenbo</creatorcontrib><creatorcontrib>Kim, Joon Sik</creatorcontrib><creatorcontrib>Talwalkar, Ameet</creatorcontrib><creatorcontrib>Ma, Jian</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Biotechnology 
Research Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Neurosciences Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Engineering Research Database</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Nature methods</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Chen, Valerie</au><au>Yang, Muyu</au><au>Cui, Wenbo</au><au>Kim, Joon Sik</au><au>Talwalkar, Ameet</au><au>Ma, Jian</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Applying interpretable machine learning in computational biology—pitfalls, recommendations and opportunities for new developments</atitle><jtitle>Nature methods</jtitle><stitle>Nat Methods</stitle><addtitle>Nat Methods</addtitle><date>2024-08-01</date><risdate>2024</risdate><volume>21</volume><issue>8</issue><spage>1454</spage><epage>1461</epage><pages>1454-1461</pages><issn>1548-7091</issn><issn>1548-7105</issn><eissn>1548-7105</eissn><abstract>Recent advances in machine learning have enabled the development of next-generation predictive models for complex computational biology problems, thereby spurring the use of interpretable machine learning (IML) to unveil biological insights. However, guidelines for using IML in computational biology are generally underdeveloped. 
We provide an overview of IML methods and evaluation techniques and discuss common pitfalls encountered when applying IML methods to computational biology problems. We also highlight open questions, especially in the era of large language models, and call for collaboration between IML and computational biology researchers. This Perspective discusses the methodologies, application and evaluation of interpretable machine learning (IML) approaches in computational biology, with particular focus on common pitfalls when using IML and how to avoid them.</abstract><cop>New York</cop><pub>Nature Publishing Group US</pub><pmid>39122941</pmid><doi>10.1038/s41592-024-02359-7</doi><tpages>8</tpages><orcidid>https://orcid.org/0000-0002-0142-0328</orcidid><orcidid>https://orcid.org/0009-0006-9866-4439</orcidid><orcidid>https://orcid.org/0000-0001-6650-1893</orcidid><orcidid>https://orcid.org/0000-0002-4202-5834</orcidid><orcidid>https://orcid.org/0009-0007-2783-0265</orcidid><orcidid>https://orcid.org/0009-0006-0057-4735</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1548-7091
ispartof Nature methods, 2024-08, Vol.21 (8), p.1454-1461
issn 1548-7091
1548-7105
1548-7105
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_11348280
source MEDLINE; Springer Nature - Complete Springer Journals; Nature Journals Online
subjects 631/114/1305
631/114/2397
631/1647/794
631/208/212
Algorithms
Bioinformatics
Biological Microscopy
Biological Techniques
Biology
Biomedical and Life Sciences
Biomedical Engineering/Biotechnology
Collaboration
Computational Biology - methods
Computer applications
Computer science
Design techniques
Gene expression
Humans
Large language models
Learning algorithms
Life Sciences
Machine Learning
Neural networks
Perspective
Prediction models
Proteins
Proteomics
title Applying interpretable machine learning in computational biology—pitfalls, recommendations and opportunities for new developments
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T00%3A20%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Applying%20interpretable%20machine%20learning%20in%20computational%20biology%E2%80%94pitfalls,%20recommendations%20and%20opportunities%20for%20new%20developments&rft.jtitle=Nature%20methods&rft.au=Chen,%20Valerie&rft.date=2024-08-01&rft.volume=21&rft.issue=8&rft.spage=1454&rft.epage=1461&rft.pages=1454-1461&rft.issn=1548-7091&rft.eissn=1548-7105&rft_id=info:doi/10.1038/s41592-024-02359-7&rft_dat=%3Cproquest_pubme%3E3091283395%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3091018393&rft_id=info:pmid/39122941&rfr_iscdi=true