Applying interpretable machine learning in computational biology—pitfalls, recommendations and opportunities for new developments

Recent advances in machine learning have enabled the development of next-generation predictive models for complex computational biology problems, thereby spurring the use of interpretable machine learning (IML) to unveil biological insights. However, guidelines for using IML in computational biology...

Detailed description

Bibliographic details
Published in:	Nature methods 2024-08, Vol.21 (8), p.1454-1461
Main authors:	Chen, Valerie; Yang, Muyu; Cui, Wenbo; Kim, Joon Sik; Talwalkar, Ameet; Ma, Jian
Format:	Article
Language:	eng
Subjects:	631/114/1305; 631/114/2397; 631/1647/794; 631/208/212; Algorithms; Bioinformatics; Biological Microscopy; Biological Techniques; Biology; Biomedical and Life Sciences; Biomedical Engineering/Biotechnology; Collaboration; Computational Biology - methods; Computer applications; Computer science; Design techniques; Gene expression; Humans; Large language models; Learning algorithms; Life Sciences; Machine Learning; Neural networks; Perspective; Prediction models; Proteins; Proteomics
Online access:	Full text

container_end_page	1461
container_issue	8
container_start_page	1454
container_title	Nature methods
container_volume	21
creator	Chen, Valerie; Yang, Muyu; Cui, Wenbo; Kim, Joon Sik; Talwalkar, Ameet; Ma, Jian
description	Recent advances in machine learning have enabled the development of next-generation predictive models for complex computational biology problems, thereby spurring the use of interpretable machine learning (IML) to unveil biological insights. However, guidelines for using IML in computational biology are generally underdeveloped. We provide an overview of IML methods and evaluation techniques and discuss common pitfalls encountered when applying IML methods to computational biology problems. We also highlight open questions, especially in the era of large language models, and call for collaboration between IML and computational biology researchers. This Perspective discusses the methodologies, application and evaluation of interpretable machine learning (IML) approaches in computational biology, with particular focus on common pitfalls when using IML and how to avoid them.
doi_str_mv	10.1038/s41592-024-02359-7
format	Article
publisher	New York: Nature Publishing Group US
pmid	39122941
eissn	1548-7105
date	2024-08-01
genre	article
tpages	8
sourcetype	Open Access Repository
toplevel	peer_reviewed; online_resources
oa	free_for_read
orcid	0000-0002-0142-0328; 0009-0006-9866-4439; 0000-0001-6650-1893; 0000-0002-4202-5834; 0009-0007-2783-0265; 0009-0006-0057-4735
rights	Springer Nature America, Inc. 2024. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
linktopdf	https://link.springer.com/content/pdf/10.1038/s41592-024-02359-7
linktohtml	https://link.springer.com/10.1038/s41592-024-02359-7
backlink	https://www.ncbi.nlm.nih.gov/pubmed/39122941 (View this record in MEDLINE/PubMed)
collection	Medline; MEDLINE; MEDLINE (Ovid); PubMed; CrossRef; Bacteriology Abstracts (Microbiology B); Biotechnology Research Abstracts; Entomology Abstracts (Full archive); Neurosciences Abstracts; Virology and AIDS Abstracts; Technology Research Database; Environmental Sciences and Pollution Management; Engineering Research Database; AIDS and Cancer Research Abstracts; ProQuest Health & Medical Complete (Alumni); Algology Mycology and Protozoology Abstracts (Microbiology C); Biotechnology and BioEngineering Abstracts; Genetics Abstracts; MEDLINE - Academic; PubMed Central (Full Participant titles)
delcategory	Remote Search Resource
fulltext	fulltext
identifier	ISSN: 1548-7091
ispartof	Nature methods, 2024-08, Vol.21 (8), p.1454-1461
issn	1548-7091 (print); 1548-7105 (electronic)
language	eng
recordid	cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_11348280
source	MEDLINE; Springer Nature - Complete Springer Journals; Nature Journals Online
subjects	631/114/1305; 631/114/2397; 631/1647/794; 631/208/212; Algorithms; Bioinformatics; Biological Microscopy; Biological Techniques; Biology; Biomedical and Life Sciences; Biomedical Engineering/Biotechnology; Collaboration; Computational Biology - methods; Computer applications; Computer science; Design techniques; Gene expression; Humans; Large language models; Learning algorithms; Life Sciences; Machine Learning; Neural networks; Perspective; Prediction models; Proteins; Proteomics
title	Applying interpretable machine learning in computational biology—pitfalls, recommendations and opportunities for new developments
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T00%3A20%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Applying%20interpretable%20machine%20learning%20in%20computational%20biology%E2%80%94pitfalls,%20recommendations%20and%20opportunities%20for%20new%20developments&rft.jtitle=Nature%20methods&rft.au=Chen,%20Valerie&rft.date=2024-08-01&rft.volume=21&rft.issue=8&rft.spage=1454&rft.epage=1461&rft.pages=1454-1461&rft.issn=1548-7091&rft.eissn=1548-7105&rft_id=info:doi/10.1038/s41592-024-02359-7&rft_dat=%3Cproquest_pubme%3E3091283395%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3091018393&rft_id=info:pmid/39122941&rfr_iscdi=true