Deep Learning Benchmarks on L1000 Gene Expression Data
Published in: | IEEE/ACM transactions on computational biology and bioinformatics 2020-11, Vol.17 (6), p.1846-1857 |
---|---|
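Main authors: | McDermott, Matthew B.A.; Wang, Jennifer; Zhao, Wen-Ning; Sheridan, Steven D.; Szolovits, Peter; Kohane, Isaac; Haggarty, Stephen J.; Perlis, Roy H. |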
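Format: | Article |
Language: | eng |
Keywords: | Artificial neural networks; Benchmark testing; Benchmarks; Biological system modeling; Classifiers; Data models; Datasets; Decision trees; Deep learning; Gene expression; gene expression data; Genomes; Learning algorithms; Machine learning; model development; Neural networks |
Online access: | Order full text |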
container_end_page | 1857 |
---|---|
container_issue | 6 |
container_start_page | 1846 |
container_title | IEEE/ACM transactions on computational biology and bioinformatics |
container_volume | 17 |
creator | McDermott, Matthew B.A.; Wang, Jennifer; Zhao, Wen-Ning; Sheridan, Steven D.; Szolovits, Peter; Kohane, Isaac; Haggarty, Stephen J.; Perlis, Roy H. |
description | Gene expression data can offer deep, physiological insights beyond the static coding of the genome alone. We believe that realizing this potential requires specialized, high-capacity machine learning methods capable of using underlying biological structure, but the development of such models is hampered by the lack of published benchmark tasks and well-characterized baselines. In this work, we establish such benchmarks and baselines by profiling many classifiers against biologically motivated tasks on two curated views of a large, public gene expression dataset (the LINCS corpus) and one privately produced dataset. We provide these two curated views of the public LINCS dataset and our benchmark tasks to enable direct comparisons with future methodological work and to help spur deep learning method development on this modality. In addition to profiling a battery of traditional classifiers, including linear models, random forests, decision trees, K-nearest neighbor (KNN) classifiers, and feed-forward artificial neural networks (FF-ANNs), we also test a method novel to this data modality: graph convolutional neural networks (GCNNs), which allow us to incorporate prior biological domain knowledge. We find that GCNNs can be highly performant given large datasets, whereas FF-ANNs consistently perform well. Among non-neural classifiers, linear models and KNN classifiers perform best. |
doi_str_mv | 10.1109/TCBB.2019.2910061 |
format | Article |
addtitle | IEEE/ACM Trans Comput Biol Bioinform (TCBB) |
pmid | 30990190 |
coden | ITCBCY |
publisher | IEEE (United States) |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 |
date | 2020-11-01 |
tpages | 12 |
peer_reviewed | yes |
oa | free_for_read |
orcidid | https://orcid.org/0000-0002-5862-6757; https://orcid.org/0000-0001-6048-9707 |
ieee_id | 8686113 |
linktohtml | https://ieeexplore.ieee.org/document/8686113 |
backlink | https://www.ncbi.nlm.nih.gov/pubmed/30990190 |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1545-5963 |
ispartof | IEEE/ACM transactions on computational biology and bioinformatics, 2020-11, Vol.17 (6), p.1846-1857 |
issn | 1545-5963 (print); 1557-9964 (electronic) |
language | eng |
recordid | cdi_ieee_primary_8686113 |
source | IEEE Electronic Library (IEL) |
subjects | Artificial neural networks; Benchmark testing; Benchmarks; Biological system modeling; Classifiers; Data models; Datasets; Decision trees; Deep learning; Gene expression; gene expression data; Genomes; Learning algorithms; Machine learning; model development; Neural networks |
title | Deep Learning Benchmarks on L1000 Gene Expression Data |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T18%3A57%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Deep%20Learning%20Benchmarks%20on%20L1000%20Gene%20Expression%20Data&rft.jtitle=IEEE/ACM%20transactions%20on%20computational%20biology%20and%20bioinformatics&rft.au=McDermott,%20Matthew%20B.A.&rft.date=2020-11-01&rft.volume=17&rft.issue=6&rft.spage=1846&rft.epage=1857&rft.pages=1846-1857&rft.issn=1545-5963&rft.eissn=1557-9964&rft.coden=ITCBCY&rft_id=info:doi/10.1109/TCBB.2019.2910061&rft_dat=%3Cproquest_RIE%3E2468772759%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2468772759&rft_id=info:pmid/30990190&rft_ieee_id=8686113&rfr_iscdi=true |