Multi-task Deep Neural Networks in Automated Protein Function Prediction
In recent years, deep learning algorithms have outperformed state-of-the-art methods in several areas thanks to efficient methods for training and for preventing overfitting, advances in computer hardware, and the availability of vast amounts of data. The high performance of multi-task deep neura...
Published in: | arXiv.org 2017-05 |
---|---|
Main authors: | Rifaioglu, Ahmet Sureyya; Tunca Doğan; Maria Jesus Martin; Cetin-Atalay, Rengul; Atalay, Mehmet Volkan |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Rifaioglu, Ahmet Sureyya; Tunca Doğan; Maria Jesus Martin; Cetin-Atalay, Rengul; Atalay, Mehmet Volkan |
description | In recent years, deep learning algorithms have outperformed state-of-the-art methods in several areas thanks to efficient methods for training and for preventing overfitting, advances in computer hardware, and the availability of vast amounts of data. The high performance of multi-task deep neural networks in drug discovery has drawn attention to deep learning algorithms in bioinformatics. Here, we proposed a hierarchical multi-task deep neural network architecture based on Gene Ontology (GO) terms as a solution to the protein function prediction problem and investigated various aspects of the proposed architecture by performing several experiments. First, we showed that there is a positive correlation between the performance of the system and the size of the training datasets. Second, we investigated whether the level of GO terms in the GO hierarchy is related to their performance, and showed that there is no relation between the depth of GO terms in the hierarchy and their performance. In addition, we included all annotations in the training of a set of GO terms to investigate whether including noisy data in the training datasets changes the performance of the system. The results showed that including less reliable annotations in the training of deep neural networks significantly increased the performance of the low-performing GO terms. We evaluated the performance of the system using a hierarchical evaluation method. The Matthews correlation coefficient was calculated as 0.75, 0.49 and 0.63 for the molecular function, biological process and cellular component categories, respectively. We showed that deep learning algorithms have great potential in the protein function prediction area. We plan to further improve DEEPred by including other types of annotations from various biological data sources, and to make DEEPred available as an open access online tool. |
format | Article |
fullrecord | <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2075681502</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2075681502</sourcerecordid><originalsourceid>FETCH-proquest_journals_20756815023</originalsourceid><addsrcrecordid>eNqNikEKwjAUBYMgWLR3CLgupIlpuxW1dKO4cF-C_ULamNTkB69vEA_gah4zb0EyLkRZNDvOVyQPYWSM8armUoqMdOdoUBeowkSPADO9QPTKJODb-SlQbek-onsqhIFevUNIpo32jtrZJGDQ37khy4cyAfIf12Tbnm6Hrpi9e0UI2I8ueptSz1ktq6aUjIv_Xh8yXTv1</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2075681502</pqid></control><display><type>article</type><title>Multi-task Deep Neural Networks in Automated Protein Function Prediction</title><source>Free E- Journals</source><creator>Rifaioglu, Ahmet Sureyya ; Tunca Doğan ; Maria Jesus Martin ; Cetin-Atalay, Rengul ; Atalay, Mehmet Volkan</creator><creatorcontrib>Rifaioglu, Ahmet Sureyya ; Tunca Doğan ; Maria Jesus Martin ; Cetin-Atalay, Rengul ; Atalay, Mehmet Volkan</creatorcontrib><description>In recent years, deep learning algorithms have outperformed the state-of-the art methods in several areas thanks to the efficient methods for training and for preventing overfitting, advancement in computer hardware, the availability of vast amount data. The high performance of multi-task deep neural networks in drug discovery has attracted the attention to deep learning algorithms in bioinformatics area. Here, we proposed a hierarchical multi-task deep neural network architecture based on Gene Ontology (GO) terms as a solution to protein function prediction problem and investigated various aspects of the proposed architecture by performing several experiments. First, we showed that there is a positive correlation between performance of the system and the size of training datasets. Second, we investigated whether the level of GO terms on GO hierarchy related to their performance. 
We showed that there is no relation between the depth of GO terms on GO hierarchy and their performance. In addition, we included all annotations to the training of a set of GO terms to investigate whether including noisy data to the training datasets change the performance of the system. The results showed that including less reliable annotations in training of deep neural networks increased the performance of the low performed GO terms, significantly. We evaluated the performance of the system using hierarchical evaluation method. Mathews correlation coefficient was calculated as 0.75, 0.49 and 0.63 for molecular function, biological process and cellular component categories, respectively. We showed that deep learning algorithms have a great potential in protein function prediction area. We plan to further improve the DEEPred by including other types of annotations from various biological data sources. We plan to construct DEEPred as an open access online tool.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Algorithms ; Annotations ; Bioinformatics ; Biological activity ; Construction planning ; Correlation coefficients ; Datasets ; Machine learning ; Neural networks ; Proteins ; Training</subject><ispartof>arXiv.org, 2017-05</ispartof><rights>2017. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>776,780</link.rule.ids></links><search><creatorcontrib>Rifaioglu, Ahmet Sureyya</creatorcontrib><creatorcontrib>Tunca Doğan</creatorcontrib><creatorcontrib>Maria Jesus Martin</creatorcontrib><creatorcontrib>Cetin-Atalay, Rengul</creatorcontrib><creatorcontrib>Atalay, Mehmet Volkan</creatorcontrib><title>Multi-task Deep Neural Networks in Automated Protein Function Prediction</title><title>arXiv.org</title><description>In recent years, deep learning algorithms have outperformed the state-of-the art methods in several areas thanks to the efficient methods for training and for preventing overfitting, advancement in computer hardware, the availability of vast amount data. The high performance of multi-task deep neural networks in drug discovery has attracted the attention to deep learning algorithms in bioinformatics area. Here, we proposed a hierarchical multi-task deep neural network architecture based on Gene Ontology (GO) terms as a solution to protein function prediction problem and investigated various aspects of the proposed architecture by performing several experiments. First, we showed that there is a positive correlation between performance of the system and the size of training datasets. Second, we investigated whether the level of GO terms on GO hierarchy related to their performance. We showed that there is no relation between the depth of GO terms on GO hierarchy and their performance. In addition, we included all annotations to the training of a set of GO terms to investigate whether including noisy data to the training datasets change the performance of the system. 
The results showed that including less reliable annotations in training of deep neural networks increased the performance of the low performed GO terms, significantly. We evaluated the performance of the system using hierarchical evaluation method. Mathews correlation coefficient was calculated as 0.75, 0.49 and 0.63 for molecular function, biological process and cellular component categories, respectively. We showed that deep learning algorithms have a great potential in protein function prediction area. We plan to further improve the DEEPred by including other types of annotations from various biological data sources. We plan to construct DEEPred as an open access online tool.</description><subject>Algorithms</subject><subject>Annotations</subject><subject>Bioinformatics</subject><subject>Biological activity</subject><subject>Construction planning</subject><subject>Correlation coefficients</subject><subject>Datasets</subject><subject>Machine learning</subject><subject>Neural networks</subject><subject>Proteins</subject><subject>Training</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNqNikEKwjAUBYMgWLR3CLgupIlpuxW1dKO4cF-C_ULamNTkB69vEA_gah4zb0EyLkRZNDvOVyQPYWSM8armUoqMdOdoUBeowkSPADO9QPTKJODb-SlQbek-onsqhIFevUNIpo32jtrZJGDQ37khy4cyAfIf12Tbnm6Hrpi9e0UI2I8ueptSz1ktq6aUjIv_Xh8yXTv1</recordid><startdate>20170528</startdate><enddate>20170528</enddate><creator>Rifaioglu, Ahmet Sureyya</creator><creator>Tunca Doğan</creator><creator>Maria Jesus Martin</creator><creator>Cetin-Atalay, Rengul</creator><creator>Atalay, Mehmet Volkan</creator><general>Cornell University Library, 
arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20170528</creationdate><title>Multi-task Deep Neural Networks in Automated Protein Function Prediction</title><author>Rifaioglu, Ahmet Sureyya ; Tunca Doğan ; Maria Jesus Martin ; Cetin-Atalay, Rengul ; Atalay, Mehmet Volkan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_20756815023</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Algorithms</topic><topic>Annotations</topic><topic>Bioinformatics</topic><topic>Biological activity</topic><topic>Construction planning</topic><topic>Correlation coefficients</topic><topic>Datasets</topic><topic>Machine learning</topic><topic>Neural networks</topic><topic>Proteins</topic><topic>Training</topic><toplevel>online_resources</toplevel><creatorcontrib>Rifaioglu, Ahmet Sureyya</creatorcontrib><creatorcontrib>Tunca Doğan</creatorcontrib><creatorcontrib>Maria Jesus Martin</creatorcontrib><creatorcontrib>Cetin-Atalay, Rengul</creatorcontrib><creatorcontrib>Atalay, Mehmet Volkan</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central 
Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Rifaioglu, Ahmet Sureyya</au><au>Tunca Doğan</au><au>Maria Jesus Martin</au><au>Cetin-Atalay, Rengul</au><au>Atalay, Mehmet Volkan</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Multi-task Deep Neural Networks in Automated Protein Function Prediction</atitle><jtitle>arXiv.org</jtitle><date>2017-05-28</date><risdate>2017</risdate><eissn>2331-8422</eissn><abstract>In recent years, deep learning algorithms have outperformed the state-of-the art methods in several areas thanks to the efficient methods for training and for preventing overfitting, advancement in computer hardware, the availability of vast amount data. The high performance of multi-task deep neural networks in drug discovery has attracted the attention to deep learning algorithms in bioinformatics area. Here, we proposed a hierarchical multi-task deep neural network architecture based on Gene Ontology (GO) terms as a solution to protein function prediction problem and investigated various aspects of the proposed architecture by performing several experiments. First, we showed that there is a positive correlation between performance of the system and the size of training datasets. Second, we investigated whether the level of GO terms on GO hierarchy related to their performance. 
We showed that there is no relation between the depth of GO terms on GO hierarchy and their performance. In addition, we included all annotations to the training of a set of GO terms to investigate whether including noisy data to the training datasets change the performance of the system. The results showed that including less reliable annotations in training of deep neural networks increased the performance of the low performed GO terms, significantly. We evaluated the performance of the system using hierarchical evaluation method. Mathews correlation coefficient was calculated as 0.75, 0.49 and 0.63 for molecular function, biological process and cellular component categories, respectively. We showed that deep learning algorithms have a great potential in protein function prediction area. We plan to further improve the DEEPred by including other types of annotations from various biological data sources. We plan to construct DEEPred as an open access online tool.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2017-05 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2075681502 |
source | Free E- Journals |
subjects | Algorithms; Annotations; Bioinformatics; Biological activity; Construction planning; Correlation coefficients; Datasets; Machine learning; Neural networks; Proteins; Training |
title | Multi-task Deep Neural Networks in Automated Protein Function Prediction |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T09%3A43%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Multi-task%20Deep%20Neural%20Networks%20in%20Automated%20Protein%20Function%20Prediction&rft.jtitle=arXiv.org&rft.au=Rifaioglu,%20Ahmet%20Sureyya&rft.date=2017-05-28&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2075681502%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2075681502&rft_id=info:pmid/&rfr_iscdi=true |
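
The description above reports performance as a Matthews correlation coefficient (MCC) per GO category. For readers unfamiliar with the metric, the following is a minimal sketch of the standard binary MCC formula computed from confusion-matrix counts. The function name and the example counts are illustrative assumptions; they are not taken from the paper, and the sketch does not reproduce the authors' hierarchical evaluation method.

```python
import math


def matthews_corrcoef_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Binary Matthews correlation coefficient from confusion-matrix counts.

    Hypothetical helper for illustration only; not the authors' code.
    Returns 0.0 when the denominator is zero (a common convention).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Made-up counts for a single GO term: 6 true positives, 3 true negatives,
# 1 false positive, 2 false negatives.
example_mcc = matthews_corrcoef_from_counts(tp=6, tn=3, fp=1, fn=2)
print(f"MCC = {example_mcc:.3f}")
```

The value ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which is why it is often preferred over accuracy when positive and negative annotations are heavily imbalanced.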