Multi-task Deep Neural Networks in Automated Protein Function Prediction

In recent years, deep learning algorithms have outperformed state-of-the-art methods in several areas, thanks to efficient methods for training and for preventing overfitting, advances in computer hardware, and the availability of vast amounts of data. The high performance of multi-task deep neura...

Detailed description

Bibliographic details
Published in: arXiv.org, 2017-05
Main authors: Rifaioglu, Ahmet Sureyya; Doğan, Tunca; Martin, Maria Jesus; Cetin-Atalay, Rengul; Atalay, Mehmet Volkan
Format: Article
Language: eng
Subjects:
Online access: Full text
container_title arXiv.org
creator Rifaioglu, Ahmet Sureyya
Doğan, Tunca
Martin, Maria Jesus
Cetin-Atalay, Rengul
Atalay, Mehmet Volkan
description In recent years, deep learning algorithms have outperformed state-of-the-art methods in several areas, thanks to efficient methods for training and for preventing overfitting, advances in computer hardware, and the availability of vast amounts of data. The high performance of multi-task deep neural networks in drug discovery has drawn attention to deep learning algorithms in bioinformatics. Here, we proposed a hierarchical multi-task deep neural network architecture based on Gene Ontology (GO) terms as a solution to the protein function prediction problem, and investigated various aspects of the proposed architecture in several experiments. First, we showed that there is a positive correlation between the performance of the system and the size of the training datasets. Second, we investigated whether the level of a GO term in the GO hierarchy is related to its prediction performance, and found no relation between the depth of GO terms in the hierarchy and their performance. In addition, we included all annotations in the training of a set of GO terms to investigate whether adding noisy data to the training datasets changes the performance of the system. The results showed that including less reliable annotations in the training of deep neural networks significantly increased the performance of the poorly performing GO terms. We evaluated the performance of the system using a hierarchical evaluation method. The Matthews correlation coefficient was calculated as 0.75, 0.49 and 0.63 for the molecular function, biological process and cellular component categories, respectively. We showed that deep learning algorithms have great potential in protein function prediction. We plan to further improve the system, DEEPred, by including other types of annotations from various biological data sources, and to release DEEPred as an open-access online tool.
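The description reports performance as a Matthews correlation coefficient (MCC) per GO category. As a reminder of what that number measures, here is a minimal sketch of the standard binary MCC formula. The function names and the toy labels are illustrative assumptions and are not taken from the paper; the hierarchical evaluation procedure itself is not described in this record and is not reproduced here.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count (tp, tn, fp, fn) for two equal-length sequences of 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """Standard binary MCC: ranges from -1 (total disagreement) to +1 (perfect)."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # The MCC is undefined when a row or column of the confusion matrix is
    # empty; returning 0.0 in that case is a common convention, assumed here.
    return numerator / denominator if denominator else 0.0


# Hypothetical labels for a single GO term (1 = protein annotated with the term).
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1]
mcc = matthews_corrcoef(*confusion_counts(y_true, y_pred))
print(round(mcc, 3))
```

Unlike plain accuracy, the MCC uses all four confusion-matrix cells, which makes it a reasonable choice when positive annotations for a GO term are much rarer than negatives.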
format Article
publisher Ithaca: Cornell University Library, arXiv.org
creationdate 2017-05-28
rights 2017. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
oa free_for_read
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2017-05
issn 2331-8422
language eng
recordid cdi_proquest_journals_2075681502
source Free E- Journals
subjects Algorithms
Annotations
Bioinformatics
Biological activity
Construction planning
Correlation coefficients
Datasets
Machine learning
Neural networks
Proteins
Training
title Multi-task Deep Neural Networks in Automated Protein Function Prediction
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T09%3A43%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Multi-task%20Deep%20Neural%20Networks%20in%20Automated%20Protein%20Function%20Prediction&rft.jtitle=arXiv.org&rft.au=Rifaioglu,%20Ahmet%20Sureyya&rft.date=2017-05-28&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2075681502%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2075681502&rft_id=info:pmid/&rfr_iscdi=true