Multi-task Deep Neural Networks in Automated Protein Function Prediction

In recent years, deep learning algorithms have outperformed state-of-the-art methods in several areas thanks to efficient methods for training and for preventing overfitting, advances in computer hardware, and the availability of vast amounts of data. The high performance of multi-task deep neura...

Bibliographic details
Published in:	arXiv.org 2017-05
Main authors:	Rifaioglu, Ahmet Sureyya; Tunca Doğan; Maria Jesus Martin; Cetin-Atalay, Rengul; Atalay, Mehmet Volkan
Format:	Article
Language:	English
Subjects:	Algorithms; Annotations; Bioinformatics; Biological activity; Construction planning; Correlation coefficients; Datasets; Machine learning; Neural networks; Proteins; Training
Online access:	Full text

container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	Rifaioglu, Ahmet Sureyya ; Tunca Doğan ; Maria Jesus Martin ; Cetin-Atalay, Rengul ; Atalay, Mehmet Volkan
description	In recent years, deep learning algorithms have outperformed state-of-the-art methods in several areas thanks to efficient methods for training and for preventing overfitting, advances in computer hardware, and the availability of vast amounts of data. The high performance of multi-task deep neural networks in drug discovery has drawn attention to deep learning algorithms in bioinformatics. Here, we proposed a hierarchical multi-task deep neural network architecture based on Gene Ontology (GO) terms as a solution to the protein function prediction problem and investigated various aspects of the proposed architecture through several experiments. First, we showed that there is a positive correlation between the performance of the system and the size of the training datasets. Second, we investigated whether the level of GO terms in the GO hierarchy is related to their performance, and found no relation between the depth of GO terms in the hierarchy and their performance. In addition, we included all annotations in the training of a set of GO terms to investigate whether adding noisy data to the training datasets changes the performance of the system. The results showed that including less reliable annotations in the training of deep neural networks significantly increased the performance of the low-performing GO terms. We evaluated the performance of the system using a hierarchical evaluation method. The Matthews correlation coefficient was calculated as 0.75, 0.49 and 0.63 for the molecular function, biological process and cellular component categories, respectively. We showed that deep learning algorithms have great potential in protein function prediction. We plan to further improve DEEPred by including other types of annotations from various biological data sources, and to make DEEPred available as an open access online tool.
format	Article
fullrecord	<record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2075681502</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2075681502</sourcerecordid><originalsourceid>FETCH-proquest_journals_20756815023</originalsourceid><addsrcrecordid>eNqNikEKwjAUBYMgWLR3CLgupIlpuxW1dKO4cF-C_ULamNTkB69vEA_gah4zb0EyLkRZNDvOVyQPYWSM8armUoqMdOdoUBeowkSPADO9QPTKJODb-SlQbek-onsqhIFevUNIpo32jtrZJGDQ37khy4cyAfIf12Tbnm6Hrpi9e0UI2I8ueptSz1ktq6aUjIv_Xh8yXTv1</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2075681502</pqid></control><display><type>article</type><title>Multi-task Deep Neural Networks in Automated Protein Function Prediction</title><source>Free E- Journals</source><creator>Rifaioglu, Ahmet Sureyya ; Tunca Doğan ; Maria Jesus Martin ; Cetin-Atalay, Rengul ; Atalay, Mehmet Volkan</creator><creatorcontrib>Rifaioglu, Ahmet Sureyya ; Tunca Doğan ; Maria Jesus Martin ; Cetin-Atalay, Rengul ; Atalay, Mehmet Volkan</creatorcontrib><description>In recent years, deep learning algorithms have outperformed the state-of-the art methods in several areas thanks to the efficient methods for training and for preventing overfitting, advancement in computer hardware, the availability of vast amount data. The high performance of multi-task deep neural networks in drug discovery has attracted the attention to deep learning algorithms in bioinformatics area. Here, we proposed a hierarchical multi-task deep neural network architecture based on Gene Ontology (GO) terms as a solution to protein function prediction problem and investigated various aspects of the proposed architecture by performing several experiments. First, we showed that there is a positive correlation between performance of the system and the size of training datasets. Second, we investigated whether the level of GO terms on GO hierarchy related to their performance. 
We showed that there is no relation between the depth of GO terms on GO hierarchy and their performance. In addition, we included all annotations to the training of a set of GO terms to investigate whether including noisy data to the training datasets change the performance of the system. The results showed that including less reliable annotations in training of deep neural networks increased the performance of the low performed GO terms, significantly. We evaluated the performance of the system using hierarchical evaluation method. Mathews correlation coefficient was calculated as 0.75, 0.49 and 0.63 for molecular function, biological process and cellular component categories, respectively. We showed that deep learning algorithms have a great potential in protein function prediction area. We plan to further improve the DEEPred by including other types of annotations from various biological data sources. We plan to construct DEEPred as an open access online tool.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Algorithms ; Annotations ; Bioinformatics ; Biological activity ; Construction planning ; Correlation coefficients ; Datasets ; Machine learning ; Neural networks ; Proteins ; Training</subject><ispartof>arXiv.org, 2017-05</ispartof><rights>2017. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>776,780</link.rule.ids></links><search><creatorcontrib>Rifaioglu, Ahmet Sureyya</creatorcontrib><creatorcontrib>Tunca Doğan</creatorcontrib><creatorcontrib>Maria Jesus Martin</creatorcontrib><creatorcontrib>Cetin-Atalay, Rengul</creatorcontrib><creatorcontrib>Atalay, Mehmet Volkan</creatorcontrib><title>Multi-task Deep Neural Networks in Automated Protein Function Prediction</title><title>arXiv.org</title><description>In recent years, deep learning algorithms have outperformed the state-of-the art methods in several areas thanks to the efficient methods for training and for preventing overfitting, advancement in computer hardware, the availability of vast amount data. The high performance of multi-task deep neural networks in drug discovery has attracted the attention to deep learning algorithms in bioinformatics area. Here, we proposed a hierarchical multi-task deep neural network architecture based on Gene Ontology (GO) terms as a solution to protein function prediction problem and investigated various aspects of the proposed architecture by performing several experiments. First, we showed that there is a positive correlation between performance of the system and the size of training datasets. Second, we investigated whether the level of GO terms on GO hierarchy related to their performance. We showed that there is no relation between the depth of GO terms on GO hierarchy and their performance. In addition, we included all annotations to the training of a set of GO terms to investigate whether including noisy data to the training datasets change the performance of the system. 
The results showed that including less reliable annotations in training of deep neural networks increased the performance of the low performed GO terms, significantly. We evaluated the performance of the system using hierarchical evaluation method. Mathews correlation coefficient was calculated as 0.75, 0.49 and 0.63 for molecular function, biological process and cellular component categories, respectively. We showed that deep learning algorithms have a great potential in protein function prediction area. We plan to further improve the DEEPred by including other types of annotations from various biological data sources. We plan to construct DEEPred as an open access online tool.</description><subject>Algorithms</subject><subject>Annotations</subject><subject>Bioinformatics</subject><subject>Biological activity</subject><subject>Construction planning</subject><subject>Correlation coefficients</subject><subject>Datasets</subject><subject>Machine learning</subject><subject>Neural networks</subject><subject>Proteins</subject><subject>Training</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNqNikEKwjAUBYMgWLR3CLgupIlpuxW1dKO4cF-C_ULamNTkB69vEA_gah4zb0EyLkRZNDvOVyQPYWSM8armUoqMdOdoUBeowkSPADO9QPTKJODb-SlQbek-onsqhIFevUNIpo32jtrZJGDQ37khy4cyAfIf12Tbnm6Hrpi9e0UI2I8ueptSz1ktq6aUjIv_Xh8yXTv1</recordid><startdate>20170528</startdate><enddate>20170528</enddate><creator>Rifaioglu, Ahmet Sureyya</creator><creator>Tunca Doğan</creator><creator>Maria Jesus Martin</creator><creator>Cetin-Atalay, Rengul</creator><creator>Atalay, Mehmet Volkan</creator><general>Cornell University Library, 
arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20170528</creationdate><title>Multi-task Deep Neural Networks in Automated Protein Function Prediction</title><author>Rifaioglu, Ahmet Sureyya ; Tunca Doğan ; Maria Jesus Martin ; Cetin-Atalay, Rengul ; Atalay, Mehmet Volkan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_20756815023</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Algorithms</topic><topic>Annotations</topic><topic>Bioinformatics</topic><topic>Biological activity</topic><topic>Construction planning</topic><topic>Correlation coefficients</topic><topic>Datasets</topic><topic>Machine learning</topic><topic>Neural networks</topic><topic>Proteins</topic><topic>Training</topic><toplevel>online_resources</toplevel><creatorcontrib>Rifaioglu, Ahmet Sureyya</creatorcontrib><creatorcontrib>Tunca Doğan</creatorcontrib><creatorcontrib>Maria Jesus Martin</creatorcontrib><creatorcontrib>Cetin-Atalay, Rengul</creatorcontrib><creatorcontrib>Atalay, Mehmet Volkan</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central 
Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Rifaioglu, Ahmet Sureyya</au><au>Tunca Doğan</au><au>Maria Jesus Martin</au><au>Cetin-Atalay, Rengul</au><au>Atalay, Mehmet Volkan</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Multi-task Deep Neural Networks in Automated Protein Function Prediction</atitle><jtitle>arXiv.org</jtitle><date>2017-05-28</date><risdate>2017</risdate><eissn>2331-8422</eissn><abstract>In recent years, deep learning algorithms have outperformed the state-of-the art methods in several areas thanks to the efficient methods for training and for preventing overfitting, advancement in computer hardware, the availability of vast amount data. The high performance of multi-task deep neural networks in drug discovery has attracted the attention to deep learning algorithms in bioinformatics area. Here, we proposed a hierarchical multi-task deep neural network architecture based on Gene Ontology (GO) terms as a solution to protein function prediction problem and investigated various aspects of the proposed architecture by performing several experiments. First, we showed that there is a positive correlation between performance of the system and the size of training datasets. Second, we investigated whether the level of GO terms on GO hierarchy related to their performance. 
We showed that there is no relation between the depth of GO terms on GO hierarchy and their performance. In addition, we included all annotations to the training of a set of GO terms to investigate whether including noisy data to the training datasets change the performance of the system. The results showed that including less reliable annotations in training of deep neural networks increased the performance of the low performed GO terms, significantly. We evaluated the performance of the system using hierarchical evaluation method. Mathews correlation coefficient was calculated as 0.75, 0.49 and 0.63 for molecular function, biological process and cellular component categories, respectively. We showed that deep learning algorithms have a great potential in protein function prediction area. We plan to further improve the DEEPred by including other types of annotations from various biological data sources. We plan to construct DEEPred as an open access online tool.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2017-05
issn	2331-8422
language	eng
recordid	cdi_proquest_journals_2075681502
source	Free E- Journals
subjects	Algorithms ; Annotations ; Bioinformatics ; Biological activity ; Construction planning ; Correlation coefficients ; Datasets ; Machine learning ; Neural networks ; Proteins ; Training
title	Multi-task Deep Neural Networks in Automated Protein Function Prediction
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T09%3A43%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Multi-task%20Deep%20Neural%20Networks%20in%20Automated%20Protein%20Function%20Prediction&rft.jtitle=arXiv.org&rft.au=Rifaioglu,%20Ahmet%20Sureyya&rft.date=2017-05-28&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2075681502%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2075681502&rft_id=info:pmid/&rfr_iscdi=true