Beyond Homology Transfer: Deep Learning for Automated Annotation of Proteins



Bibliographic details
Published in:	Journal of grid computing 2019-06, Vol.17 (2), p.225-237
Main authors:	Nauman, Mohammad; Ur Rehman, Hafeez; Politano, Gianfranco; Benso, Alfredo
Format:	Article
Language:	eng
Subjects:	Annotations; Architecture; Computer Science; Deep learning; Feature extraction; Homology; Machine learning; Management of Computing and Information Systems; Model accuracy; Molecular biology; Processor Architectures; Proteins; User Interfaces and Human Computer Interaction
Online access:	Full text

container_end_page	237
container_issue	2
container_start_page	225
container_title	Journal of grid computing
container_volume	17
creator	Nauman, Mohammad; Ur Rehman, Hafeez; Politano, Gianfranco; Benso, Alfredo
description	Accurate annotation of protein functions is important for a profound understanding of molecular biology. A large number of proteins remain uncharacterized because of the sparsity of available supporting information. For a large set of uncharacterized proteins, the only type of information available is their amino acid sequence. This motivates the need for sequence-based computational techniques that can precisely annotate uncharacterized proteins. In this paper, we propose DeepSeq, a deep learning architecture that uses only protein sequence information to predict a protein's associated functions. The prediction process does not require handcrafted features; rather, the architecture automatically extracts representations from the input sequence data. Results of our experiments with DeepSeq indicate significant improvements in prediction accuracy compared with other sequence-based methods. Our deep learning model achieves an overall validation accuracy of 86.72%, with an F1 score of 71.13%. Using sequence-only information, DeepSeq achieved improved results on the protein function prediction problem. Moreover, using the automatically learned features and without any changes to DeepSeq, we successfully solved a different problem, namely protein function localization, with no human intervention. Finally, we discuss how the same architecture can be used to solve even more complicated problems, such as prediction of 2D and 3D structure as well as protein-protein interactions.
doi_str_mv	10.1007/s10723-018-9450-6
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2255490120</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2255490120</sourcerecordid><originalsourceid>FETCH-LOGICAL-c316t-472a8931adb80d181960e0b7cd91cfdde8a28bfdd434fc800fd16898036338fb3</originalsourceid><addsrcrecordid>eNp1kD1PwzAURS0EEqXwA9gsMRves_PhsJVSKFIkGMpsObFdpWrtYqdD_z0pQWJiene45z7pEHKLcI8A5UNCKLlggJJVWQ6sOCMTzEvOKpTZ-U8GVspSXJKrlDYAPJfAJ6R-ssfgDV2GXdiG9ZGuovbJ2fhIn63d09rq6Du_pi5EOjv0Yad7a-jM-9DrvgueBkc_Yuht59M1uXB6m-zN752Sz5fFar5k9fvr23xWs1Zg0bOs5FpWArVpJBiUWBVgoSlbU2HrjLFSc9kMIROZayWAM1jISoIohJCuEVNyN-7uY_g62NSrTThEP7xUnOd5VgFyGFo4ttoYUorWqX3sdjoeFYI6SVOjNDVIUydpqhgYPjJp6Pq1jX_L_0Pfk8duuA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2255490120</pqid></control><display><type>article</type><title>Beyond Homology Transfer: Deep Learning for Automated Annotation of Proteins</title><source>SpringerLink Journals</source><creator>Nauman, Mohammad ; Ur Rehman, Hafeez ; Politano, Gianfranco ; Benso, Alfredo</creator><creatorcontrib>Nauman, Mohammad ; Ur Rehman, Hafeez ; Politano, Gianfranco ; Benso, Alfredo</creatorcontrib><description>Accurate annotation of protein functions is important for a profound understanding of molecular biology. A large number of proteins remain uncharacterized because of the sparsity of available supporting information. For a large set of uncharacterized proteins, the only type of information available is their amino acid sequence. This motivates the need to make sequence based computational techniques that can precisely annotate uncharacterized proteins. In this paper, we propose DeepSeq – a deep learning architecture – that utilizes only the protein sequence information to predict its associated functions. 
The prediction process does not require handcrafted features; rather, the architecture automatically extracts representations from the input sequence data. Results of our experiments with DeepSeq indicate significant improvements in terms of prediction accuracy when compared with other sequence-based methods. Our deep learning model achieves an overall validation accuracy of 86.72%, with an F1 score of 71.13%. We achieved improved results for protein function prediction problem through DeepSeq, by utilizing sequence only information. Moreover, using the automatically learned features and without any changes to DeepSeq, we successfully solved a different problem i.e. protein function localization, with no human intervention. Finally, we discuss how the same architecture can be used to solve even more complicated problems such as prediction of 2D and 3D structure as well as protein-protein interactions.</description><identifier>ISSN: 1570-7873</identifier><identifier>EISSN: 1572-9184</identifier><identifier>DOI: 10.1007/s10723-018-9450-6</identifier><language>eng</language><publisher>Dordrecht: Springer Netherlands</publisher><subject>Annotations ; Architecture ; Computer Science ; Deep learning ; Feature extraction ; Homology ; Machine learning ; Management of Computing and Information Systems ; Model accuracy ; Molecular biology ; Processor Architectures ; Proteins ; User Interfaces and Human Computer Interaction</subject><ispartof>Journal of grid computing, 2019-06, Vol.17 (2), p.225-237</ispartof><rights>Springer Nature B.V. 2018</rights><rights>Journal of Grid Computing is a copyright of Springer, (2018). 
All Rights Reserved.</rights><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c316t-472a8931adb80d181960e0b7cd91cfdde8a28bfdd434fc800fd16898036338fb3</citedby><cites>FETCH-LOGICAL-c316t-472a8931adb80d181960e0b7cd91cfdde8a28bfdd434fc800fd16898036338fb3</cites><orcidid>0000-0002-3274-6347</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10723-018-9450-6$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10723-018-9450-6$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,27901,27902,41464,42533,51294</link.rule.ids></links><search><creatorcontrib>Nauman, Mohammad</creatorcontrib><creatorcontrib>Ur Rehman, Hafeez</creatorcontrib><creatorcontrib>Politano, Gianfranco</creatorcontrib><creatorcontrib>Benso, Alfredo</creatorcontrib><title>Beyond Homology Transfer: Deep Learning for Automated Annotation of Proteins</title><title>Journal of grid computing</title><addtitle>J Grid Computing</addtitle><description>Accurate annotation of protein functions is important for a profound understanding of molecular biology. A large number of proteins remain uncharacterized because of the sparsity of available supporting information. For a large set of uncharacterized proteins, the only type of information available is their amino acid sequence. This motivates the need to make sequence based computational techniques that can precisely annotate uncharacterized proteins. In this paper, we propose DeepSeq – a deep learning architecture – that utilizes only the protein sequence information to predict its associated functions. The prediction process does not require handcrafted features; rather, the architecture automatically extracts representations from the input sequence data. 
Results of our experiments with DeepSeq indicate significant improvements in terms of prediction accuracy when compared with other sequence-based methods. Our deep learning model achieves an overall validation accuracy of 86.72%, with an F1 score of 71.13%. We achieved improved results for protein function prediction problem through DeepSeq, by utilizing sequence only information. Moreover, using the automatically learned features and without any changes to DeepSeq, we successfully solved a different problem i.e. protein function localization, with no human intervention. Finally, we discuss how the same architecture can be used to solve even more complicated problems such as prediction of 2D and 3D structure as well as protein-protein interactions.</description><subject>Annotations</subject><subject>Architecture</subject><subject>Computer Science</subject><subject>Deep learning</subject><subject>Feature extraction</subject><subject>Homology</subject><subject>Machine learning</subject><subject>Management of Computing and Information Systems</subject><subject>Model accuracy</subject><subject>Molecular biology</subject><subject>Processor Architectures</subject><subject>Proteins</subject><subject>User Interfaces and Human Computer Interaction</subject><issn>1570-7873</issn><issn>1572-9184</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNp1kD1PwzAURS0EEqXwA9gsMRves_PhsJVSKFIkGMpsObFdpWrtYqdD_z0pQWJiene45z7pEHKLcI8A5UNCKLlggJJVWQ6sOCMTzEvOKpTZ-U8GVspSXJKrlDYAPJfAJ6R-ssfgDV2GXdiG9ZGuovbJ2fhIn63d09rq6Du_pi5EOjv0Yad7a-jM-9DrvgueBkc_Yuht59M1uXB6m-zN752Sz5fFar5k9fvr23xWs1Zg0bOs5FpWArVpJBiUWBVgoSlbU2HrjLFSc9kMIROZayWAM1jISoIohJCuEVNyN-7uY_g62NSrTThEP7xUnOd5VgFyGFo4ttoYUorWqX3sdjoeFYI6SVOjNDVIUydpqhgYPjJp6Pq1jX_L_0Pfk8duuA</recordid><startdate>20190601</startdate><enddate>20190601</enddate><creator>Nauman, Mohammad</creator><creator>Ur Rehman, 
Hafeez</creator><creator>Politano, Gianfranco</creator><creator>Benso, Alfredo</creator><general>Springer Netherlands</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>8FE</scope><scope>8FG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><orcidid>https://orcid.org/0000-0002-3274-6347</orcidid></search><sort><creationdate>20190601</creationdate><title>Beyond Homology Transfer: Deep Learning for Automated Annotation of Proteins</title><author>Nauman, Mohammad ; Ur Rehman, Hafeez ; Politano, Gianfranco ; Benso, Alfredo</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c316t-472a8931adb80d181960e0b7cd91cfdde8a28bfdd434fc800fd16898036338fb3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Annotations</topic><topic>Architecture</topic><topic>Computer Science</topic><topic>Deep learning</topic><topic>Feature extraction</topic><topic>Homology</topic><topic>Machine learning</topic><topic>Management of Computing and Information Systems</topic><topic>Model accuracy</topic><topic>Molecular biology</topic><topic>Processor Architectures</topic><topic>Proteins</topic><topic>User Interfaces and Human Computer Interaction</topic><toplevel>online_resources</toplevel><creatorcontrib>Nauman, Mohammad</creatorcontrib><creatorcontrib>Ur Rehman, Hafeez</creatorcontrib><creatorcontrib>Politano, Gianfranco</creatorcontrib><creatorcontrib>Benso, Alfredo</creatorcontrib><collection>CrossRef</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace 
Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><jtitle>Journal of grid computing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Nauman, Mohammad</au><au>Ur Rehman, Hafeez</au><au>Politano, Gianfranco</au><au>Benso, Alfredo</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Beyond Homology Transfer: Deep Learning for Automated Annotation of Proteins</atitle><jtitle>Journal of grid computing</jtitle><stitle>J Grid Computing</stitle><date>2019-06-01</date><risdate>2019</risdate><volume>17</volume><issue>2</issue><spage>225</spage><epage>237</epage><pages>225-237</pages><issn>1570-7873</issn><eissn>1572-9184</eissn><abstract>Accurate annotation of protein functions is important for a profound understanding of molecular biology. A large number of proteins remain uncharacterized because of the sparsity of available supporting information. For a large set of uncharacterized proteins, the only type of information available is their amino acid sequence. This motivates the need to make sequence based computational techniques that can precisely annotate uncharacterized proteins. In this paper, we propose DeepSeq – a deep learning architecture – that utilizes only the protein sequence information to predict its associated functions. 
The prediction process does not require handcrafted features; rather, the architecture automatically extracts representations from the input sequence data. Results of our experiments with DeepSeq indicate significant improvements in terms of prediction accuracy when compared with other sequence-based methods. Our deep learning model achieves an overall validation accuracy of 86.72%, with an F1 score of 71.13%. We achieved improved results for protein function prediction problem through DeepSeq, by utilizing sequence only information. Moreover, using the automatically learned features and without any changes to DeepSeq, we successfully solved a different problem i.e. protein function localization, with no human intervention. Finally, we discuss how the same architecture can be used to solve even more complicated problems such as prediction of 2D and 3D structure as well as protein-protein interactions.</abstract><cop>Dordrecht</cop><pub>Springer Netherlands</pub><doi>10.1007/s10723-018-9450-6</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0002-3274-6347</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 1570-7873
ispartof	Journal of grid computing, 2019-06, Vol.17 (2), p.225-237
issn	1570-7873; 1572-9184
language	eng
recordid	cdi_proquest_journals_2255490120
source	SpringerLink Journals
subjects	Annotations; Architecture; Computer Science; Deep learning; Feature extraction; Homology; Machine learning; Management of Computing and Information Systems; Model accuracy; Molecular biology; Processor Architectures; Proteins; User Interfaces and Human Computer Interaction
title	Beyond Homology Transfer: Deep Learning for Automated Annotation of Proteins
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T05%3A43%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Beyond%20Homology%20Transfer:%20Deep%20Learning%20for%20Automated%20Annotation%20of%20Proteins&rft.jtitle=Journal%20of%20grid%20computing&rft.au=Nauman,%20Mohammad&rft.date=2019-06-01&rft.volume=17&rft.issue=2&rft.spage=225&rft.epage=237&rft.pages=225-237&rft.issn=1570-7873&rft.eissn=1572-9184&rft_id=info:doi/10.1007/s10723-018-9450-6&rft_dat=%3Cproquest_cross%3E2255490120%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2255490120&rft_id=info:pmid/&rfr_iscdi=true