ProteinUnet—An efficient alternative to SPIDER3‐single for sequence‐based prediction of protein secondary structures

Predicting protein function and structure from sequence remains an unsolved problem in bioinformatics. The best-performing methods rely heavily on evolutionary information from multiple sequence alignments, so their accuracy deteriorates for sequences with few homologs, and given the incr...

Bibliographic details
Published in:	Journal of computational chemistry 2021-01, Vol.42 (1), p.50-59
Main authors:	Kotowski, Krzysztof ; Smolarczyk, Tomasz ; Roterman‐Konieczna, Irena ; Stapor, Katarzyna
Format:	Article
Language:	eng
Subjects:	Accuracy ; backbone angles estimation ; Bioinformatics ; Computer architecture ; deep learning ; Homology ; Predictions ; protein structure prediction ; Proteins ; Recurrent neural networks ; secondary structure prediction ; Sequences ; solvent accessibility prediction
Online access:	Full text

container_end_page	59
container_issue	1
container_start_page	50
container_title	Journal of computational chemistry
container_volume	42
creator	Kotowski, Krzysztof ; Smolarczyk, Tomasz ; Roterman‐Konieczna, Irena ; Stapor, Katarzyna
description	Predicting protein function and structure from sequence remains an unsolved problem in bioinformatics. The best-performing methods rely heavily on evolutionary information from multiple sequence alignments, so their accuracy deteriorates for sequences with few homologs and, given the increasing size of sequence databases, they require long computation times. Here, a single‐sequence‐based prediction method called ProteinUnet is presented; it uses a U‐Net convolutional network architecture. It is compared to the SPIDER3‐Single model, which is based on a long short‐term memory bidirectional recurrent neural network architecture. Both methods achieve similar results for prediction of secondary structures (both three‐ and eight‐state), half‐sphere exposure, and contact number, but ProteinUnet has half as many parameters, a 17 times shorter inference time, and can be trained 11 times faster. Moreover, ProteinUnet tends to perform better for short sequences and for residues with a low number of local contacts. Additionally, loss weighting is presented as an effective way of increasing accuracy for rare secondary structures. ProteinUnet is the first model that successfully leverages the U‐Net deep learning architecture for sequence‐based prediction of one‐dimensional protein structural properties. It achieves results comparable to those of the SPIDER3‐Single model while having half as many parameters, training 11 times faster, and predicting 17 times faster.
doi_str_mv	10.1002/jcc.26432
format	Article
date	2021-01-05
eissn	1096-987X
pmid	33058261
publisher	Hoboken, USA: John Wiley & Sons, Inc
tpages	10
orcid	https://orcid.org/0000-0003-3003-6592
rights	2020 The Authors. Journal of Computational Chemistry published by Wiley Periodicals LLC. This article is published under http://creativecommons.org/licenses/by-nc/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
lds50	peer_reviewed
oa	free_for_read
linktopdf	https://onlinelibrary.wiley.com/doi/pdf/10.1002%2Fjcc.26432
linktohtml	https://onlinelibrary.wiley.com/doi/full/10.1002%2Fjcc.26432
backlink	https://www.ncbi.nlm.nih.gov/pubmed/33058261
collection	Wiley-Blackwell Open Access Titles ; Wiley Online Library (Open Access Collection) ; PubMed ; CrossRef ; ProQuest Computer Science Collection ; MEDLINE - Academic ; PubMed Central (Full Participant titles)
sourceid	proquest_pubme
sourcerecordid	2451852676
pqid	2464333520
sourcetype	Open Access Repository
fulltext	fulltext
identifier	ISSN: 0192-8651
ispartof	Journal of computational chemistry, 2021-01, Vol.42 (1), p.50-59
issn	0192-8651 1096-987X
language	eng
recordid	cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7756333
source	Access via Wiley Online Library
subjects	Accuracy ; backbone angles estimation ; Bioinformatics ; Computer architecture ; deep learning ; Homology ; Predictions ; protein structure prediction ; Proteins ; Recurrent neural networks ; secondary structure prediction ; Sequences ; solvent accessibility prediction
title	ProteinUnet—An efficient alternative to SPIDER3‐single for sequence‐based prediction of protein secondary structures
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-19T19%3A29%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ProteinUnet%E2%80%94An%20efficient%20alternative%20to%20SPIDER3%E2%80%90single%20for%20sequence%E2%80%90based%20prediction%20of%20protein%20secondary%20structures&rft.jtitle=Journal%20of%20computational%20chemistry&rft.au=Kotowski,%20Krzysztof&rft.date=2021-01-05&rft.volume=42&rft.issue=1&rft.spage=50&rft.epage=59&rft.pages=50-59&rft.issn=0192-8651&rft.eissn=1096-987X&rft_id=info:doi/10.1002/jcc.26432&rft_dat=%3Cproquest_pubme%3E2451852676%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2464333520&rft_id=info:pmid/33058261&rfr_iscdi=true