ProteinUnet—An efficient alternative to SPIDER3‐single for sequence‐based prediction of protein secondary structures
| Field | Value |
|---|---|
| Published in | Journal of Computational Chemistry 2021-01, Vol. 42 (1), pp. 50-59 |
| Format | Article |
| Language | English |
Abstract:

Predicting protein function and structure from sequence remains an unsolved problem in bioinformatics. The best‐performing methods rely heavily on evolutionary information from multiple sequence alignments, so their accuracy deteriorates for sequences with few homologs, and, given the growing size of sequence databases, they require long computation times. Here, a single‐sequence‐based prediction method called ProteinUnet is presented, which leverages a U‐Net convolutional network architecture. It is compared to the SPIDER3‐Single model, which is based on a long short‐term memory bidirectional recurrent neural network architecture. Both methods achieve similar results for the prediction of secondary structure (both three‐ and eight‐state), half‐sphere exposure, and contact number, but ProteinUnet has half as many parameters, runs inference 17 times faster, and trains 11 times faster. Moreover, ProteinUnet tends to perform better for short sequences and for residues with a low number of local contacts. Additionally, a loss‐weighting method is presented as an effective way of increasing accuracy for rare secondary structures.

ProteinUnet is the first model that successfully leverages the U‐Net deep learning architecture for sequence‐based prediction of one‐dimensional protein structural properties. It achieves results comparable to those of the SPIDER3‐Single model, which is based on long short‐term memory bidirectional recurrent neural networks, while having half as many parameters, training 11 times faster, and predicting 17 times faster. Moreover, ProteinUnet performs better for short sequences and for residues with a low number of local contacts.
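The abstract contrasts a U‐Net, a convolutional encoder-decoder whose decoder reuses encoder features through skip connections, with the recurrent architecture of SPIDER3‐Single. The pure‐Python sketch below illustrates that general idea for per‐residue classification only; it is not ProteinUnet's actual network. The depth (two poolings), filter counts, kernel size, one‐hot input encoding, zero padding to a multiple of 4, and all names (`TinyUNet1D`, `one_hot`, `conv1d`, and so on) are hypothetical choices made for illustration, and the weights are random rather than trained.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: plain one-hot input features


def one_hot(seq):
    """Encode a sequence as [L][20]; unknown letters become all-zero rows."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in seq]


def init_conv(c_in, c_out, k, rng):
    """Random weights [c_out][k][c_in] and zero biases for one conv layer."""
    w = [[[rng.gauss(0.0, 0.1) for _ in range(c_in)] for _ in range(k)]
         for _ in range(c_out)]
    return w, [0.0] * c_out


def conv1d(x, layer, relu=True):
    """'Same'-padded 1D convolution along the sequence; x is [L][C_in]."""
    weights, bias = layer
    k, half, length = len(weights[0]), len(weights[0]) // 2, len(x)
    out = []
    for i in range(length):
        row = []
        for w_out, b in zip(weights, bias):
            s = b
            for j in range(k):
                p = i + j - half
                if 0 <= p < length:  # positions beyond the ends count as zeros
                    s += sum(wc * xc for wc, xc in zip(w_out[j], x[p]))
            row.append(max(0.0, s) if relu else s)
        out.append(row)
    return out


def max_pool(x):
    """Halve the length by taking the max over non-overlapping pairs."""
    return [[max(a, b) for a, b in zip(x[i], x[i + 1])]
            for i in range(0, len(x), 2)]


def upsample(x):
    """Double the length by repeating each position (nearest neighbour)."""
    return [row[:] for row in x for _ in range(2)]


def concat(a, b):
    """Skip connection: concatenate channels position by position."""
    return [ra + rb for ra, rb in zip(a, b)]


class TinyUNet1D:
    """Hypothetical two-level 1D U-Net: [L][c_in] features -> [L][n_classes] logits."""

    def __init__(self, c_in=20, filters=8, n_classes=3, k=3, seed=0):
        rng = random.Random(seed)
        f = filters
        self.enc1 = init_conv(c_in, f, k, rng)
        self.enc2 = init_conv(f, 2 * f, k, rng)
        self.bottom = init_conv(2 * f, 4 * f, k, rng)
        self.dec2 = init_conv(4 * f + 2 * f, 2 * f, k, rng)
        self.dec1 = init_conv(2 * f + f, f, k, rng)
        self.head = init_conv(f, n_classes, 1, rng)  # 1x1 conv: per-residue classifier

    def __call__(self, x):
        length, c_in = len(x), len(x[0])
        padded = x + [[0.0] * c_in] * ((-length) % 4)  # two poolings need L % 4 == 0
        e1 = conv1d(padded, self.enc1)                   # contracting path
        e2 = conv1d(max_pool(e1), self.enc2)
        b = conv1d(max_pool(e2), self.bottom)            # coarsest, widest context
        d2 = conv1d(concat(upsample(b), e2), self.dec2)  # expanding path + skips
        d1 = conv1d(concat(upsample(d2), e1), self.dec1)
        return conv1d(d1, self.head, relu=False)[:length]  # crop the padding


if __name__ == "__main__":
    model = TinyUNet1D()
    logits = model(one_hot("MKTAYIAKQR"))
    # Untrained weights, so the predicted three-state labels are arbitrary.
    states = "".join("HEC"[max(range(3), key=row.__getitem__)] for row in logits)
    print(len(logits), len(logits[0]), states)
```

The pooled path aggregates context from a wider window of residues, while the skip connections hand the decoder the fine, residue‐level features from the matching encoder level, so every position still receives its own prediction.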
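The abstract does not say how the loss is weighted. One common scheme, assumed here and possibly different from the one used for ProteinUnet, weights each residue's cross‐entropy term by the inverse frequency of its true class, so that rare secondary structure states contribute more to training. The function names below are hypothetical, and the three‐state labels are toy data.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, classes):
    """Weight class c by N / (K * n_c), so rarer classes get larger weights.

    A class absent from `labels` gets weight 0.0 (it never appears as a target)."""
    counts = Counter(labels)
    n, k = len(labels), len(classes)
    return {c: (n / (k * counts[c]) if counts[c] else 0.0) for c in classes}


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(logits, labels, classes, weights):
    """Sum of w[y_i] * -log p(y_i) divided by the sum of w[y_i]."""
    index = {c: i for i, c in enumerate(classes)}
    num = den = 0.0
    for row, y in zip(logits, labels):
        p = softmax(row)[index[y]]
        num += weights[y] * -math.log(p)
        den += weights[y]
    return num / den


if __name__ == "__main__":
    classes = ["H", "E", "C"]          # three-state labels: helix, strand, coil
    labels = list("HHHHHHCCCCE")       # toy labels; E is the rare class
    w = inverse_frequency_weights(labels, classes)
    always_h = [[5.0, 0.0, 0.0]] * len(labels)  # a model that always favours H
    ones = {c: 1.0 for c in classes}
    print(w)
    print(weighted_cross_entropy(always_h, labels, classes, w) >
          weighted_cross_entropy(always_h, labels, classes, ones))  # True
```

With this scheme every observed class carries the same total weight N/K, so a model that simply predicts the majority state is penalized more than under the unweighted loss. Dividing by the summed weights rather than by the number of residues matches how PyTorch's `CrossEntropyLoss` averages when a `weight` tensor is given, and it keeps the weighted loss on a scale comparable to the unweighted one.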
| Field | Value |
|---|---|
| ISSN | 0192-8651; 1096-987X |
| DOI | 10.1002/jcc.26432 |