SoluProtMutDB: A manually curated database of protein solubility changes upon mutations

Bibliographic details
Published in: Computational and structural biotechnology journal 2022-01, Vol.20, p.6339-6347
Main authors: Velecký, Jan, Hamsikova, Marie, Stourac, Jan, Musil, Milos, Damborsky, Jiri, Bednar, David, Mazurenko, Stanislav
Format: Article
Language: English
Description
Summary:
• Solubility is a fundamental characteristic of any protein and an essential factor in its production, yet it is not well predictable.
• SoluProtMutDB is the first protein database of mutational solubility data with ∼33 000 entries available for machine learning.
• The data are manually curated against the source literature and cross-linked to other biological databases.
• A modern user-friendly web interface following the FAIR principles allows browsing and searching the database online.
• The database may serve for reporting negative results from solubilization experiments, thus improving solubility predictors.

Protein solubility is an attractive engineering target, primarily because of its relation to yields in protein production and manufacturing. Moreover, better knowledge of the mutational effects on protein solubility could connect several serious human diseases with protein aggregation. However, we have a limited understanding of the protein structural determinants of solubility, and the available data have mostly been scattered in the literature. Here, we present SoluProtMutDB, the first database containing data on protein solubility changes upon mutations. Our database accommodates 33 000 measurements of 17 000 protein variants in 103 different proteins. The database can serve as an essential source of information for researchers designing improved protein variants or developing machine learning tools to predict the effects of mutations on solubility. The database comprises all the previously published solubility datasets and thousands of new data points from recent publications, including deep mutational scanning experiments. Moreover, it features many available experimental conditions known to affect protein solubility. The datasets have been manually curated with substantial corrections, improving their suitability for machine learning applications. The database is available at loschmidt.chemi.muni.cz/soluprotmutdb.
ISSN:2001-0370
DOI:10.1016/j.csbj.2022.11.009