A protein fitness predictive framework based on feature combination and intelligent searching

Bibliographic details
Published in: Protein Science 2024-12, Vol. 33 (12), p. e5211-n/a
Main authors: Zhang, Zhihui, Li, Zhixuan, Wang, Qianyue, Wu, Hanlin, Yang, Manli, Zhao, Fengguang, Tan, Mingkui, Han, Shuangyan
Format: Article
Language: English
Description
Summary: Machine learning (ML) builds predictive models by learning the relationship between protein sequences and their functions, enabling efficient identification of high-fitness protein sequences without becoming trapped in local optima, as directed evolution can. However, ML model performance depends critically on extracting the most pertinent functional feature information from a limited number of protein sequences. Here, we propose scut_ProFP (Protein Fitness Predictor), a predictive framework that integrates feature combination and feature selection techniques. Feature combination provides comprehensive sequence information, while feature selection searches for the most beneficial features to enhance model performance, enabling accurate sequence-to-function mapping. Compared to similar frameworks, scut_ProFP demonstrates superior performance and is also competitive with more complex deep learning models: ECNet, EVmutation, and UniRep. In addition, scut_ProFP generalizes from low-order mutants to high-order mutants. Finally, we used scut_ProFP to simulate the engineering of the fluorescent protein CreiLOV and achieved a high enrichment of high-fluorescence mutants starting from only a small number of low-fluorescence mutants. Overall, the method provides an effective approach to data-driven, ML-guided protein engineering. The code and datasets for scut_ProFP are available at https://github.com/Zhang66-star/scut_ProFP.
ISSN: 0961-8368, 1469-896X
DOI: 10.1002/pro.5211
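
Illustrative note: the summary describes two stages, feature combination (joining several sequence encodings into one vector) and feature selection (searching for the subset that helps the model most). The Python sketch below shows the general idea only and is not the scut_ProFP implementation, which is available in the repository linked in the summary. Everything in it is an assumption made for illustration: the two encodings (one-hot and Kyte-Doolittle hydropathy), the simple Pearson-correlation filter standing in for the paper's search procedure, the function names, and the toy sequences and fitness values, which are invented and not taken from the paper.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale (assumed here as a second descriptor).
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def one_hot(seq):
    """Position-wise one-hot encoding: 20 binary features per residue."""
    vec = []
    for aa in seq:
        vec.extend(1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS)
    return vec


def hydropathy(seq):
    """One physicochemical feature per residue."""
    return [KD_HYDROPATHY[aa] for aa in seq]


def combine_features(seq):
    """Feature combination: concatenate the two encodings."""
    return one_hot(seq) + hydropathy(seq)


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_top_k(X, y, k):
    """Feature selection (filter stand-in): keep the k columns with the
    largest absolute correlation to fitness; ties broken by column index."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scores.append((abs(pearson(column, y)), j))
    scores.sort(key=lambda t: (-t[0], t[1]))
    return sorted(j for _, j in scores[:k])


# Toy data, invented for illustration only (not from the paper).
toy_sequences = ["AKL", "AKV", "GKL", "GKV", "AEL", "GEV"]
toy_fitness = [0.9, 0.7, 0.4, 0.2, 0.8, 0.1]

X = [combine_features(s) for s in toy_sequences]
selected = select_top_k(X, toy_fitness, k=5)
X_selected = [[row[j] for j in selected] for row in X]
print("combined features per sequence:", len(X[0]))
print("selected feature indices:", selected)
```

The reduced matrix `X_selected` would then be passed to a regression model of choice. A correlation filter scores each feature independently, so it cannot capture interactions between features; a search over feature subsets, as the title of the article suggests, is the more general approach.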