Optimizing Protein Fitness and Function with Sparse Experimental Data
Main Author: | |
---|---|
Format: | Dissertation |
Language: | English |
Subjects: | |
Online Access: | Order full text |
Summary: | The quest to create customized protein sequences with specific functions holds great promise
across diverse fields, from healthcare to sustainable energy. While Next Generation Sequencing
(NGS) allows for experimental evaluation of millions of protein sequences, that number is dwarfed by the vast
space of possible residue combinations. Recent advances in unsupervised generative models offer potential solutions,
yet their generalizability to different types of data still needs comprehensive evaluation.
This work addresses the biases and limitations of current protein design methods, emphasizing
the importance of systematic evaluation. We explore protein sequence and structure models, particularly
in the context of deep mutational scans.
Chapter 1 investigates the biases of unsupervised protein sequence models and presents a method
to alleviate these biases. This chapter aids in ranking diverse protein sequences, enhancing their
prioritization for testing.
Chapter 2 delves into the predictions of various structure models for mutational effect analysis.
Spatially-local residue preference models are found to prevail in certain cases, guiding local sequence
optimization without additional experimental labor.
Chapter 3 focuses on predicting enzyme pH optima using sequence embeddings from large language
models. This benchmark study enhances our understanding of using unsupervised models to
predict enzyme characteristics.
Chapter 4 explores methods to predict protein function and fitness using sparse and disparate
experimental data, shedding light on leveraging diverse information sources for predictive modeling.
This work underscores the importance of evaluating designs on experimental data while highlighting
the assets of unsupervised models. Future endeavors will involve experimental validation of
the presented ideas. |