Effective Gene Expression Prediction and Optimization from Protein Sequences

High soluble protein expression in heterologous hosts is crucial for various research and applications. Despite considerable research on the impact of codon usage on expression levels, the relationship between protein sequence and expression is often overlooked. In this study, a novel connection bet...

Full description

Bibliographic details
Published in: Advanced science 2025-01, p.e2407664
Main authors: Liu, Tuoyu; Zhang, Yiyang; Li, Yanjun; Xu, Guoshun; Gao, Han; Wang, Pengtao; Tu, Tao; Luo, Huiying; Wu, Ningfeng; Yao, Bin; Liu, Bo; Guan, Feifei; Huang, Huoqing; Tian, Jian
Format: Article
Language: eng
Online access: Full text
container_end_page
container_issue
container_start_page e2407664
container_title Advanced science
container_volume
creator Liu, Tuoyu
Zhang, Yiyang
Li, Yanjun
Xu, Guoshun
Gao, Han
Wang, Pengtao
Tu, Tao
Luo, Huiying
Wu, Ningfeng
Yao, Bin
Liu, Bo
Guan, Feifei
Huang, Huoqing
Tian, Jian
description High soluble protein expression in heterologous hosts is crucial for various research and applications. Despite considerable research on the impact of codon usage on expression levels, the relationship between protein sequence and expression is often overlooked. In this study, a novel connection between protein expression and sequence is uncovered, leading to the development of SRAB (Strength of Relative Amino Acid Bias) based on AEI (Amino Acid Expression Index). The AEI served as an objective measure of this correlation, with higher AEI values enhancing soluble expression. Subsequently, the pre-trained protein model MP-TRANS (MindSpore Protein Transformer) is developed and fine-tuned using transfer learning techniques to create 88 prediction models (MPB-EXP) for predicting heterologous expression levels across 88 species. This approach achieved an average accuracy of 0.78, surpassing conventional machine learning methods. Additionally, a mutant generation model, MPB-MUT, is devised and utilized to enhance expression levels in specific hosts. Experimental validation demonstrated that the top 3 mutants of xylanase (previously not expressed in Escherichia coli) successfully achieved high-level soluble expression in E. coli. These findings highlight the efficacy of the developed model in predicting and optimizing gene expression based on protein sequences.
doi_str_mv 10.1002/advs.202407664
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_3153915479</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3153915479</sourcerecordid><originalsourceid>FETCH-LOGICAL-c220t-a6168ad1426d280a03c4a57456fc8d9aad671468d3a6e16087b590f758dc9b43</originalsourceid><addsrcrecordid>eNpNkMtLAzEQxoMottRePcoevWzNa_M4SlmrUKhg70uazEKk-zDZFvWvN7W1eJqPmd98w3wI3RI8IxjTB-P2cUYx5VgKwS_QmBKtcqY4v_ynR2ga4zvGmBRMcqKu0YhpqZhmdIyWZV2DHfwesgW0kJWffYAYfddmrwGcT6MkTeuyVT_4xn-b30YduiYB3QC-zd7gYwethXiDrmqzjTA91QlaP5Xr-XO-XC1e5o_L3FKKh9wIIpRxhFPhqMIGM8tNIXkhaqucNsYJSbhQjhkBRGAlN4XGtSyUs3rD2QTdH2370KXLcagaHy1st6aFbhcrlh7VpOBSJ3R2RG3oYgxQV33wjQlfFcHVIcPqkGF1zjAt3J28d5sG3Bn_S4z9AJvgbLY</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3153915479</pqid></control><display><type>article</type><title>Effective Gene Expression Prediction and Optimization from Protein Sequences</title><source>Wiley Online Library Open Access</source><source>DOAJ Directory of Open Access Journals</source><source>Wiley Online Library Journals Frontfile Complete</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central</source><creator>Liu, Tuoyu ; Zhang, Yiyang ; Li, Yanjun ; Xu, Guoshun ; Gao, Han ; Wang, Pengtao ; Tu, Tao ; Luo, Huiying ; Wu, Ningfeng ; Yao, Bin ; Liu, Bo ; Guan, Feifei ; Huang, Huoqing ; Tian, Jian</creator><creatorcontrib>Liu, Tuoyu ; Zhang, Yiyang ; Li, Yanjun ; Xu, Guoshun ; Gao, Han ; Wang, Pengtao ; Tu, Tao ; Luo, Huiying ; Wu, Ningfeng ; Yao, Bin ; Liu, Bo ; Guan, Feifei ; Huang, Huoqing ; Tian, Jian</creatorcontrib><description>High soluble protein expression in heterologous hosts is crucial for various research and applications. 
Despite considerable research on the impact of codon usage on expression levels, the relationship between protein sequence and expression is often overlooked. In this study, a novel connection between protein expression and sequence is uncovered, leading to the development of SRAB (Strength of Relative Amino Acid Bias) based on AEI (Amino Acid Expression Index). The AEI served as an objective measure of this correlation, with higher AEI values enhancing soluble expression. Subsequently, the pre-trained protein model MP-TRANS (MindSpore Protein Transformer) is developed and fine-tuned using transfer learning techniques to create 88 prediction models (MPB-EXP) for predicting heterologous expression levels across 88 species. This approach achieved an average accuracy of 0.78, surpassing conventional machine learning methods. Additionally, a mutant generation model, MPB-MUT, is devised and utilized to enhance expression levels in specific hosts. Experimental validation demonstrated that the top 3 mutants of xylanase (previously not expressed in Escherichia coli) successfully achieved high-level soluble expression in E. coli. These findings highlight the efficacy of the developed model in predicting and optimizing gene expression based on protein sequences.</description><identifier>ISSN: 2198-3844</identifier><identifier>EISSN: 2198-3844</identifier><identifier>DOI: 10.1002/advs.202407664</identifier><identifier>PMID: 39783932</identifier><language>eng</language><publisher>Germany</publisher><ispartof>Advanced science, 2025-01, p.e2407664</ispartof><rights>2025 The Author(s). 
Advanced Science published by Wiley‐VCH GmbH.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c220t-a6168ad1426d280a03c4a57456fc8d9aad671468d3a6e16087b590f758dc9b43</cites><orcidid>0000-0002-9997-6518</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,860,27901,27902</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/39783932$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Liu, Tuoyu</creatorcontrib><creatorcontrib>Zhang, Yiyang</creatorcontrib><creatorcontrib>Li, Yanjun</creatorcontrib><creatorcontrib>Xu, Guoshun</creatorcontrib><creatorcontrib>Gao, Han</creatorcontrib><creatorcontrib>Wang, Pengtao</creatorcontrib><creatorcontrib>Tu, Tao</creatorcontrib><creatorcontrib>Luo, Huiying</creatorcontrib><creatorcontrib>Wu, Ningfeng</creatorcontrib><creatorcontrib>Yao, Bin</creatorcontrib><creatorcontrib>Liu, Bo</creatorcontrib><creatorcontrib>Guan, Feifei</creatorcontrib><creatorcontrib>Huang, Huoqing</creatorcontrib><creatorcontrib>Tian, Jian</creatorcontrib><title>Effective Gene Expression Prediction and Optimization from Protein Sequences</title><title>Advanced science</title><addtitle>Adv Sci (Weinh)</addtitle><description>High soluble protein expression in heterologous hosts is crucial for various research and applications. Despite considerable research on the impact of codon usage on expression levels, the relationship between protein sequence and expression is often overlooked. In this study, a novel connection between protein expression and sequence is uncovered, leading to the development of SRAB (Strength of Relative Amino Acid Bias) based on AEI (Amino Acid Expression Index). 
The AEI served as an objective measure of this correlation, with higher AEI values enhancing soluble expression. Subsequently, the pre-trained protein model MP-TRANS (MindSpore Protein Transformer) is developed and fine-tuned using transfer learning techniques to create 88 prediction models (MPB-EXP) for predicting heterologous expression levels across 88 species. This approach achieved an average accuracy of 0.78, surpassing conventional machine learning methods. Additionally, a mutant generation model, MPB-MUT, is devised and utilized to enhance expression levels in specific hosts. Experimental validation demonstrated that the top 3 mutants of xylanase (previously not expressed in Escherichia coli) successfully achieved high-level soluble expression in E. coli. These findings highlight the efficacy of the developed model in predicting and optimizing gene expression based on protein sequences.</description><issn>2198-3844</issn><issn>2198-3844</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2025</creationdate><recordtype>article</recordtype><recordid>eNpNkMtLAzEQxoMottRePcoevWzNa_M4SlmrUKhg70uazEKk-zDZFvWvN7W1eJqPmd98w3wI3RI8IxjTB-P2cUYx5VgKwS_QmBKtcqY4v_ynR2ga4zvGmBRMcqKu0YhpqZhmdIyWZV2DHfwesgW0kJWffYAYfddmrwGcT6MkTeuyVT_4xn-b30YduiYB3QC-zd7gYwethXiDrmqzjTA91QlaP5Xr-XO-XC1e5o_L3FKKh9wIIpRxhFPhqMIGM8tNIXkhaqucNsYJSbhQjhkBRGAlN4XGtSyUs3rD2QTdH2370KXLcagaHy1st6aFbhcrlh7VpOBSJ3R2RG3oYgxQV33wjQlfFcHVIcPqkGF1zjAt3J28d5sG3Bn_S4z9AJvgbLY</recordid><startdate>20250109</startdate><enddate>20250109</enddate><creator>Liu, Tuoyu</creator><creator>Zhang, Yiyang</creator><creator>Li, Yanjun</creator><creator>Xu, Guoshun</creator><creator>Gao, Han</creator><creator>Wang, Pengtao</creator><creator>Tu, Tao</creator><creator>Luo, Huiying</creator><creator>Wu, Ningfeng</creator><creator>Yao, Bin</creator><creator>Liu, Bo</creator><creator>Guan, Feifei</creator><creator>Huang, Huoqing</creator><creator>Tian, 
Jian</creator><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-9997-6518</orcidid></search><sort><creationdate>20250109</creationdate><title>Effective Gene Expression Prediction and Optimization from Protein Sequences</title><author>Liu, Tuoyu ; Zhang, Yiyang ; Li, Yanjun ; Xu, Guoshun ; Gao, Han ; Wang, Pengtao ; Tu, Tao ; Luo, Huiying ; Wu, Ningfeng ; Yao, Bin ; Liu, Bo ; Guan, Feifei ; Huang, Huoqing ; Tian, Jian</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c220t-a6168ad1426d280a03c4a57456fc8d9aad671468d3a6e16087b590f758dc9b43</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2025</creationdate><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Liu, Tuoyu</creatorcontrib><creatorcontrib>Zhang, Yiyang</creatorcontrib><creatorcontrib>Li, Yanjun</creatorcontrib><creatorcontrib>Xu, Guoshun</creatorcontrib><creatorcontrib>Gao, Han</creatorcontrib><creatorcontrib>Wang, Pengtao</creatorcontrib><creatorcontrib>Tu, Tao</creatorcontrib><creatorcontrib>Luo, Huiying</creatorcontrib><creatorcontrib>Wu, Ningfeng</creatorcontrib><creatorcontrib>Yao, Bin</creatorcontrib><creatorcontrib>Liu, Bo</creatorcontrib><creatorcontrib>Guan, Feifei</creatorcontrib><creatorcontrib>Huang, Huoqing</creatorcontrib><creatorcontrib>Tian, Jian</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Advanced science</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Liu, Tuoyu</au><au>Zhang, Yiyang</au><au>Li, Yanjun</au><au>Xu, Guoshun</au><au>Gao, Han</au><au>Wang, Pengtao</au><au>Tu, Tao</au><au>Luo, Huiying</au><au>Wu, Ningfeng</au><au>Yao, Bin</au><au>Liu, Bo</au><au>Guan, Feifei</au><au>Huang, Huoqing</au><au>Tian, 
Jian</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Effective Gene Expression Prediction and Optimization from Protein Sequences</atitle><jtitle>Advanced science</jtitle><addtitle>Adv Sci (Weinh)</addtitle><date>2025-01-09</date><risdate>2025</risdate><spage>e2407664</spage><pages>e2407664-</pages><issn>2198-3844</issn><eissn>2198-3844</eissn><abstract>High soluble protein expression in heterologous hosts is crucial for various research and applications. Despite considerable research on the impact of codon usage on expression levels, the relationship between protein sequence and expression is often overlooked. In this study, a novel connection between protein expression and sequence is uncovered, leading to the development of SRAB (Strength of Relative Amino Acid Bias) based on AEI (Amino Acid Expression Index). The AEI served as an objective measure of this correlation, with higher AEI values enhancing soluble expression. Subsequently, the pre-trained protein model MP-TRANS (MindSpore Protein Transformer) is developed and fine-tuned using transfer learning techniques to create 88 prediction models (MPB-EXP) for predicting heterologous expression levels across 88 species. This approach achieved an average accuracy of 0.78, surpassing conventional machine learning methods. Additionally, a mutant generation model, MPB-MUT, is devised and utilized to enhance expression levels in specific hosts. Experimental validation demonstrated that the top 3 mutants of xylanase (previously not expressed in Escherichia coli) successfully achieved high-level soluble expression in E. coli. These findings highlight the efficacy of the developed model in predicting and optimizing gene expression based on protein sequences.</abstract><cop>Germany</cop><pmid>39783932</pmid><doi>10.1002/advs.202407664</doi><orcidid>https://orcid.org/0000-0002-9997-6518</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2198-3844
ispartof Advanced science, 2025-01, p.e2407664
issn 2198-3844
2198-3844
language eng
recordid cdi_proquest_miscellaneous_3153915479
source Wiley Online Library Open Access; DOAJ Directory of Open Access Journals; Wiley Online Library Journals Frontfile Complete; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central
title Effective Gene Expression Prediction and Optimization from Protein Sequences
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T10%3A43%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Effective%20Gene%20Expression%20Prediction%20and%20Optimization%20from%20Protein%20Sequences&rft.jtitle=Advanced%20science&rft.au=Liu,%20Tuoyu&rft.date=2025-01-09&rft.spage=e2407664&rft.pages=e2407664-&rft.issn=2198-3844&rft.eissn=2198-3844&rft_id=info:doi/10.1002/advs.202407664&rft_dat=%3Cproquest_cross%3E3153915479%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3153915479&rft_id=info:pmid/39783932&rfr_iscdi=true