Protein language models-assisted optimization of a uracil-N-glycosylase variant enables programmable T-to-G and T-to-C base editing

Bibliographic details
Published in: Molecular Cell, 2024-04, Vol. 84 (7), p. 1257-1270.e6
Authors: He, Yan; Zhou, Xibin; Chang, Chong; Chen, Ge; Liu, Weikuan; Li, Geng; Fan, Xiaoqi; Sun, Mingsun; Miao, Chensi; Huang, Qianyue; Ma, Yunqing; Yuan, Fajie; Chang, Xing
Format: Article
Language: English
container_end_page 1270.e6
container_issue 7
container_start_page 1257
container_title Molecular cell
container_volume 84
creator He, Yan
Zhou, Xibin
Chang, Chong
Chen, Ge
Liu, Weikuan
Li, Geng
Fan, Xiaoqi
Sun, Mingsun
Miao, Chensi
Huang, Qianyue
Ma, Yunqing
Yuan, Fajie
Chang, Xing
description Current base editors (BEs) use DNA deaminases to install transition nucleotide substitutions: a cytidine deaminase in cytidine BEs (CBEs) or an adenine deaminase in adenine BEs (ABEs). Combining CBEs or ABEs with glycosylase enzymes can induce only a limited set of transversion mutations. There therefore remains a critical need for BEs that generate other mutation types, such as T>G corrections. In this study, we leveraged pre-trained protein language models to optimize a uracil-N-glycosylase (UNG) variant with altered specificity for thymine (eTDG). Notably, we ran two rounds of testing on fewer than 50 top-ranking variants, and more than 50% of them exhibited an over 1.5-fold enhancement in enzymatic activity. When fused with nCas9, eTDG induced programmable T-to-S (G/C) substitutions and corrected the db/db diabetic mutation in mice (up to 55%). Our findings not only establish orthogonal strategies for developing novel BEs but also demonstrate the capacity of protein language models to optimize enzymes without extensive task-specific training data.
Highlights:
• nCas9 fused to engineered UNGs enables transversion base editing without deamination
• PLMs were used to predict the activities of enzyme variants
• Using the PLMs, an efficient T>S (G or C) base editor, TSBE3, was developed
• TSBE3 effectively corrected a diabetic mutation (Leprdb) in murine embryos
He et al. used protein language models (PLMs) to engineer eTDG, an enhanced UNG variant that targets thymine. Accurate predictions allowed the validation of over 80% of high-fitness variants. This enabled the development of TSBE3, a tool for efficient T>G or T>C substitutions in cell lines, T cells, and mouse embryos.
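The abstract says pre-trained protein language models were used to rank UNG variants, so that only the top-ranking ones (fewer than 50) had to be tested. This record does not describe the scoring function. The block below is only a minimal sketch of one widely used zero-shot heuristic, masked-marginal scoring. For each substitution, it compares the model's log-probability of the mutant residue with that of the wild-type residue at the masked position. All names here are hypothetical and introduced for illustration; they are not the authors' code:

- `masked_log_probs`: the model hook.
- `toy_log_probs`: a toy stand-in for a real PLM.
- `masked_marginal_score`, `rank_single_mutants` and `hit_rate`: the helpers.

`hit_rate` shows how a figure like "more than 50% exhibited over 1.5-fold enhancement" follows from measured fold changes.

```python
import math
from typing import Callable, Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical model hook: given a sequence and a 0-based position, return the
# log-probability of each amino acid at that position with it masked out.
LogProbFn = Callable[[str, int], Dict[str, float]]


def masked_marginal_score(seq: str, mutation: str, masked_log_probs: LogProbFn) -> float:
    """Score a substitution written like 'K2G' (1-based position) as
    log p(mutant) - log p(wild type) at the masked position."""
    wt, pos, mt = mutation[0], int(mutation[1:-1]), mutation[-1]
    idx = pos - 1
    if seq[idx] != wt:
        raise ValueError(f"{mutation}: residue {pos} is {seq[idx]}, not {wt}")
    log_probs = masked_log_probs(seq, idx)
    return log_probs[mt] - log_probs[wt]


def rank_single_mutants(seq: str, masked_log_probs: LogProbFn, top_k: int) -> List[Tuple[str, float]]:
    """Score every single substitution and return the top_k, best first."""
    scored: List[Tuple[str, float]] = []
    for idx, wt in enumerate(seq):
        log_probs = masked_log_probs(seq, idx)  # one model call per position
        for aa in AMINO_ACIDS:
            if aa != wt:
                scored.append((f"{wt}{idx + 1}{aa}", log_probs[aa] - log_probs[wt]))
    # Python's sort is stable, so tied scores keep their enumeration order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def hit_rate(fold_changes: List[float], threshold: float = 1.5) -> float:
    """Fraction of tested variants whose activity exceeds threshold x parent."""
    return sum(fc > threshold for fc in fold_changes) / len(fold_changes)


def toy_log_probs(seq: str, idx: int) -> Dict[str, float]:
    """Toy stand-in for a real PLM: ignores context and simply favours glycine."""
    weights = {aa: 1.0 for aa in AMINO_ACIDS}
    weights["G"] = 4.0
    total = sum(weights.values())
    return {aa: math.log(w / total) for aa, w in weights.items()}


if __name__ == "__main__":
    print(rank_single_mutants("MKG", toy_log_probs, top_k=2))
    print(hit_rate([2.0, 1.2, 1.6, 0.9]))  # made-up fold changes
```

Two usage examples:

- `rank_single_mutants("MKG", toy_log_probs, top_k=2)` returns `M1G` and `K2G`. These are the substitutions to the one residue the toy model favors.
- `hit_rate([2.0, 1.2, 1.6, 0.9])` on made-up fold changes gives 0.5.

With a real masked PLM in place of the toy function, calling the model once per position keeps the cost linear in sequence length.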
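The abstract reports editing outcomes as T-to-S (G or C) substitutions, with correction of the db/db mutation reaching up to 55%. This record does not state how those percentages were computed. A common convention for amplicon sequencing is the share of reads carrying each base at the targeted T. The hypothetical helper `t_to_s_efficiency` below is a minimal sketch of that tally. It assumes the reads are already aligned to the reference with no indels. Tools used in practice typically also handle alignment, indels and quality filtering.

```python
from collections import Counter
from typing import Dict, Iterable


def t_to_s_efficiency(reads: Iterable[str], reference: str, target_index: int) -> Dict[str, float]:
    """Percentage of reads with A, C, G or T at a reference T (0-based index),
    plus 'S' = G + C, i.e. the T-to-S conversion rate.

    Assumes every read is already aligned to `reference` with no indels, so
    the same index addresses the same base. Reads with other symbols (e.g. 'N')
    count toward the total but not toward any base.
    """
    if reference[target_index] != "T":
        raise ValueError(f"reference base at {target_index} is {reference[target_index]}, not T")
    counts = Counter(read[target_index] for read in reads)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads supplied")
    pct = {base: 100.0 * counts.get(base, 0) / total for base in "ACGT"}
    pct["S"] = pct["G"] + pct["C"]  # S = strong bases (G or C)
    return pct


if __name__ == "__main__":
    reads = ["ACTGA", "ACGGA", "ACCGA", "ACTGA"]  # made-up aligned reads
    print(t_to_s_efficiency(reads, reference="ACTGA", target_index=2))
```

For the four made-up reads above, the target T is unchanged in 50% of reads and converted to G and to C in 25% each. That gives a T-to-S rate of 50%.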
doi_str_mv 10.1016/j.molcel.2024.01.021
format Article
pmid 38377993
addtitle Mol Cell
publisher United States: Elsevier Inc
date 2024-04-04
rights Copyright © 2024 Elsevier Inc. All rights reserved.
orcidid https://orcid.org/0000-0002-5072-9225
toplevel peer_reviewed
oa free_for_read
els_id S1097276524000881
linktohtml https://www.sciencedirect.com/science/article/pii/S1097276524000881
backlink https://www.ncbi.nlm.nih.gov/pubmed/38377993
fulltext fulltext
identifier ISSN: 1097-2765
ispartof Molecular cell, 2024-04, Vol.84 (7), p.1257-1270.e6
issn 1097-2765
1097-4164
language eng
recordid cdi_proquest_miscellaneous_2929541873
source Elsevier ScienceDirect Journals
subjects base excision repair
C>G base editing
CRISPR
glycosylase-derived base editor
protein language models
T-to-C base editing
T-to-G base editing
title Protein language models-assisted optimization of a uracil-N-glycosylase variant enables programmable T-to-G and T-to-C base editing
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T15%3A00%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Protein%20language%20models-assisted%20optimization%20of%20a%20uracil-N-glycosylase%20variant%20enables%20programmable%20T-to-G%20and%20T-to-C%20base%20editing&rft.jtitle=Molecular%20cell&rft.au=He,%20Yan&rft.date=2024-04-04&rft.volume=84&rft.issue=7&rft.spage=1257&rft.epage=1270.e6&rft.pages=1257-1270.e6&rft.issn=1097-2765&rft.eissn=1097-4164&rft_id=info:doi/10.1016/j.molcel.2024.01.021&rft_dat=%3Cproquest_cross%3E2929541873%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2929541873&rft_id=info:pmid/38377993&rft_els_id=S1097276524000881&rfr_iscdi=true