Machine learning-based prediction of activity and substrate specificity for OleA enzymes in the thiolase superfamily

Enzymes in the thiolase superfamily catalyze carbon–carbon bond formation for the biosynthesis of polyhydroxyalkanoate storage molecules, membrane lipids and bioactive secondary metabolites. Natural and engineered thiolases have applications in synthetic biology for the production of high-value comp...

Detailed description

Bibliographic details
Published in: Synthetic Biology 2020-01, Vol.5 (1), p.1
Main authors: Robinson, Serina L, Smith, Megan D, Richman, Jack E, Aukema, Kelly G, Wackett, Lawrence P
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue 1
container_start_page 1
container_title Synthetic Biology
container_volume 5
creator Robinson, Serina L
Smith, Megan D
Richman, Jack E
Aukema, Kelly G
Wackett, Lawrence P
description Enzymes in the thiolase superfamily catalyze carbon–carbon bond formation for the biosynthesis of polyhydroxyalkanoate storage molecules, membrane lipids and bioactive secondary metabolites. Natural and engineered thiolases have applications in synthetic biology for the production of high-value compounds, including personal care products and therapeutics. A fundamental understanding of thiolase substrate specificity is lacking, particularly within the OleA protein family. The ability to predict substrates from sequence would advance (meta)genome mining efforts to identify active thiolases for the production of desired metabolites. To gain a deeper understanding of substrate scope within the OleA family, we measured the activity of 73 diverse bacterial thiolases with a library of 15 p-nitrophenyl ester substrates to build a training set of 1095 unique enzyme–substrate pairs. We then used machine learning to predict thiolase substrate specificity from physicochemical and structural features. The area under the receiver operating characteristic curve was 0.89 for random forest classification of enzyme activity, and our regression model had a test set root mean square error of 0.22 (R² = 0.75) to quantitatively predict enzyme activity levels. Substrate aromaticity, oxygen content and molecular connectivity were the strongest predictors of enzyme–substrate pairing. Key amino acid residues A173, I284, V287, T292 and I316 in the Xanthomonas campestris OleA crystal structure lining the substrate binding pockets were important for thiolase substrate specificity and are attractive targets for future protein engineering studies. The predictive framework described here is generalizable and demonstrates how machine learning can be used to quantitatively understand and predict enzyme substrate specificity.
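The figures quoted in the description (1095 pairs, area under the ROC curve, root mean square error, R²) can be made concrete with a short sketch. This is not the authors' pipeline and trains no random forest. It only shows, under stated assumptions, how the pair count arises and how the three reported metrics are conventionally defined. The enzyme and substrate identifiers and all toy values are hypothetical placeholders, since the record does not list them.

```python
from itertools import product
from math import sqrt

# Hypothetical identifiers: the record gives only the counts (73 and 15).
enzymes = [f"enzyme_{i:02d}" for i in range(1, 74)]        # 73 thiolases
substrates = [f"substrate_{j:02d}" for j in range(1, 16)]  # 15 p-nitrophenyl esters

# Every enzyme is assayed against every substrate: 73 * 15 = 1095 pairs.
pairs = list(product(enzymes, substrates))
n_pairs = len(pairs)


def auroc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (active, inactive) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rmse(observed, predicted):
    """Root mean square error between observed and predicted activity."""
    sq = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return sqrt(sq / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only; they are not data from the study.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)

toy_observed = [0.0, 1.0, 2.0, 3.0]
toy_predicted = [0.0, 1.0, 2.0, 1.0]
toy_rmse = rmse(toy_observed, toy_predicted)
toy_r2 = r_squared(toy_observed, toy_predicted)
```

On this reading, the reported area under the curve of 0.89 would mean that a randomly chosen active pair outscores a randomly chosen inactive pair about 89% of the time, and R² = 0.75 would mean the regression accounts for about three quarters of the variance in the test-set activity values.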
doi_str_mv 10.1093/synbio/ysaa004
format Article
fullrecord <record><control><sourceid>gale_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1093_synbio_ysaa004</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A688079754</galeid><sourcerecordid>A688079754</sourcerecordid><originalsourceid>FETCH-LOGICAL-c380t-d6d42369371083e052e31520586b70783376827329946cfabcc59453079ddc5c3</originalsourceid><addsrcrecordid>eNpNkE9rwzAMxc3YYKXrdWd_gbROHMfJsZT9g45etnNwbLnVSOxgZ4Ps089lhQ0h9JB4D_Ej5D5n65w1fBNn16HfzFEpxsorsih4IzPJGLv-p2_JKsaPJHIpRM6LBZlelT6hA9qDCg7dMetUBEPHAAb1hN5Rb6lK6gunmSpnaPzs4hTUBDSOoNGiPl-sD_TQw5aC-54HiBQdnU6QGn2fIpNthGDVgP18R26s6iOsLnNJ3h8f3nbP2f7w9LLb7jPNazZlpjJlwauGy5zVHJgogOeiYKKuOslkzbms6kLyomnKSlvVaS2aUnAmG2O00HxJ1r-5R9VDi8769LdOZWBA7R1YTPttVdfJIkX5Z9DBxxjAtmPAQYW5zVl75tz-cm4vnPkPmpF0Tg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Machine learning-based prediction of activity and substrate specificity for OleA enzymes in the thiolase superfamily</title><source>Oxford Journals Open Access Collection</source><source>DOAJ Directory of Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><creator>Robinson, Serina L ; Smith, Megan D ; Richman, Jack E ; Aukema, Kelly G ; Wackett, Lawrence P</creator><creatorcontrib>Robinson, Serina L ; Smith, Megan D ; Richman, Jack E ; Aukema, Kelly G ; Wackett, Lawrence P</creatorcontrib><description>Enzymes in the thiolase superfamily catalyze carbon–carbon bond formation for the biosynthesis of polyhydroxyalkanoate storage molecules, membrane lipids and bioactive secondary metabolites. Natural and engineered thiolases have applications in synthetic biology for the production of high-value compounds, including personal care products and therapeutics. 
A fundamental understanding of thiolase substrate specificity is lacking, particularly within the OleA protein family. The ability to predict substrates from sequence would advance (meta)genome mining efforts to identify active thiolases for the production of desired metabolites. To gain a deeper understanding of substrate scope within the OleA family, we measured the activity of 73 diverse bacterial thiolases with a library of 15 p-nitrophenyl ester substrates to build a training set of 1095 unique enzyme–substrate pairs. We then used machine learning to predict thiolase substrate specificity from physicochemical and structural features. The area under the receiver operating characteristic curve was 0.89 for random forest classification of enzyme activity, and our regression model had a test set root mean square error of 0.22 (R2 = 0.75) to quantitatively predict enzyme activity levels. Substrate aromaticity, oxygen content and molecular connectivity were the strongest predictors of enzyme–substrate pairing. Key amino acid residues A173, I284, V287, T292 and I316 in the Xanthomonas campestris OleA crystal structure lining the substrate binding pockets were important for thiolase substrate specificity and are attractive targets for future protein engineering studies. 
The predictive framework described here is generalizable and demonstrates how machine learning can be used to quantitatively understand and predict enzyme substrate specificity.</description><identifier>ISSN: 2397-7000</identifier><identifier>EISSN: 2397-7000</identifier><identifier>DOI: 10.1093/synbio/ysaa004</identifier><language>eng</language><publisher>Oxford University Press</publisher><subject>Amino acids ; Enzymes ; Esters ; Genomics ; Machine learning ; Membrane lipids ; Personal care industry ; Physiological aspects ; Plant metabolites ; Toiletries</subject><ispartof>Synthetic Biology, 2020-01, Vol.5 (1), p.1</ispartof><rights>COPYRIGHT 2020 Oxford University Press</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c380t-d6d42369371083e052e31520586b70783376827329946cfabcc59453079ddc5c3</citedby><cites>FETCH-LOGICAL-c380t-d6d42369371083e052e31520586b70783376827329946cfabcc59453079ddc5c3</cites><orcidid>0000-0001-6947-7913 ; 0000-0001-9000-6622</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,860,27901,27902</link.rule.ids></links><search><creatorcontrib>Robinson, Serina L</creatorcontrib><creatorcontrib>Smith, Megan D</creatorcontrib><creatorcontrib>Richman, Jack E</creatorcontrib><creatorcontrib>Aukema, Kelly G</creatorcontrib><creatorcontrib>Wackett, Lawrence P</creatorcontrib><title>Machine learning-based prediction of activity and substrate specificity for OleA enzymes in the thiolase superfamily</title><title>Synthetic Biology</title><description>Enzymes in the thiolase superfamily catalyze carbon–carbon bond formation for the biosynthesis of polyhydroxyalkanoate storage molecules, membrane lipids and bioactive secondary metabolites. 
Natural and engineered thiolases have applications in synthetic biology for the production of high-value compounds, including personal care products and therapeutics. A fundamental understanding of thiolase substrate specificity is lacking, particularly within the OleA protein family. The ability to predict substrates from sequence would advance (meta)genome mining efforts to identify active thiolases for the production of desired metabolites. To gain a deeper understanding of substrate scope within the OleA family, we measured the activity of 73 diverse bacterial thiolases with a library of 15 p-nitrophenyl ester substrates to build a training set of 1095 unique enzyme–substrate pairs. We then used machine learning to predict thiolase substrate specificity from physicochemical and structural features. The area under the receiver operating characteristic curve was 0.89 for random forest classification of enzyme activity, and our regression model had a test set root mean square error of 0.22 (R2 = 0.75) to quantitatively predict enzyme activity levels. Substrate aromaticity, oxygen content and molecular connectivity were the strongest predictors of enzyme–substrate pairing. Key amino acid residues A173, I284, V287, T292 and I316 in the Xanthomonas campestris OleA crystal structure lining the substrate binding pockets were important for thiolase substrate specificity and are attractive targets for future protein engineering studies. 
The predictive framework described here is generalizable and demonstrates how machine learning can be used to quantitatively understand and predict enzyme substrate specificity.</description><subject>Amino acids</subject><subject>Enzymes</subject><subject>Esters</subject><subject>Genomics</subject><subject>Machine learning</subject><subject>Membrane lipids</subject><subject>Personal care industry</subject><subject>Physiological aspects</subject><subject>Plant metabolites</subject><subject>Toiletries</subject><issn>2397-7000</issn><issn>2397-7000</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><recordid>eNpNkE9rwzAMxc3YYKXrdWd_gbROHMfJsZT9g45etnNwbLnVSOxgZ4Ps089lhQ0h9JB4D_Ej5D5n65w1fBNn16HfzFEpxsorsih4IzPJGLv-p2_JKsaPJHIpRM6LBZlelT6hA9qDCg7dMetUBEPHAAb1hN5Rb6lK6gunmSpnaPzs4hTUBDSOoNGiPl-sD_TQw5aC-54HiBQdnU6QGn2fIpNthGDVgP18R26s6iOsLnNJ3h8f3nbP2f7w9LLb7jPNazZlpjJlwauGy5zVHJgogOeiYKKuOslkzbms6kLyomnKSlvVaS2aUnAmG2O00HxJ1r-5R9VDi8769LdOZWBA7R1YTPttVdfJIkX5Z9DBxxjAtmPAQYW5zVl75tz-cm4vnPkPmpF0Tg</recordid><startdate>20200101</startdate><enddate>20200101</enddate><creator>Robinson, Serina L</creator><creator>Smith, Megan D</creator><creator>Richman, Jack E</creator><creator>Aukema, Kelly G</creator><creator>Wackett, Lawrence P</creator><general>Oxford University Press</general><scope>AAYXX</scope><scope>CITATION</scope><scope>IAO</scope><orcidid>https://orcid.org/0000-0001-6947-7913</orcidid><orcidid>https://orcid.org/0000-0001-9000-6622</orcidid></search><sort><creationdate>20200101</creationdate><title>Machine learning-based prediction of activity and substrate specificity for OleA enzymes in the thiolase superfamily</title><author>Robinson, Serina L ; Smith, Megan D ; Richman, Jack E ; Aukema, Kelly G ; Wackett, Lawrence 
P</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c380t-d6d42369371083e052e31520586b70783376827329946cfabcc59453079ddc5c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Amino acids</topic><topic>Enzymes</topic><topic>Esters</topic><topic>Genomics</topic><topic>Machine learning</topic><topic>Membrane lipids</topic><topic>Personal care industry</topic><topic>Physiological aspects</topic><topic>Plant metabolites</topic><topic>Toiletries</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Robinson, Serina L</creatorcontrib><creatorcontrib>Smith, Megan D</creatorcontrib><creatorcontrib>Richman, Jack E</creatorcontrib><creatorcontrib>Aukema, Kelly G</creatorcontrib><creatorcontrib>Wackett, Lawrence P</creatorcontrib><collection>CrossRef</collection><collection>Gale Academic OneFile</collection><jtitle>Synthetic Biology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Robinson, Serina L</au><au>Smith, Megan D</au><au>Richman, Jack E</au><au>Aukema, Kelly G</au><au>Wackett, Lawrence P</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Machine learning-based prediction of activity and substrate specificity for OleA enzymes in the thiolase superfamily</atitle><jtitle>Synthetic Biology</jtitle><date>2020-01-01</date><risdate>2020</risdate><volume>5</volume><issue>1</issue><spage>1</spage><pages>1-</pages><issn>2397-7000</issn><eissn>2397-7000</eissn><abstract>Enzymes in the thiolase superfamily catalyze carbon–carbon bond formation for the biosynthesis of polyhydroxyalkanoate storage molecules, membrane lipids and bioactive secondary metabolites. Natural and engineered thiolases have applications in synthetic biology for the production of high-value compounds, including personal care products and therapeutics. 
A fundamental understanding of thiolase substrate specificity is lacking, particularly within the OleA protein family. The ability to predict substrates from sequence would advance (meta)genome mining efforts to identify active thiolases for the production of desired metabolites. To gain a deeper understanding of substrate scope within the OleA family, we measured the activity of 73 diverse bacterial thiolases with a library of 15 p-nitrophenyl ester substrates to build a training set of 1095 unique enzyme–substrate pairs. We then used machine learning to predict thiolase substrate specificity from physicochemical and structural features. The area under the receiver operating characteristic curve was 0.89 for random forest classification of enzyme activity, and our regression model had a test set root mean square error of 0.22 (R2 = 0.75) to quantitatively predict enzyme activity levels. Substrate aromaticity, oxygen content and molecular connectivity were the strongest predictors of enzyme–substrate pairing. Key amino acid residues A173, I284, V287, T292 and I316 in the Xanthomonas campestris OleA crystal structure lining the substrate binding pockets were important for thiolase substrate specificity and are attractive targets for future protein engineering studies. The predictive framework described here is generalizable and demonstrates how machine learning can be used to quantitatively understand and predict enzyme substrate specificity.</abstract><pub>Oxford University Press</pub><doi>10.1093/synbio/ysaa004</doi><orcidid>https://orcid.org/0000-0001-6947-7913</orcidid><orcidid>https://orcid.org/0000-0001-9000-6622</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2397-7000
ispartof Synthetic Biology, 2020-01, Vol.5 (1), p.1
issn 2397-7000
2397-7000
language eng
recordid cdi_crossref_primary_10_1093_synbio_ysaa004
source Oxford Journals Open Access Collection; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals; PubMed Central
subjects Amino acids
Enzymes
Esters
Genomics
Machine learning
Membrane lipids
Personal care industry
Physiological aspects
Plant metabolites
Toiletries
title Machine learning-based prediction of activity and substrate specificity for OleA enzymes in the thiolase superfamily
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T20%3A46%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Machine%20learning-based%20prediction%20of%20activity%20and%20substrate%20specificity%20for%20OleA%20enzymes%20in%20the%20thiolase%20superfamily&rft.jtitle=Synthetic%20Biology&rft.au=Robinson,%20Serina%20L&rft.date=2020-01-01&rft.volume=5&rft.issue=1&rft.spage=1&rft.pages=1-&rft.issn=2397-7000&rft.eissn=2397-7000&rft_id=info:doi/10.1093/synbio/ysaa004&rft_dat=%3Cgale_cross%3EA688079754%3C/gale_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_galeid=A688079754&rfr_iscdi=true