Recognition of CRISPR/Cas9 off-target sites through ensemble learning of uneven mismatch distributions

Abstract. Motivation: CRISPR/Cas9 is driving a broad range of innovative applications from basic biology to biotechnology and medicine. One of its current issues is off-target editing, which must be resolved and, ideally, avoided entirely when the system is used. Res...

Detailed description

Bibliographic details
Published in: Bioinformatics 2018-09, Vol.34 (17), p.i757-i765
Main authors: Peng, Hui; Zheng, Yi; Zhao, Zhixun; Liu, Tao; Li, Jinyan
Format: Article
Language: English
container_end_page i765
container_issue 17
container_start_page i757
container_title Bioinformatics
container_volume 34
creator Peng, Hui
Zheng, Yi
Zhao, Zhixun
Liu, Tao
Li, Jinyan
description Abstract. Motivation: CRISPR/Cas9 is driving a broad range of innovative applications from basic biology to biotechnology and medicine. One of its current issues is off-target editing, which must be resolved and, ideally, avoided entirely when the system is used. Results: We developed an ensemble learning method to detect the off-target sites of a single guide RNA (sgRNA) among its thousands of genome-wide candidates. Nucleotide mismatches between on-target and off-target sites have been studied recently. We confirm strong mismatch enrichment and preferences in the regions close to the 5′-end of the off-target sequences. Compared with the on-target sites, the sequences of no-editing sites can also be characterized by GC composition changes and position-specific binary mismatch features. In this novel feature space, an ensemble strategy was applied to train a prediction model. The model achieved a mean Area Under the Receiver Operating Characteristic curve of 0.99 and a mean Area Under the Precision-Recall curve of 0.45 in cross-validation on large datasets, outperforming state-of-the-art methods in various test scenarios. Our predicted off-target sites also correspond very well to those detected by high-throughput sequencing techniques. In particular, two case studies on selecting sgRNAs to cure hearing loss and retinal degeneration partly demonstrate the effectiveness of our method. Availability and implementation: The Python and MATLAB versions of the source code for detecting off-target sites of a given sgRNA, together with the supplementary files, are freely available at https://github.com/penn-hui/OfftargetPredict. Supplementary information: Supplementary data are available at Bioinformatics online.
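The abstract names two kinds of features for a candidate site: position-specific binary mismatch indicators and a change in GC composition relative to the on-target site. The Python sketch below illustrates one plausible way to compute such features for a single (on-target, candidate) pair. It is an assumption-laden illustration, not the authors' implementation: the function names `mismatch_vector`, `gc_fraction` and `pair_features` are hypothetical, the sequences are assumed to be already aligned and of equal length, and the record does not state how the published tool encodes or weights these features.

```python
from typing import List


def mismatch_vector(on_target: str, candidate: str) -> List[int]:
    """Return 1 at each position where the candidate differs from the on-target site.

    Hypothetical helper: assumes both sequences are aligned and equally long.
    """
    if len(on_target) != len(candidate):
        raise ValueError("sequences must have the same length")
    return [int(a != b) for a, b in zip(on_target.upper(), candidate.upper())]


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in a non-empty sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def pair_features(on_target: str, candidate: str) -> List[float]:
    """Concatenate the binary mismatch vector with the GC-fraction change.

    The layout (mismatch indicators first, GC change last) is an assumption
    made for this sketch only.
    """
    mismatches = mismatch_vector(on_target, candidate)
    gc_change = gc_fraction(candidate) - gc_fraction(on_target)
    return [float(m) for m in mismatches] + [gc_change]
```

A feature vector built this way could be passed to any ensemble classifier; which learners the authors combined is not given in this record.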
doi_str_mv 10.1093/bioinformatics/bty558
format Article
date 2018-09-01
oa free_for_read
pmid 30423065
publisher England: Oxford University Press
rights The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
toplevel peer_reviewed
fulltext fulltext_linktorsrc
identifier ISSN: 1367-4803
ispartof Bioinformatics, 2018-09, Vol.34 (17), p.i757-i765
issn 1367-4803
1460-2059
1367-4811
language eng
recordid cdi_proquest_miscellaneous_2133437505
source Oxford Journals Open Access Collection
subjects Base Composition
CRISPR-Cas Systems
Genome
High-Throughput Nucleotide Sequencing - methods
Machine Learning
RNA, Guide, CRISPR-Cas Systems - genetics
Software
title Recognition of CRISPR/Cas9 off-target sites through ensemble learning of uneven mismatch distributions
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T13%3A14%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_TOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Recognition%20of%20CRISPR/Cas9%20off-target%20sites%20through%20ensemble%20learning%20of%20uneven%20mismatch%20distributions&rft.jtitle=Bioinformatics&rft.au=Peng,%20Hui&rft.date=2018-09-01&rft.volume=34&rft.issue=17&rft.spage=i757&rft.epage=i765&rft.pages=i757-i765&rft.issn=1367-4803&rft.eissn=1460-2059&rft_id=info:doi/10.1093/bioinformatics/bty558&rft_dat=%3Cproquest_TOX%3E2133437505%3C/proquest_TOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2133437505&rft_id=info:pmid/30423065&rft_oup_id=10.1093/bioinformatics/bty558&rfr_iscdi=true