ReadItAndKeep: rapid decontamination of SARS-CoV-2 sequencing reads
Abstract Summary Viral sequence data from clinical samples frequently contain contaminating human reads, which must be removed prior to sharing for legal and ethical reasons. To enable host read removal for SARS-CoV-2 sequencing data on low-specification laptops, we developed ReadItAndKeep, a fast l...
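Published in: | Bioinformatics 2022-06, Vol.38 (12), p.3291-3293 |
---|---|
Main authors: | Hunt, Martin; Swann, Jeremy; Constantinides, Bede; Fowler, Philip W; Iqbal, Zamin |
Format: | Article |
Language: | eng |
Subjects: | Applications Notes; COVID-19; Decontamination; Genome, Human; High-Throughput Nucleotide Sequencing; Humans; SARS-CoV-2 - genetics; Sequence Analysis, DNA; Software |
Online access: | Full text |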
container_end_page | 3293 |
---|---|
container_issue | 12 |
container_start_page | 3291 |
container_title | Bioinformatics |
container_volume | 38 |
creator | Hunt, Martin; Swann, Jeremy; Constantinides, Bede; Fowler, Philip W; Iqbal, Zamin |
description | Abstract
Summary
Viral sequence data from clinical samples frequently contain contaminating human reads, which must be removed prior to sharing for legal and ethical reasons. To enable host read removal for SARS-CoV-2 sequencing data on low-specification laptops, we developed ReadItAndKeep, a fast lightweight tool for Illumina and nanopore data that only keeps reads matching the SARS-CoV-2 genome. Peak RAM usage is typically below 10 MB, and runtime less than 1 min. We show that by excluding the polyA tail from the viral reference, ReadItAndKeep prevents bleed-through of human reads, whereas mapping to the human genome lets some reads escape. We believe our test approach (including all possible reads from the human genome, human samples from each of the 26 populations in the 1000 genomes data and a diverse set of SARS-CoV-2 genomes) will also be useful for others.
Availability and implementation
ReadItAndKeep is implemented in C++, released under the MIT license, and available from https://github.com/GenomePathogenAnalysisService/read-it-and-keep.
Supplementary information
Supplementary data are available at Bioinformatics online. |
doi_str_mv | 10.1093/bioinformatics/btac311 |
format | Article |
fullrecord | <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9191204</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bioinformatics/btac311</oup_id><sourcerecordid>2664802351</sourcerecordid><originalsourceid>FETCH-LOGICAL-c386t-5aadf07e0780aef27b07b9ca80ff391a426af4cbd679a9e8a9c280d82ee626843</originalsourceid><addsrcrecordid>eNqNkUtLJDEUhYMoo7bzF6SWbmo6j6pUxYXQNI6KguBrG26lbjTSlZRJ9YD_fiLdI-POVQL3nC835xByzOgvRpWYdy44b0McYHImzbsJjGBshxywStKS01rt5ruQTVm1VOyTw5ReKa1ZVVU_yL6o6zoP6wOyvEPor6aF768Rx9Miwuj6okcT_ASD8xkffBFscb-4uy-X4ankRcK3NXrj_HMRszsdkT0Lq4Q_t-eMPP4-f1helje3F1fLxU1pRCunsgboLW2QNi0FtLzpaNMpAy21VigGFZdgK9P1slGgsAVleEv7liNKLttKzMjZhjuuuwF7g36KsNJjdAPEdx3A6a8T7170c_ijFVOM0w_AyRYQQ_5CmvTgksHVCjyGddJcypwWFzmbGZEbqYkhpYj28xlG9UcD-msDettANh7_v-Sn7V_kWcA2grAevwv9C9KdmmU</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2664802351</pqid></control><display><type>article</type><title>ReadItAndKeep: rapid decontamination of SARS-CoV-2 sequencing reads</title><source>MEDLINE</source><source>Oxford Journals Open Access Collection</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><source>Alma/SFX Local Collection</source><creator>Hunt, Martin ; Swann, Jeremy ; Constantinides, Bede ; Fowler, Philip W ; Iqbal, Zamin</creator><contributor>Alkan, Can</contributor><creatorcontrib>Hunt, Martin ; Swann, Jeremy ; Constantinides, Bede ; Fowler, Philip W ; Iqbal, Zamin ; Alkan, Can</creatorcontrib><description>Abstract
Summary
Viral sequence data from clinical samples frequently contain contaminating human reads, which must be removed prior to sharing for legal and ethical reasons. To enable host read removal for SARS-CoV-2 sequencing data on low-specification laptops, we developed ReadItAndKeep, a fast lightweight tool for Illumina and nanopore data that only keeps reads matching the SARS-CoV-2 genome. Peak RAM usage is typically below 10 MB, and runtime less than 1 min. We show that by excluding the polyA tail from the viral reference, ReadItAndKeep prevents bleed-through of human reads, whereas mapping to the human genome lets some reads escape. We believe our test approach (including all possible reads from the human genome, human samples from each of the 26 populations in the 1000 genomes data and a diverse set of SARS-CoV-2 genomes) will also be useful for others.
Availability and implementation
ReadItAndKeep is implemented in C++, released under the MIT license, and available from https://github.com/GenomePathogenAnalysisService/read-it-and-keep.
Supplementary information
Supplementary data are available at Bioinformatics online.</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1460-2059</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btac311</identifier><identifier>PMID: 35551365</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Applications Notes ; COVID-19 ; Decontamination ; Genome, Human ; High-Throughput Nucleotide Sequencing ; Humans ; SARS-CoV-2 - genetics ; Sequence Analysis, DNA ; Software</subject><ispartof>Bioinformatics, 2022-06, Vol.38 (12), p.3291-3293</ispartof><rights>The Author(s) 2022. Published by Oxford University Press. 2022</rights><rights>The Author(s) 2022. Published by Oxford University Press.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c386t-5aadf07e0780aef27b07b9ca80ff391a426af4cbd679a9e8a9c280d82ee626843</citedby><cites>FETCH-LOGICAL-c386t-5aadf07e0780aef27b07b9ca80ff391a426af4cbd679a9e8a9c280d82ee626843</cites><orcidid>0000-0001-8466-7547</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC9191204/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC9191204/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,1604,27923,27924,53790,53792</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/35551365$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Alkan, Can</contributor><creatorcontrib>Hunt, Martin</creatorcontrib><creatorcontrib>Swann, Jeremy</creatorcontrib><creatorcontrib>Constantinides, Bede</creatorcontrib><creatorcontrib>Fowler, Philip 
W</creatorcontrib><creatorcontrib>Iqbal, Zamin</creatorcontrib><title>ReadItAndKeep: rapid decontamination of SARS-CoV-2 sequencing reads</title><title>Bioinformatics</title><addtitle>Bioinformatics</addtitle><description>Abstract
Summary
Viral sequence data from clinical samples frequently contain contaminating human reads, which must be removed prior to sharing for legal and ethical reasons. To enable host read removal for SARS-CoV-2 sequencing data on low-specification laptops, we developed ReadItAndKeep, a fast lightweight tool for Illumina and nanopore data that only keeps reads matching the SARS-CoV-2 genome. Peak RAM usage is typically below 10 MB, and runtime less than 1 min. We show that by excluding the polyA tail from the viral reference, ReadItAndKeep prevents bleed-through of human reads, whereas mapping to the human genome lets some reads escape. We believe our test approach (including all possible reads from the human genome, human samples from each of the 26 populations in the 1000 genomes data and a diverse set of SARS-CoV-2 genomes) will also be useful for others.
Availability and implementation
ReadItAndKeep is implemented in C++, released under the MIT license, and available from https://github.com/GenomePathogenAnalysisService/read-it-and-keep.
Supplementary information
Supplementary data are available at Bioinformatics online.</description><subject>Applications Notes</subject><subject>COVID-19</subject><subject>Decontamination</subject><subject>Genome, Human</subject><subject>High-Throughput Nucleotide Sequencing</subject><subject>Humans</subject><subject>SARS-CoV-2 - genetics</subject><subject>Sequence Analysis, DNA</subject><subject>Software</subject><issn>1367-4803</issn><issn>1460-2059</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>TOX</sourceid><sourceid>EIF</sourceid><recordid>eNqNkUtLJDEUhYMoo7bzF6SWbmo6j6pUxYXQNI6KguBrG26lbjTSlZRJ9YD_fiLdI-POVQL3nC835xByzOgvRpWYdy44b0McYHImzbsJjGBshxywStKS01rt5ruQTVm1VOyTw5ReKa1ZVVU_yL6o6zoP6wOyvEPor6aF768Rx9Miwuj6okcT_ASD8xkffBFscb-4uy-X4ankRcK3NXrj_HMRszsdkT0Lq4Q_t-eMPP4-f1helje3F1fLxU1pRCunsgboLW2QNi0FtLzpaNMpAy21VigGFZdgK9P1slGgsAVleEv7liNKLttKzMjZhjuuuwF7g36KsNJjdAPEdx3A6a8T7170c_ijFVOM0w_AyRYQQ_5CmvTgksHVCjyGddJcypwWFzmbGZEbqYkhpYj28xlG9UcD-msDettANh7_v-Sn7V_kWcA2grAevwv9C9KdmmU</recordid><startdate>20220613</startdate><enddate>20220613</enddate><creator>Hunt, Martin</creator><creator>Swann, Jeremy</creator><creator>Constantinides, Bede</creator><creator>Fowler, Philip W</creator><creator>Iqbal, Zamin</creator><general>Oxford University Press</general><scope>TOX</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0001-8466-7547</orcidid></search><sort><creationdate>20220613</creationdate><title>ReadItAndKeep: rapid decontamination of SARS-CoV-2 sequencing reads</title><author>Hunt, Martin ; Swann, Jeremy ; Constantinides, Bede ; Fowler, Philip W ; Iqbal, 
Zamin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c386t-5aadf07e0780aef27b07b9ca80ff391a426af4cbd679a9e8a9c280d82ee626843</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Applications Notes</topic><topic>COVID-19</topic><topic>Decontamination</topic><topic>Genome, Human</topic><topic>High-Throughput Nucleotide Sequencing</topic><topic>Humans</topic><topic>SARS-CoV-2 - genetics</topic><topic>Sequence Analysis, DNA</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Hunt, Martin</creatorcontrib><creatorcontrib>Swann, Jeremy</creatorcontrib><creatorcontrib>Constantinides, Bede</creatorcontrib><creatorcontrib>Fowler, Philip W</creatorcontrib><creatorcontrib>Iqbal, Zamin</creatorcontrib><collection>Oxford Journals Open Access Collection</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Hunt, Martin</au><au>Swann, Jeremy</au><au>Constantinides, Bede</au><au>Fowler, Philip W</au><au>Iqbal, Zamin</au><au>Alkan, Can</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>ReadItAndKeep: rapid decontamination of SARS-CoV-2 sequencing 
reads</atitle><jtitle>Bioinformatics</jtitle><addtitle>Bioinformatics</addtitle><date>2022-06-13</date><risdate>2022</risdate><volume>38</volume><issue>12</issue><spage>3291</spage><epage>3293</epage><pages>3291-3293</pages><issn>1367-4803</issn><eissn>1460-2059</eissn><eissn>1367-4811</eissn><abstract>Abstract
Summary
Viral sequence data from clinical samples frequently contain contaminating human reads, which must be removed prior to sharing for legal and ethical reasons. To enable host read removal for SARS-CoV-2 sequencing data on low-specification laptops, we developed ReadItAndKeep, a fast lightweight tool for Illumina and nanopore data that only keeps reads matching the SARS-CoV-2 genome. Peak RAM usage is typically below 10 MB, and runtime less than 1 min. We show that by excluding the polyA tail from the viral reference, ReadItAndKeep prevents bleed-through of human reads, whereas mapping to the human genome lets some reads escape. We believe our test approach (including all possible reads from the human genome, human samples from each of the 26 populations in the 1000 genomes data and a diverse set of SARS-CoV-2 genomes) will also be useful for others.
Availability and implementation
ReadItAndKeep is implemented in C++, released under the MIT license, and available from https://github.com/GenomePathogenAnalysisService/read-it-and-keep.
Supplementary information
Supplementary data are available at Bioinformatics online.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>35551365</pmid><doi>10.1093/bioinformatics/btac311</doi><tpages>3</tpages><orcidid>https://orcid.org/0000-0001-8466-7547</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1367-4803 |
ispartof | Bioinformatics, 2022-06, Vol.38 (12), p.3291-3293 |
issn | 1367-4803; 1460-2059; 1367-4811 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9191204 |
source | MEDLINE; Oxford Journals Open Access Collection; EZB-FREE-00999 freely available EZB journals; PubMed Central; Alma/SFX Local Collection |
subjects | Applications Notes; COVID-19; Decontamination; Genome, Human; High-Throughput Nucleotide Sequencing; Humans; SARS-CoV-2 - genetics; Sequence Analysis, DNA; Software |
title | ReadItAndKeep: rapid decontamination of SARS-CoV-2 sequencing reads |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T18%3A07%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ReadItAndKeep:%20rapid%20decontamination%20of%20SARS-CoV-2%20sequencing%20reads&rft.jtitle=Bioinformatics&rft.au=Hunt,%20Martin&rft.date=2022-06-13&rft.volume=38&rft.issue=12&rft.spage=3291&rft.epage=3293&rft.pages=3291-3293&rft.issn=1367-4803&rft.eissn=1460-2059&rft_id=info:doi/10.1093/bioinformatics/btac311&rft_dat=%3Cproquest_pubme%3E2664802351%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2664802351&rft_id=info:pmid/35551365&rft_oup_id=10.1093/bioinformatics/btac311&rfr_iscdi=true |
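
The abstract reports that excluding the polyA tail from the viral reference prevents bleed-through of human reads. The following is a minimal Python sketch of that one preprocessing idea, not ReadItAndKeep's actual code (the tool itself is written in C++). The function name `strip_poly_a_tail`, the `min_tail` parameter, and the toy sequence are hypothetical and exist only for illustration; the sketch assumes the reference is available as a plain nucleotide string.

```python
def strip_poly_a_tail(sequence: str, min_tail: int = 1) -> str:
    """Return `sequence` without its trailing run of A bases.

    Hypothetical helper for illustration only. The trailing run is
    removed only if it is at least `min_tail` bases long; otherwise
    the sequence is returned unchanged.
    """
    # rstrip removes every trailing 'A' or 'a', i.e. the whole final run.
    trimmed = sequence.rstrip("Aa")
    tail_length = len(sequence) - len(trimmed)
    return trimmed if tail_length >= min_tail else sequence


if __name__ == "__main__":
    # Toy reference: an arbitrary 8-base body followed by a 12-base polyA tail.
    toy_reference = "ACGTTGCC" + "A" * 12
    print(strip_poly_a_tail(toy_reference))
```

The intuition, as far as the abstract describes it, is that a run of A bases left in the reference gives A-rich human reads something to match; trimming the tail before matching removes that target. How the released tool handles its reference in practice should be checked against the repository linked in the record.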