Modular, efficient and constant-memory single-cell RNA-seq preprocessing

We describe a workflow for preprocessing of single-cell RNA-sequencing data that balances efficiency and accuracy. Our workflow is based on the kallisto and bustools programs, and is near optimal in speed with a constant memory requirement providing scalability for arbitrarily large datasets. The wo...

Bibliographic details
Published in: Nature biotechnology 2021-07, Vol.39 (7), p.813-818
Main authors: Melsted, Páll; Booeshaghi, A. Sina; Liu, Lauren; Gao, Fan; Lu, Lambda; Min, Kyung Hoi (Joseph); da Veiga Beltrame, Eduardo; Hjörleifsson, Kristján Eldjárn; Gehring, Jase; Pachter, Lior
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 818
container_issue 7
container_start_page 813
container_title Nature biotechnology
container_volume 39
creator Melsted, Páll
Booeshaghi, A. Sina
Liu, Lauren
Gao, Fan
Lu, Lambda
Min, Kyung Hoi (Joseph)
da Veiga Beltrame, Eduardo
Hjörleifsson, Kristján Eldjárn
Gehring, Jase
Pachter, Lior
description We describe a workflow for preprocessing of single-cell RNA-sequencing data that balances efficiency and accuracy. Our workflow is based on the kallisto and bustools programs, and is near optimal in speed with a constant memory requirement providing scalability for arbitrarily large datasets. The workflow is modular, and we demonstrate its flexibility by showing how it can be used for RNA velocity analyses. A preprocessing workflow for single-cell RNA-seq data achieves near-optimal speed.
doi_str_mv 10.1038/s41587-021-00870-2
format Article
fullrecord <record><control><sourceid>gale_proqu</sourceid><recordid>TN_cdi_proquest_miscellaneous_2508576726</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A668270166</galeid><sourcerecordid>A668270166</sourcerecordid><originalsourceid>FETCH-LOGICAL-c622t-6f2545c266fba2da8aa9978558a789ff6e88e271b0f387fa1f79a3fafd6698fd3</originalsourceid><addsrcrecordid>eNqNkltrFTEUhYMotlb_gA8yIEgLTU0yJzuZx0OptlAt1MtryJnZGVPmcprMgP33ZjqtdURF8pCQ_a2VTfYi5CVnR5zl-m1ccakVZYJTxrRiVDwiu1yugHIo4HE6s6nMJeyQZzFeMcZgBfCU7OS5KqTWepecfuirsbHhMEPnfOmxGzLbVVnZd3Gw3UBbbPtwk0Xf1Q3SEpsmu_y4phGvs23AbehLjFPxOXnibBPxxd2-R768O_l8fErPL96fHa_PaQlCDBSckCtZCgC3saKy2tqiUFpKbZUunAPUGoXiG-ZyrZzlThU2d9ZVAIV2Vb5H9mff9PT1iHEwrY9TW7bDfoxGSKalAiUgoa9_Q6_6MXSpu0RJJkEwUA9UbRs0vnP9EGw5mZo1gBaKcZi8jv5ApVVh69NnofPpfiF4sxAkZsDvQ23HGM0SPPg7ePbp8v_Zi69L9vAXdjOmId1OKvr62xBnyQIXM16GPsaAzmyDb224MZyZKW5mjptJcTO3cTMiiV7d_fC4abH6KbnPVwLyGYip1NUYHkbwD9sfnJLa2w</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2550562067</pqid></control><display><type>article</type><title>Modular, efficient and constant-memory single-cell RNA-seq preprocessing</title><source>MEDLINE</source><source>Nature Journals Online</source><source>Alma/SFX Local Collection</source><creator>Melsted, Páll ; Booeshaghi, A. Sina ; Liu, Lauren ; Gao, Fan ; Lu, Lambda ; Min, Kyung Hoi (Joseph) ; da Veiga Beltrame, Eduardo ; Hjörleifsson, Kristján Eldjárn ; Gehring, Jase ; Pachter, Lior</creator><creatorcontrib>Melsted, Páll ; Booeshaghi, A. Sina ; Liu, Lauren ; Gao, Fan ; Lu, Lambda ; Min, Kyung Hoi (Joseph) ; da Veiga Beltrame, Eduardo ; Hjörleifsson, Kristján Eldjárn ; Gehring, Jase ; Pachter, Lior</creatorcontrib><description>We describe a workflow for preprocessing of single-cell RNA-sequencing data that balances efficiency and accuracy. 
Our workflow is based on the kallisto and bustools programs, and is near optimal in speed with a constant memory requirement providing scalability for arbitrarily large datasets. The workflow is modular, and we demonstrate its flexibility by showing how it can be used for RNA velocity analyses. A preprocessing workflow for single-cell RNA-seq data achieves near-optimal speed.</description><identifier>ISSN: 1087-0156</identifier><identifier>ISSN: 1546-1696</identifier><identifier>EISSN: 1546-1696</identifier><identifier>DOI: 10.1038/s41587-021-00870-2</identifier><identifier>PMID: 33795888</identifier><language>eng</language><publisher>New York: Nature Publishing Group US</publisher><subject>631/114/2785 ; 631/114/794 ; 631/61/212/2019 ; Agriculture ; Analysis ; Base Sequence ; Bioinformatics ; Biomedical and Life Sciences ; Biomedical Engineering/Biotechnology ; Biomedicine ; Biotechnology ; Computer engineering ; Computer science ; Datasets ; Efficiency ; Experiments ; Gene sequencing ; Genes ; Genetic engineering ; Genetic markers ; Genomics ; High-Throughput Nucleotide Sequencing ; Humans ; Identification and classification ; Letter ; Life Sciences ; Mechanical engineering ; Methods ; Preprocessing ; Ribonucleic acid ; RNA ; RNA sequencing ; Sequence Analysis, RNA ; Single-Cell Analysis ; Software ; Workflow</subject><ispartof>Nature biotechnology, 2021-07, Vol.39 (7), p.813-818</ispartof><rights>The Author(s), under exclusive licence to Springer Nature America, Inc. 2021. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><rights>2021. 
The Author(s), under exclusive licence to Springer Nature America, Inc.</rights><rights>COPYRIGHT 2021 Nature Publishing Group</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c622t-6f2545c266fba2da8aa9978558a789ff6e88e271b0f387fa1f79a3fafd6698fd3</citedby><cites>FETCH-LOGICAL-c622t-6f2545c266fba2da8aa9978558a789ff6e88e271b0f387fa1f79a3fafd6698fd3</cites><orcidid>0000-0003-0894-4017 ; 0000-0002-9164-6231</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33795888$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Melsted, Páll</creatorcontrib><creatorcontrib>Booeshaghi, A. Sina</creatorcontrib><creatorcontrib>Liu, Lauren</creatorcontrib><creatorcontrib>Gao, Fan</creatorcontrib><creatorcontrib>Lu, Lambda</creatorcontrib><creatorcontrib>Min, Kyung Hoi (Joseph)</creatorcontrib><creatorcontrib>da Veiga Beltrame, Eduardo</creatorcontrib><creatorcontrib>Hjörleifsson, Kristján Eldjárn</creatorcontrib><creatorcontrib>Gehring, Jase</creatorcontrib><creatorcontrib>Pachter, Lior</creatorcontrib><title>Modular, efficient and constant-memory single-cell RNA-seq preprocessing</title><title>Nature biotechnology</title><addtitle>Nat Biotechnol</addtitle><addtitle>Nat Biotechnol</addtitle><description>We describe a workflow for preprocessing of single-cell RNA-sequencing data that balances efficiency and accuracy. Our workflow is based on the kallisto and bustools programs, and is near optimal in speed with a constant memory requirement providing scalability for arbitrarily large datasets. The workflow is modular, and we demonstrate its flexibility by showing how it can be used for RNA velocity analyses. 
A preprocessing workflow for single-cell RNA-seq data achieves near-optimal speed.</description><subject>631/114/2785</subject><subject>631/114/794</subject><subject>631/61/212/2019</subject><subject>Agriculture</subject><subject>Analysis</subject><subject>Base Sequence</subject><subject>Bioinformatics</subject><subject>Biomedical and Life Sciences</subject><subject>Biomedical Engineering/Biotechnology</subject><subject>Biomedicine</subject><subject>Biotechnology</subject><subject>Computer engineering</subject><subject>Computer science</subject><subject>Datasets</subject><subject>Efficiency</subject><subject>Experiments</subject><subject>Gene sequencing</subject><subject>Genes</subject><subject>Genetic engineering</subject><subject>Genetic markers</subject><subject>Genomics</subject><subject>High-Throughput Nucleotide Sequencing</subject><subject>Humans</subject><subject>Identification and classification</subject><subject>Letter</subject><subject>Life Sciences</subject><subject>Mechanical engineering</subject><subject>Methods</subject><subject>Preprocessing</subject><subject>Ribonucleic acid</subject><subject>RNA</subject><subject>RNA sequencing</subject><subject>Sequence Analysis, RNA</subject><subject>Single-Cell 
Analysis</subject><subject>Software</subject><subject>Workflow</subject><issn>1087-0156</issn><issn>1546-1696</issn><issn>1546-1696</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>N95</sourceid><sourceid>8G5</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>GUQSH</sourceid><sourceid>M2O</sourceid><recordid>eNqNkltrFTEUhYMotlb_gA8yIEgLTU0yJzuZx0OptlAt1MtryJnZGVPmcprMgP33ZjqtdURF8pCQ_a2VTfYi5CVnR5zl-m1ccakVZYJTxrRiVDwiu1yugHIo4HE6s6nMJeyQZzFeMcZgBfCU7OS5KqTWepecfuirsbHhMEPnfOmxGzLbVVnZd3Gw3UBbbPtwk0Xf1Q3SEpsmu_y4phGvs23AbehLjFPxOXnibBPxxd2-R768O_l8fErPL96fHa_PaQlCDBSckCtZCgC3saKy2tqiUFpKbZUunAPUGoXiG-ZyrZzlThU2d9ZVAIV2Vb5H9mff9PT1iHEwrY9TW7bDfoxGSKalAiUgoa9_Q6_6MXSpu0RJJkEwUA9UbRs0vnP9EGw5mZo1gBaKcZi8jv5ApVVh69NnofPpfiF4sxAkZsDvQ23HGM0SPPg7ePbp8v_Zi69L9vAXdjOmId1OKvr62xBnyQIXM16GPsaAzmyDb224MZyZKW5mjptJcTO3cTMiiV7d_fC4abH6KbnPVwLyGYip1NUYHkbwD9sfnJLa2w</recordid><startdate>20210701</startdate><enddate>20210701</enddate><creator>Melsted, Páll</creator><creator>Booeshaghi, A. 
Sina</creator><creator>Liu, Lauren</creator><creator>Gao, Fan</creator><creator>Lu, Lambda</creator><creator>Min, Kyung Hoi (Joseph)</creator><creator>da Veiga Beltrame, Eduardo</creator><creator>Hjörleifsson, Kristján Eldjárn</creator><creator>Gehring, Jase</creator><creator>Pachter, Lior</creator><general>Nature Publishing Group US</general><general>Nature Publishing Group</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>N95</scope><scope>XI7</scope><scope>IOV</scope><scope>ISR</scope><scope>3V.</scope><scope>7QO</scope><scope>7QP</scope><scope>7QR</scope><scope>7T7</scope><scope>7TK</scope><scope>7TM</scope><scope>7X7</scope><scope>7XB</scope><scope>88A</scope><scope>88E</scope><scope>88I</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>8G5</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>C1K</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>GUQSH</scope><scope>HCIFZ</scope><scope>K9.</scope><scope>L6V</scope><scope>LK8</scope><scope>M0S</scope><scope>M1P</scope><scope>M2O</scope><scope>M2P</scope><scope>M7P</scope><scope>M7S</scope><scope>MBDVC</scope><scope>P64</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>Q9U</scope><scope>RC3</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0003-0894-4017</orcidid><orcidid>https://orcid.org/0000-0002-9164-6231</orcidid></search><sort><creationdate>20210701</creationdate><title>Modular, efficient and constant-memory single-cell RNA-seq preprocessing</title><author>Melsted, Páll ; Booeshaghi, A. 
Sina ; Liu, Lauren ; Gao, Fan ; Lu, Lambda ; Min, Kyung Hoi (Joseph) ; da Veiga Beltrame, Eduardo ; Hjörleifsson, Kristján Eldjárn ; Gehring, Jase ; Pachter, Lior</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c622t-6f2545c266fba2da8aa9978558a789ff6e88e271b0f387fa1f79a3fafd6698fd3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>631/114/2785</topic><topic>631/114/794</topic><topic>631/61/212/2019</topic><topic>Agriculture</topic><topic>Analysis</topic><topic>Base Sequence</topic><topic>Bioinformatics</topic><topic>Biomedical and Life Sciences</topic><topic>Biomedical Engineering/Biotechnology</topic><topic>Biomedicine</topic><topic>Biotechnology</topic><topic>Computer engineering</topic><topic>Computer science</topic><topic>Datasets</topic><topic>Efficiency</topic><topic>Experiments</topic><topic>Gene sequencing</topic><topic>Genes</topic><topic>Genetic engineering</topic><topic>Genetic markers</topic><topic>Genomics</topic><topic>High-Throughput Nucleotide Sequencing</topic><topic>Humans</topic><topic>Identification and classification</topic><topic>Letter</topic><topic>Life Sciences</topic><topic>Mechanical engineering</topic><topic>Methods</topic><topic>Preprocessing</topic><topic>Ribonucleic acid</topic><topic>RNA</topic><topic>RNA sequencing</topic><topic>Sequence Analysis, RNA</topic><topic>Single-Cell Analysis</topic><topic>Software</topic><topic>Workflow</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Melsted, Páll</creatorcontrib><creatorcontrib>Booeshaghi, A. 
Sina</creatorcontrib><creatorcontrib>Liu, Lauren</creatorcontrib><creatorcontrib>Gao, Fan</creatorcontrib><creatorcontrib>Lu, Lambda</creatorcontrib><creatorcontrib>Min, Kyung Hoi (Joseph)</creatorcontrib><creatorcontrib>da Veiga Beltrame, Eduardo</creatorcontrib><creatorcontrib>Hjörleifsson, Kristján Eldjárn</creatorcontrib><creatorcontrib>Gehring, Jase</creatorcontrib><creatorcontrib>Pachter, Lior</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale Business: Insights</collection><collection>Business Insights: Essentials</collection><collection>Gale In Context: Opposing Viewpoints</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Biotechnology Research Abstracts</collection><collection>Calcium &amp; Calcified Tissue Abstracts</collection><collection>Chemoreception Abstracts</collection><collection>Industrial and Applied Microbiology Abstracts (Microbiology A)</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Biology Database (Alumni Edition)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Science Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) 
(purchase pre-March 2016)</collection><collection>Research Library (Alumni Edition)</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>Natural Science Collection</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>Research Library Prep</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>ProQuest Engineering Collection</collection><collection>ProQuest Biological Science Collection</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Research Library</collection><collection>Science Database</collection><collection>Biological Science Database</collection><collection>Engineering Database</collection><collection>Research Library (Corporate)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><collection>ProQuest Central Basic</collection><collection>Genetics 
Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Nature biotechnology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Melsted, Páll</au><au>Booeshaghi, A. Sina</au><au>Liu, Lauren</au><au>Gao, Fan</au><au>Lu, Lambda</au><au>Min, Kyung Hoi (Joseph)</au><au>da Veiga Beltrame, Eduardo</au><au>Hjörleifsson, Kristján Eldjárn</au><au>Gehring, Jase</au><au>Pachter, Lior</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Modular, efficient and constant-memory single-cell RNA-seq preprocessing</atitle><jtitle>Nature biotechnology</jtitle><stitle>Nat Biotechnol</stitle><addtitle>Nat Biotechnol</addtitle><date>2021-07-01</date><risdate>2021</risdate><volume>39</volume><issue>7</issue><spage>813</spage><epage>818</epage><pages>813-818</pages><issn>1087-0156</issn><issn>1546-1696</issn><eissn>1546-1696</eissn><abstract>We describe a workflow for preprocessing of single-cell RNA-sequencing data that balances efficiency and accuracy. Our workflow is based on the kallisto and bustools programs, and is near optimal in speed with a constant memory requirement providing scalability for arbitrarily large datasets. The workflow is modular, and we demonstrate its flexibility by showing how it can be used for RNA velocity analyses. A preprocessing workflow for single-cell RNA-seq data achieves near-optimal speed.</abstract><cop>New York</cop><pub>Nature Publishing Group US</pub><pmid>33795888</pmid><doi>10.1038/s41587-021-00870-2</doi><tpages>6</tpages><orcidid>https://orcid.org/0000-0003-0894-4017</orcidid><orcidid>https://orcid.org/0000-0002-9164-6231</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1087-0156
ispartof Nature biotechnology, 2021-07, Vol.39 (7), p.813-818
issn 1087-0156
1546-1696
language eng
recordid cdi_proquest_miscellaneous_2508576726
source MEDLINE; Nature Journals Online; Alma/SFX Local Collection
subjects 631/114/2785
631/114/794
631/61/212/2019
Agriculture
Analysis
Base Sequence
Bioinformatics
Biomedical and Life Sciences
Biomedical Engineering/Biotechnology
Biomedicine
Biotechnology
Computer engineering
Computer science
Datasets
Efficiency
Experiments
Gene sequencing
Genes
Genetic engineering
Genetic markers
Genomics
High-Throughput Nucleotide Sequencing
Humans
Identification and classification
Letter
Life Sciences
Mechanical engineering
Methods
Preprocessing
Ribonucleic acid
RNA
RNA sequencing
Sequence Analysis, RNA
Single-Cell Analysis
Software
Workflow
title Modular, efficient and constant-memory single-cell RNA-seq preprocessing
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-21T09%3A15%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Modular,%20efficient%20and%20constant-memory%20single-cell%20RNA-seq%20preprocessing&rft.jtitle=Nature%20biotechnology&rft.au=Melsted,%20P%C3%A1ll&rft.date=2021-07-01&rft.volume=39&rft.issue=7&rft.spage=813&rft.epage=818&rft.pages=813-818&rft.issn=1087-0156&rft.eissn=1546-1696&rft_id=info:doi/10.1038/s41587-021-00870-2&rft_dat=%3Cgale_proqu%3EA668270166%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2550562067&rft_id=info:pmid/33795888&rft_galeid=A668270166&rfr_iscdi=true