ToRQuEMaDA: tool for retrieving queried Eubacteria, metadata and dereplicating assemblies

TQMD is a tool for high-performance computing clusters which downloads, stores and produces lists of dereplicated prokaryotic genomes. It has been developed to counter the ever-growing number of prokaryotic genomes and their uneven taxonomic distribution. It is based on word-based alignment-free met...

Bibliographic details
Published in: PeerJ (San Francisco, CA), 2021-05, Vol.9, p.e11348-e11348, Article 11348
Main authors: Leonard, Raphael R., Leleu, Marie, Van Vlierberghe, Mick, Cornet, Luc, Kerff, Frederic, Baurain, Denis
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page e11348
container_issue
container_start_page e11348
container_title PeerJ (San Francisco, CA)
container_volume 9
creator Leonard, Raphael R.
Leleu, Marie
Van Vlierberghe, Mick
Cornet, Luc
Kerff, Frederic
Baurain, Denis
description TQMD is a tool for high-performance computing clusters which downloads, stores and produces lists of dereplicated prokaryotic genomes. It has been developed to counter the ever-growing number of prokaryotic genomes and their uneven taxonomic distribution. It is based on word-based alignment-free methods (k-mers), an iterative single-linkage approach and a divide-and-conquer strategy to remain both efficient and scalable. We studied the performance of TQMD by verifying the influence of its parameters and heuristics on the clustering outcome. We further compared TQMD to two other dereplication tools (dRep and Assembly-Dereplicator). Our results showed that TQMD is primarily optimized to dereplicate at higher taxonomic levels (phylum/class), as opposed to the other dereplication tools, but also works at lower taxonomic levels (species/strain) like the other dereplication tools. TQMD is available from source and as a Singularity container at [https://bitbucket.org/phylogeno/tqmd].
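The description above names three ingredients: k-mer (word-based, alignment-free) comparison, iterative single-linkage clustering, and a divide-and-conquer strategy. The following is a minimal sketch of how these ideas can fit together; it is not TQMD's actual implementation. The function names, the Jaccard distance on k-mer sets, the default values of `k`, `max_dist` and `batch_size`, and the rule for choosing a cluster representative are all assumptions made for illustration.

```python
from itertools import combinations


def kmer_set(seq, k):
    """Return the set of all k-mers (words of length k) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B|; two empty sets count as identical."""
    union = len(a | b)
    return 1.0 - (len(a & b) / union if union else 1.0)


def single_linkage(names, kmers, max_dist):
    """Group names so that two genomes share a cluster whenever a chain
    of pairwise distances <= max_dist connects them (union-find)."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for x, y in combinations(names, 2):
        if jaccard_distance(kmers[x], kmers[y]) <= max_dist:
            parent[find(y)] = find(x)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return list(clusters.values())


def dereplicate(genomes, k=4, max_dist=0.5, batch_size=3):
    """Divide the genomes into batches, cluster each batch, keep one
    representative per cluster, and repeat until nothing is dropped."""
    kmers = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    reps = sorted(genomes)
    while True:
        kept = []
        for start in range(0, len(reps), batch_size):
            batch = reps[start:start + batch_size]
            for cluster in single_linkage(batch, kmers, max_dist):
                # Hypothetical rule: keep the alphabetically first name.
                kept.append(min(cluster))
        kept.sort()
        if len(kept) == len(reps):
            return kept
        reps = kept


genomes = {
    "g1": "ACGTACGTACGT",
    "g2": "ACGTACGTACGA",
    "g3": "TTTTGGGGCCCC",
    "g4": "TTTTGGGGCCCA",
}
print(dereplicate(genomes, k=4, max_dist=0.3, batch_size=2))
```

Batching keeps each round's pairwise comparisons small, which is the point of the divide-and-conquer step. The cost in this naive sketch is that two redundant genomes that never land in the same batch are both retained.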
doi_str_mv 10.7717/peerj.11348
format Article
fullrecord <record><control><sourceid>gale_webof</sourceid><recordid>TN_cdi_gale_infotracacademiconefile_A660707019</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A660707019</galeid><doaj_id>oai_doaj_org_article_b226476f4962420da9b2da04e36ed2cc</doaj_id><sourcerecordid>A660707019</sourcerecordid><originalsourceid>FETCH-LOGICAL-c617t-db630ffe3ceec079976fbee61a9e5496afc7df05b311f2695e26f5af14483c163</originalsourceid><addsrcrecordid>eNqNkkGP0zAQhSMEYlfLnrijSEgICVpiO3HiPSBVpcBKixBoOXCyHGfcukrjYjtF_Hsm6dJtEQfiQ0bON8_jl5ckT0k2LUtSvtkC-PWUEJZXD5JzSng5qVghHh7VZ8llCOsMn4ryrGKPkzPGhOC0Ks-T77fu65d-8Um9m12l0bk2Nc6nHqK3sLPdMv3RA5ZNuuhrpSPW6nW6gagaFVWquiZtwMO2tVrFAVchwKZuLYQnySOj2gCXd--L5Nv7xe384-Tm84fr-exmojkp46SpOcuMAaYBdFYKUXJTA3CiBBS54MrosjFZUTNCDOWiAMpNoQzJ84ppwtlFcr3XbZxay623G-V_SaesHDecX0rlo9UtyJpSnqM-ytKcZo0SNW1UlgPj0FCtUevtXmvb1xtoNHTRq_ZE9PRLZ1dy6XayIhlnIkcBthdAB5aAh9dW7ujYONZ9i9NoWYPEUSpJCyFEgV0v7471Dv0OUW5s0NC2qgPXB8RolbO8oMMBz_9C1673HRo8UJQyzom4p5YKr20743BaPYjKGedZiWukpv-gcDWwsdp1YCzunzS8OGpYgWrjKri2j9Z14RR8tQe1dyF4MAcLSSaH3Moxt3LMLdLPjl0_sH9SikC1B35C7UzQFjoNBwyDjX81Q3_GjM9tVMNAc9d38X6S_2llvwGUgAeS</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2522236619</pqid></control><display><type>article</type><title>ToRQuEMaDA: tool for retrieving queried Eubacteria, metadata and dereplicating assemblies</title><source>DOAJ Directory of Open Access Journals</source><source>Web of Science - Science Citation Index Expanded - 2021&lt;img src="https://exlibris-pub.s3.amazonaws.com/fromwos-v2.jpg" /&gt;</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><creator>Leonard, Raphael R. ; Leleu, Marie ; Van Vlierberghe, Mick ; Cornet, Luc ; Kerff, Frederic ; Baurain, Denis</creator><creatorcontrib>Leonard, Raphael R. 
; Leleu, Marie ; Van Vlierberghe, Mick ; Cornet, Luc ; Kerff, Frederic ; Baurain, Denis</creatorcontrib><description>TQMD is a tool for high-performance computing clusters which downloads, stores and produces lists of dereplicated prokaryotic genomes. It has been developed to counter the ever-growing number of prokaryotic genomes and their uneven taxonomic distribution. It is based on word-based alignment-free methods (k-mers), an iterative single-linkage approach and a divide-and-conquer strategy to remain both efficient and scalable. We studied the performance of TQMD by verifying the influence of its parameters and heuristics on the clustering outcome. We further compared TQMD to two other dereplication tools (dRep and Assembly-Dereplicator). Our results showed that TQMD is primarily optimized to dereplicate at higher taxonomic levels (phylum/class), as opposed to the other dereplication tools, but also works at lower taxonomic levels (species/strain) like the other dereplication tools. 
TQMD is available from source and as a Singularity container at [https://bitbucket.org/phylogeno/tqmd].</description><identifier>ISSN: 2167-8359</identifier><identifier>EISSN: 2167-8359</identifier><identifier>DOI: 10.7717/peerj.11348</identifier><identifier>PMID: 33996287</identifier><language>eng</language><publisher>LONDON: Peerj Inc</publisher><subject><![CDATA[Alignment-free methods ; Bacteria ; Biochemistry, biophysics & molecular biology ; Biochimie, biophysique & biologie moléculaire ; Bioinformatics ; Data compression ; Dereplication ; Dictionaries ; Genetics & genetic processes ; Genome quality ; Genome selection ; Genomes ; Genomics ; Génétique & processus génétiques ; Information theory ; Life sciences ; Metadata ; Metagenomics ; Microbiologie ; Microbiology ; Multidisciplinary Sciences ; NCBI RefSeq ; Phylogenomics ; Problem solving ; Prokaryotes ; Science & Technology ; Science & Technology - Other Topics ; Sciences du vivant ; Singularity ; Software ; Taxonomy ; Work stations]]></subject><ispartof>PeerJ (San Francisco, CA), 2021-05, Vol.9, p.e11348-e11348, Article 11348</ispartof><rights>2021 Léonard et al.</rights><rights>COPYRIGHT 2021 PeerJ. Ltd.</rights><rights>2021 Léonard et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2021 Léonard et al. 
2021 Léonard et al.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>true</woscitedreferencessubscribed><woscitedreferencescount>3</woscitedreferencescount><woscitedreferencesoriginalsourcerecordid>wos000647034500008</woscitedreferencesoriginalsourcerecordid><citedby>FETCH-LOGICAL-c617t-db630ffe3ceec079976fbee61a9e5496afc7df05b311f2695e26f5af14483c163</citedby><cites>FETCH-LOGICAL-c617t-db630ffe3ceec079976fbee61a9e5496afc7df05b311f2695e26f5af14483c163</cites><orcidid>0000-0003-2388-6185 ; 0000-0003-3098-8876</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8106394/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8106394/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,315,728,781,785,865,886,2103,2115,27929,27930,39263,53796,53798</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33996287$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Leonard, Raphael R.</creatorcontrib><creatorcontrib>Leleu, Marie</creatorcontrib><creatorcontrib>Van Vlierberghe, Mick</creatorcontrib><creatorcontrib>Cornet, Luc</creatorcontrib><creatorcontrib>Kerff, Frederic</creatorcontrib><creatorcontrib>Baurain, Denis</creatorcontrib><title>ToRQuEMaDA: tool for retrieving queried Eubacteria, metadata and dereplicating assemblies</title><title>PeerJ (San Francisco, CA)</title><addtitle>PEERJ</addtitle><addtitle>PeerJ</addtitle><description>TQMD is a tool for high-performance computing clusters which downloads, stores and produces lists of dereplicated prokaryotic genomes. It has been developed to counter the ever-growing number of prokaryotic genomes and their uneven taxonomic distribution. 
It is based on word-based alignment-free methods (k-mers), an iterative single-linkage approach and a divide-and-conquer strategy to remain both efficient and scalable. We studied the performance of TQMD by verifying the influence of its parameters and heuristics on the clustering outcome. We further compared TQMD to two other dereplication tools (dRep and Assembly-Dereplicator). Our results showed that TQMD is primarily optimized to dereplicate at higher taxonomic levels (phylum/class), as opposed to the other dereplication tools, but also works at lower taxonomic levels (species/strain) like the other dereplication tools. TQMD is available from source and as a Singularity container at [https://bitbucket.org/phylogeno/tqmd].</description><subject>Alignment-free methods</subject><subject>Bacteria</subject><subject>Biochemistry, biophysics &amp; molecular biology</subject><subject>Biochimie, biophysique &amp; biologie moléculaire</subject><subject>Bioinformatics</subject><subject>Data compression</subject><subject>Dereplication</subject><subject>Dictionaries</subject><subject>Genetics &amp; genetic processes</subject><subject>Genome quality</subject><subject>Genome selection</subject><subject>Genomes</subject><subject>Genomics</subject><subject>Génétique &amp; processus génétiques</subject><subject>Information theory</subject><subject>Life sciences</subject><subject>Metadata</subject><subject>Metagenomics</subject><subject>Microbiologie</subject><subject>Microbiology</subject><subject>Multidisciplinary Sciences</subject><subject>NCBI RefSeq</subject><subject>Phylogenomics</subject><subject>Problem solving</subject><subject>Prokaryotes</subject><subject>Science &amp; Technology</subject><subject>Science &amp; Technology - Other Topics</subject><subject>Sciences du vivant</subject><subject>Singularity</subject><subject>Software</subject><subject>Taxonomy</subject><subject>Work 
stations</subject><issn>2167-8359</issn><issn>2167-8359</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>HGBXW</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>DOA</sourceid><recordid>eNqNkkGP0zAQhSMEYlfLnrijSEgICVpiO3HiPSBVpcBKixBoOXCyHGfcukrjYjtF_Hsm6dJtEQfiQ0bON8_jl5ckT0k2LUtSvtkC-PWUEJZXD5JzSng5qVghHh7VZ8llCOsMn4ryrGKPkzPGhOC0Ks-T77fu65d-8Um9m12l0bk2Nc6nHqK3sLPdMv3RA5ZNuuhrpSPW6nW6gagaFVWquiZtwMO2tVrFAVchwKZuLYQnySOj2gCXd--L5Nv7xe384-Tm84fr-exmojkp46SpOcuMAaYBdFYKUXJTA3CiBBS54MrosjFZUTNCDOWiAMpNoQzJ84ppwtlFcr3XbZxay623G-V_SaesHDecX0rlo9UtyJpSnqM-ytKcZo0SNW1UlgPj0FCtUevtXmvb1xtoNHTRq_ZE9PRLZ1dy6XayIhlnIkcBthdAB5aAh9dW7ujYONZ9i9NoWYPEUSpJCyFEgV0v7471Dv0OUW5s0NC2qgPXB8RolbO8oMMBz_9C1673HRo8UJQyzom4p5YKr20743BaPYjKGedZiWukpv-gcDWwsdp1YCzunzS8OGpYgWrjKri2j9Z14RR8tQe1dyF4MAcLSSaH3Moxt3LMLdLPjl0_sH9SikC1B35C7UzQFjoNBwyDjX81Q3_GjM9tVMNAc9d38X6S_2llvwGUgAeS</recordid><startdate>20210505</startdate><enddate>20210505</enddate><creator>Leonard, Raphael R.</creator><creator>Leleu, Marie</creator><creator>Van Vlierberghe, Mick</creator><creator>Cornet, Luc</creator><creator>Kerff, Frederic</creator><creator>Baurain, Denis</creator><general>Peerj Inc</general><general>PeerJ. 
Ltd</general><general>PeerJ, Inc</general><general>PeerJ</general><general>PeerJ Inc</general><scope>BLEPL</scope><scope>DTL</scope><scope>HGBXW</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7XB</scope><scope>88I</scope><scope>8FE</scope><scope>8FH</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>LK8</scope><scope>M2P</scope><scope>M7P</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>7X8</scope><scope>Q33</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0003-2388-6185</orcidid><orcidid>https://orcid.org/0000-0003-3098-8876</orcidid></search><sort><creationdate>20210505</creationdate><title>ToRQuEMaDA: tool for retrieving queried Eubacteria, metadata and dereplicating assemblies</title><author>Leonard, Raphael R. 
; Leleu, Marie ; Van Vlierberghe, Mick ; Cornet, Luc ; Kerff, Frederic ; Baurain, Denis</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c617t-db630ffe3ceec079976fbee61a9e5496afc7df05b311f2695e26f5af14483c163</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Alignment-free methods</topic><topic>Bacteria</topic><topic>Biochemistry, biophysics &amp; molecular biology</topic><topic>Biochimie, biophysique &amp; biologie moléculaire</topic><topic>Bioinformatics</topic><topic>Data compression</topic><topic>Dereplication</topic><topic>Dictionaries</topic><topic>Genetics &amp; genetic processes</topic><topic>Genome quality</topic><topic>Genome selection</topic><topic>Genomes</topic><topic>Genomics</topic><topic>Génétique &amp; processus génétiques</topic><topic>Information theory</topic><topic>Life sciences</topic><topic>Metadata</topic><topic>Metagenomics</topic><topic>Microbiologie</topic><topic>Microbiology</topic><topic>Multidisciplinary Sciences</topic><topic>NCBI RefSeq</topic><topic>Phylogenomics</topic><topic>Problem solving</topic><topic>Prokaryotes</topic><topic>Science &amp; Technology</topic><topic>Science &amp; Technology - Other Topics</topic><topic>Sciences du vivant</topic><topic>Singularity</topic><topic>Software</topic><topic>Taxonomy</topic><topic>Work stations</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Leonard, Raphael R.</creatorcontrib><creatorcontrib>Leleu, Marie</creatorcontrib><creatorcontrib>Van Vlierberghe, Mick</creatorcontrib><creatorcontrib>Cornet, Luc</creatorcontrib><creatorcontrib>Kerff, Frederic</creatorcontrib><creatorcontrib>Baurain, Denis</creatorcontrib><collection>Web of Science Core Collection</collection><collection>Science Citation Index Expanded</collection><collection>Web of Science - Science Citation Index Expanded - 
2021</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Science Database (Alumni Edition)</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Biological Science Collection</collection><collection>Science Database</collection><collection>Biological Science Database</collection><collection>Access via ProQuest (Open Access)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>MEDLINE - Academic</collection><collection>Université de Liège - Open Repository and Bibliography (ORBI)</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>PeerJ (San Francisco, CA)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Leonard, Raphael R.</au><au>Leleu, Marie</au><au>Van Vlierberghe, Mick</au><au>Cornet, Luc</au><au>Kerff, Frederic</au><au>Baurain, 
Denis</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>ToRQuEMaDA: tool for retrieving queried Eubacteria, metadata and dereplicating assemblies</atitle><jtitle>PeerJ (San Francisco, CA)</jtitle><stitle>PEERJ</stitle><addtitle>PeerJ</addtitle><date>2021-05-05</date><risdate>2021</risdate><volume>9</volume><spage>e11348</spage><epage>e11348</epage><pages>e11348-e11348</pages><artnum>11348</artnum><artnum>e11348</artnum><issn>2167-8359</issn><eissn>2167-8359</eissn><abstract>TQMD is a tool for high-performance computing clusters which downloads, stores and produces lists of dereplicated prokaryotic genomes. It has been developed to counter the ever-growing number of prokaryotic genomes and their uneven taxonomic distribution. It is based on word-based alignment-free methods (k-mers), an iterative single-linkage approach and a divide-and-conquer strategy to remain both efficient and scalable. We studied the performance of TQMD by verifying the influence of its parameters and heuristics on the clustering outcome. We further compared TQMD to two other dereplication tools (dRep and Assembly-Dereplicator). Our results showed that TQMD is primarily optimized to dereplicate at higher taxonomic levels (phylum/class), as opposed to the other dereplication tools, but also works at lower taxonomic levels (species/strain) like the other dereplication tools. TQMD is available from source and as a Singularity container at [https://bitbucket.org/phylogeno/tqmd].</abstract><cop>LONDON</cop><pub>Peerj Inc</pub><pmid>33996287</pmid><doi>10.7717/peerj.11348</doi><tpages>28</tpages><orcidid>https://orcid.org/0000-0003-2388-6185</orcidid><orcidid>https://orcid.org/0000-0003-3098-8876</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2167-8359
ispartof PeerJ (San Francisco, CA), 2021-05, Vol.9, p.e11348-e11348, Article 11348
issn 2167-8359
2167-8359
language eng
recordid cdi_gale_infotracacademiconefile_A660707019
source DOAJ Directory of Open Access Journals; Web of Science - Science Citation Index Expanded - 2021; EZB-FREE-00999 freely available EZB journals; PubMed Central
subjects Alignment-free methods
Bacteria
Biochemistry, biophysics & molecular biology
Biochimie, biophysique & biologie moléculaire
Bioinformatics
Data compression
Dereplication
Dictionaries
Genetics & genetic processes
Genome quality
Genome selection
Genomes
Genomics
Génétique & processus génétiques
Information theory
Life sciences
Metadata
Metagenomics
Microbiologie
Microbiology
Multidisciplinary Sciences
NCBI RefSeq
Phylogenomics
Problem solving
Prokaryotes
Science & Technology
Science & Technology - Other Topics
Sciences du vivant
Singularity
Software
Taxonomy
Work stations
title ToRQuEMaDA: tool for retrieving queried Eubacteria, metadata and dereplicating assemblies
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-14T21%3A16%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_webof&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ToRQuEMaDA:%20tool%20for%20retrieving%20queried%20Eubacteria,%20metadata%20and%20dereplicating%20assemblies&rft.jtitle=PeerJ%20(San%20Francisco,%20CA)&rft.au=Leonard,%20Raphael%20R.&rft.date=2021-05-05&rft.volume=9&rft.spage=e11348&rft.epage=e11348&rft.pages=e11348-e11348&rft.artnum=11348&rft.issn=2167-8359&rft.eissn=2167-8359&rft_id=info:doi/10.7717/peerj.11348&rft_dat=%3Cgale_webof%3EA660707019%3C/gale_webof%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2522236619&rft_id=info:pmid/33996287&rft_galeid=A660707019&rft_doaj_id=oai_doaj_org_article_b226476f4962420da9b2da04e36ed2cc&rfr_iscdi=true