ToRQuEMaDA: tool for retrieving queried Eubacteria, metadata and dereplicating assemblies
Published in: | PeerJ (San Francisco, CA), 2021-05, Vol.9, p.e11348-e11348, Article 11348 |
---|---|
Main authors: | Leonard, Raphael R.; Leleu, Marie; Van Vlierberghe, Mick; Cornet, Luc; Kerff, Frederic; Baurain, Denis |
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Full text |
container_end_page | e11348 |
---|---|
container_issue | |
container_start_page | e11348 |
container_title | PeerJ (San Francisco, CA) |
container_volume | 9 |
creator | Leonard, Raphael R.; Leleu, Marie; Van Vlierberghe, Mick; Cornet, Luc; Kerff, Frederic; Baurain, Denis |
description | TQMD is a tool for high-performance computing clusters which downloads, stores and produces lists of dereplicated prokaryotic genomes. It has been developed to counter the ever-growing number of prokaryotic genomes and their uneven taxonomic distribution. It is based on word-based alignment-free methods (k-mers), an iterative single-linkage approach and a divide-and-conquer strategy to remain both efficient and scalable. We studied the performance of TQMD by verifying the influence of its parameters and heuristics on the clustering outcome. We further compared TQMD to two other dereplication tools (dRep and Assembly-Dereplicator). Our results showed that TQMD is primarily optimized to dereplicate at higher taxonomic levels (phylum/class), as opposed to the other dereplication tools, but also works at lower taxonomic levels (species/strain) like the other dereplication tools. TQMD is available from source and as a Singularity container at [https://bitbucket.org/phylogeno/tqmd]. |
doi_str_mv | 10.7717/peerj.11348 |
format | Article |
fullrecord | <record><control><sourceid>gale_webof</sourceid><recordid>TN_cdi_gale_infotracacademiconefile_A660707019</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A660707019</galeid><doaj_id>oai_doaj_org_article_b226476f4962420da9b2da04e36ed2cc</doaj_id><sourcerecordid>A660707019</sourcerecordid><originalsourceid>FETCH-LOGICAL-c617t-db630ffe3ceec079976fbee61a9e5496afc7df05b311f2695e26f5af14483c163</originalsourceid><addsrcrecordid>eNqNkkGP0zAQhSMEYlfLnrijSEgICVpiO3HiPSBVpcBKixBoOXCyHGfcukrjYjtF_Hsm6dJtEQfiQ0bON8_jl5ckT0k2LUtSvtkC-PWUEJZXD5JzSng5qVghHh7VZ8llCOsMn4ryrGKPkzPGhOC0Ks-T77fu65d-8Um9m12l0bk2Nc6nHqK3sLPdMv3RA5ZNuuhrpSPW6nW6gagaFVWquiZtwMO2tVrFAVchwKZuLYQnySOj2gCXd--L5Nv7xe384-Tm84fr-exmojkp46SpOcuMAaYBdFYKUXJTA3CiBBS54MrosjFZUTNCDOWiAMpNoQzJ84ppwtlFcr3XbZxay623G-V_SaesHDecX0rlo9UtyJpSnqM-ytKcZo0SNW1UlgPj0FCtUevtXmvb1xtoNHTRq_ZE9PRLZ1dy6XayIhlnIkcBthdAB5aAh9dW7ujYONZ9i9NoWYPEUSpJCyFEgV0v7471Dv0OUW5s0NC2qgPXB8RolbO8oMMBz_9C1673HRo8UJQyzom4p5YKr20743BaPYjKGedZiWukpv-gcDWwsdp1YCzunzS8OGpYgWrjKri2j9Z14RR8tQe1dyF4MAcLSSaH3Moxt3LMLdLPjl0_sH9SikC1B35C7UzQFjoNBwyDjX81Q3_GjM9tVMNAc9d38X6S_2llvwGUgAeS</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2522236619</pqid></control><display><type>article</type><title>ToRQuEMaDA: tool for retrieving queried Eubacteria, metadata and dereplicating assemblies</title><source>DOAJ Directory of Open Access Journals</source><source>Web of Science - Science Citation Index Expanded - 2021<img src="https://exlibris-pub.s3.amazonaws.com/fromwos-v2.jpg" /></source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><creator>Leonard, Raphael R. ; Leleu, Marie ; Van Vlierberghe, Mick ; Cornet, Luc ; Kerff, Frederic ; Baurain, Denis</creator><creatorcontrib>Leonard, Raphael R. 
; Leleu, Marie ; Van Vlierberghe, Mick ; Cornet, Luc ; Kerff, Frederic ; Baurain, Denis</creatorcontrib><description>TQMD is a tool for high-performance computing clusters which downloads, stores and produces lists of dereplicated prokaryotic genomes. It has been developed to counter the ever-growing number of prokaryotic genomes and their uneven taxonomic distribution. It is based on word-based alignment-free methods (k-mers), an iterative single-linkage approach and a divide-and-conquer strategy to remain both efficient and scalable. We studied the performance of TQMD by verifying the influence of its parameters and heuristics on the clustering outcome. We further compared TQMD to two other dereplication tools (dRep and Assembly-Dereplicator). Our results showed that TQMD is primarily optimized to dereplicate at higher taxonomic levels (phylum/class), as opposed to the other dereplication tools, but also works at lower taxonomic levels (species/strain) like the other dereplication tools. 
TQMD is available from source and as a Singularity container at [https://bitbucket.org/phylogeno/tqmd].</description><identifier>ISSN: 2167-8359</identifier><identifier>EISSN: 2167-8359</identifier><identifier>DOI: 10.7717/peerj.11348</identifier><identifier>PMID: 33996287</identifier><language>eng</language><publisher>LONDON: Peerj Inc</publisher><subject><![CDATA[Alignment-free methods ; Bacteria ; Biochemistry, biophysics & molecular biology ; Biochimie, biophysique & biologie moléculaire ; Bioinformatics ; Data compression ; Dereplication ; Dictionaries ; Genetics & genetic processes ; Genome quality ; Genome selection ; Genomes ; Genomics ; Génétique & processus génétiques ; Information theory ; Life sciences ; Metadata ; Metagenomics ; Microbiologie ; Microbiology ; Multidisciplinary Sciences ; NCBI RefSeq ; Phylogenomics ; Problem solving ; Prokaryotes ; Science & Technology ; Science & Technology - Other Topics ; Sciences du vivant ; Singularity ; Software ; Taxonomy ; Work stations]]></subject><ispartof>PeerJ (San Francisco, CA), 2021-05, Vol.9, p.e11348-e11348, Article 11348</ispartof><rights>2021 Léonard et al.</rights><rights>COPYRIGHT 2021 PeerJ. Ltd.</rights><rights>2021 Léonard et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2021 Léonard et al. 
2021 Léonard et al.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>true</woscitedreferencessubscribed><woscitedreferencescount>3</woscitedreferencescount><woscitedreferencesoriginalsourcerecordid>wos000647034500008</woscitedreferencesoriginalsourcerecordid><citedby>FETCH-LOGICAL-c617t-db630ffe3ceec079976fbee61a9e5496afc7df05b311f2695e26f5af14483c163</citedby><cites>FETCH-LOGICAL-c617t-db630ffe3ceec079976fbee61a9e5496afc7df05b311f2695e26f5af14483c163</cites><orcidid>0000-0003-2388-6185 ; 0000-0003-3098-8876</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8106394/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8106394/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,315,728,781,785,865,886,2103,2115,27929,27930,39263,53796,53798</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33996287$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Leonard, Raphael R.</creatorcontrib><creatorcontrib>Leleu, Marie</creatorcontrib><creatorcontrib>Van Vlierberghe, Mick</creatorcontrib><creatorcontrib>Cornet, Luc</creatorcontrib><creatorcontrib>Kerff, Frederic</creatorcontrib><creatorcontrib>Baurain, Denis</creatorcontrib><title>ToRQuEMaDA: tool for retrieving queried Eubacteria, metadata and dereplicating assemblies</title><title>PeerJ (San Francisco, CA)</title><addtitle>PEERJ</addtitle><addtitle>PeerJ</addtitle><description>TQMD is a tool for high-performance computing clusters which downloads, stores and produces lists of dereplicated prokaryotic genomes. It has been developed to counter the ever-growing number of prokaryotic genomes and their uneven taxonomic distribution. 
It is based on word-based alignment-free methods (k-mers), an iterative single-linkage approach and a divide-and-conquer strategy to remain both efficient and scalable. We studied the performance of TQMD by verifying the influence of its parameters and heuristics on the clustering outcome. We further compared TQMD to two other dereplication tools (dRep and Assembly-Dereplicator). Our results showed that TQMD is primarily optimized to dereplicate at higher taxonomic levels (phylum/class), as opposed to the other dereplication tools, but also works at lower taxonomic levels (species/strain) like the other dereplication tools. TQMD is available from source and as a Singularity container at [https://bitbucket.org/phylogeno/tqmd].</description><subject>Alignment-free methods</subject><subject>Bacteria</subject><subject>Biochemistry, biophysics & molecular biology</subject><subject>Biochimie, biophysique & biologie moléculaire</subject><subject>Bioinformatics</subject><subject>Data compression</subject><subject>Dereplication</subject><subject>Dictionaries</subject><subject>Genetics & genetic processes</subject><subject>Genome quality</subject><subject>Genome selection</subject><subject>Genomes</subject><subject>Genomics</subject><subject>Génétique & processus génétiques</subject><subject>Information theory</subject><subject>Life sciences</subject><subject>Metadata</subject><subject>Metagenomics</subject><subject>Microbiologie</subject><subject>Microbiology</subject><subject>Multidisciplinary Sciences</subject><subject>NCBI RefSeq</subject><subject>Phylogenomics</subject><subject>Problem solving</subject><subject>Prokaryotes</subject><subject>Science & Technology</subject><subject>Science & Technology - Other Topics</subject><subject>Sciences du vivant</subject><subject>Singularity</subject><subject>Software</subject><subject>Taxonomy</subject><subject>Work 
stations</subject><issn>2167-8359</issn><issn>2167-8359</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>HGBXW</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>DOA</sourceid><recordid>eNqNkkGP0zAQhSMEYlfLnrijSEgICVpiO3HiPSBVpcBKixBoOXCyHGfcukrjYjtF_Hsm6dJtEQfiQ0bON8_jl5ckT0k2LUtSvtkC-PWUEJZXD5JzSng5qVghHh7VZ8llCOsMn4ryrGKPkzPGhOC0Ks-T77fu65d-8Um9m12l0bk2Nc6nHqK3sLPdMv3RA5ZNuuhrpSPW6nW6gagaFVWquiZtwMO2tVrFAVchwKZuLYQnySOj2gCXd--L5Nv7xe384-Tm84fr-exmojkp46SpOcuMAaYBdFYKUXJTA3CiBBS54MrosjFZUTNCDOWiAMpNoQzJ84ppwtlFcr3XbZxay623G-V_SaesHDecX0rlo9UtyJpSnqM-ytKcZo0SNW1UlgPj0FCtUevtXmvb1xtoNHTRq_ZE9PRLZ1dy6XayIhlnIkcBthdAB5aAh9dW7ujYONZ9i9NoWYPEUSpJCyFEgV0v7471Dv0OUW5s0NC2qgPXB8RolbO8oMMBz_9C1673HRo8UJQyzom4p5YKr20743BaPYjKGedZiWukpv-gcDWwsdp1YCzunzS8OGpYgWrjKri2j9Z14RR8tQe1dyF4MAcLSSaH3Moxt3LMLdLPjl0_sH9SikC1B35C7UzQFjoNBwyDjX81Q3_GjM9tVMNAc9d38X6S_2llvwGUgAeS</recordid><startdate>20210505</startdate><enddate>20210505</enddate><creator>Leonard, Raphael R.</creator><creator>Leleu, Marie</creator><creator>Van Vlierberghe, Mick</creator><creator>Cornet, Luc</creator><creator>Kerff, Frederic</creator><creator>Baurain, Denis</creator><general>Peerj Inc</general><general>PeerJ. 
Ltd</general><general>PeerJ, Inc</general><general>PeerJ</general><general>PeerJ Inc</general><scope>BLEPL</scope><scope>DTL</scope><scope>HGBXW</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7XB</scope><scope>88I</scope><scope>8FE</scope><scope>8FH</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>LK8</scope><scope>M2P</scope><scope>M7P</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>7X8</scope><scope>Q33</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0003-2388-6185</orcidid><orcidid>https://orcid.org/0000-0003-3098-8876</orcidid></search><sort><creationdate>20210505</creationdate><title>ToRQuEMaDA: tool for retrieving queried Eubacteria, metadata and dereplicating assemblies</title><author>Leonard, Raphael R. 
; Leleu, Marie ; Van Vlierberghe, Mick ; Cornet, Luc ; Kerff, Frederic ; Baurain, Denis</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c617t-db630ffe3ceec079976fbee61a9e5496afc7df05b311f2695e26f5af14483c163</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Alignment-free methods</topic><topic>Bacteria</topic><topic>Biochemistry, biophysics & molecular biology</topic><topic>Biochimie, biophysique & biologie moléculaire</topic><topic>Bioinformatics</topic><topic>Data compression</topic><topic>Dereplication</topic><topic>Dictionaries</topic><topic>Genetics & genetic processes</topic><topic>Genome quality</topic><topic>Genome selection</topic><topic>Genomes</topic><topic>Genomics</topic><topic>Génétique & processus génétiques</topic><topic>Information theory</topic><topic>Life sciences</topic><topic>Metadata</topic><topic>Metagenomics</topic><topic>Microbiologie</topic><topic>Microbiology</topic><topic>Multidisciplinary Sciences</topic><topic>NCBI RefSeq</topic><topic>Phylogenomics</topic><topic>Problem solving</topic><topic>Prokaryotes</topic><topic>Science & Technology</topic><topic>Science & Technology - Other Topics</topic><topic>Sciences du vivant</topic><topic>Singularity</topic><topic>Software</topic><topic>Taxonomy</topic><topic>Work stations</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Leonard, Raphael R.</creatorcontrib><creatorcontrib>Leleu, Marie</creatorcontrib><creatorcontrib>Van Vlierberghe, Mick</creatorcontrib><creatorcontrib>Cornet, Luc</creatorcontrib><creatorcontrib>Kerff, Frederic</creatorcontrib><creatorcontrib>Baurain, Denis</creatorcontrib><collection>Web of Science Core Collection</collection><collection>Science Citation Index Expanded</collection><collection>Web of Science - Science Citation Index Expanded - 
2021</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Science Database (Alumni Edition)</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Biological Science Collection</collection><collection>Science Database</collection><collection>Biological Science Database</collection><collection>Access via ProQuest (Open Access)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>MEDLINE - Academic</collection><collection>Université de Liège - Open Repository and Bibliography (ORBI)</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>PeerJ (San Francisco, CA)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Leonard, Raphael R.</au><au>Leleu, Marie</au><au>Van Vlierberghe, Mick</au><au>Cornet, Luc</au><au>Kerff, Frederic</au><au>Baurain, 
Denis</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>ToRQuEMaDA: tool for retrieving queried Eubacteria, metadata and dereplicating assemblies</atitle><jtitle>PeerJ (San Francisco, CA)</jtitle><stitle>PEERJ</stitle><addtitle>PeerJ</addtitle><date>2021-05-05</date><risdate>2021</risdate><volume>9</volume><spage>e11348</spage><epage>e11348</epage><pages>e11348-e11348</pages><artnum>11348</artnum><artnum>e11348</artnum><issn>2167-8359</issn><eissn>2167-8359</eissn><abstract>TQMD is a tool for high-performance computing clusters which downloads, stores and produces lists of dereplicated prokaryotic genomes. It has been developed to counter the ever-growing number of prokaryotic genomes and their uneven taxonomic distribution. It is based on word-based alignment-free methods (k-mers), an iterative single-linkage approach and a divide-and-conquer strategy to remain both efficient and scalable. We studied the performance of TQMD by verifying the influence of its parameters and heuristics on the clustering outcome. We further compared TQMD to two other dereplication tools (dRep and Assembly-Dereplicator). Our results showed that TQMD is primarily optimized to dereplicate at higher taxonomic levels (phylum/class), as opposed to the other dereplication tools, but also works at lower taxonomic levels (species/strain) like the other dereplication tools. TQMD is available from source and as a Singularity container at [https://bitbucket.org/phylogeno/tqmd].</abstract><cop>LONDON</cop><pub>Peerj Inc</pub><pmid>33996287</pmid><doi>10.7717/peerj.11348</doi><tpages>28</tpages><orcidid>https://orcid.org/0000-0003-2388-6185</orcidid><orcidid>https://orcid.org/0000-0003-3098-8876</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2167-8359 |
ispartof | PeerJ (San Francisco, CA), 2021-05, Vol.9, p.e11348-e11348, Article 11348 |
issn | ISSN: 2167-8359; EISSN: 2167-8359 |
language | eng |
recordid | cdi_gale_infotracacademiconefile_A660707019 |
source | DOAJ Directory of Open Access Journals; Web of Science - Science Citation Index Expanded - 2021; EZB-FREE-00999 freely available EZB journals; PubMed Central |
subjects | Alignment-free methods; Bacteria; Biochemistry, biophysics & molecular biology; Biochimie, biophysique & biologie moléculaire; Bioinformatics; Data compression; Dereplication; Dictionaries; Genetics & genetic processes; Genome quality; Genome selection; Genomes; Genomics; Génétique & processus génétiques; Information theory; Life sciences; Metadata; Metagenomics; Microbiologie; Microbiology; Multidisciplinary Sciences; NCBI RefSeq; Phylogenomics; Problem solving; Prokaryotes; Science & Technology; Science & Technology - Other Topics; Sciences du vivant; Singularity; Software; Taxonomy; Work stations |
title | ToRQuEMaDA: tool for retrieving queried Eubacteria, metadata and dereplicating assemblies |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-14T21%3A16%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_webof&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ToRQuEMaDA:%20tool%20for%20retrieving%20queried%20Eubacteria,%20metadata%20and%20dereplicating%20assemblies&rft.jtitle=PeerJ%20(San%20Francisco,%20CA)&rft.au=Leonard,%20Raphael%20R.&rft.date=2021-05-05&rft.volume=9&rft.spage=e11348&rft.epage=e11348&rft.pages=e11348-e11348&rft.artnum=11348&rft.issn=2167-8359&rft.eissn=2167-8359&rft_id=info:doi/10.7717/peerj.11348&rft_dat=%3Cgale_webof%3EA660707019%3C/gale_webof%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2522236619&rft_id=info:pmid/33996287&rft_galeid=A660707019&rft_doaj_id=oai_doaj_org_article_b226476f4962420da9b2da04e36ed2cc&rfr_iscdi=true |
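The description field summarizes TQMD's approach as word-based alignment-free comparison (k-mers), iterative single-linkage clustering and a divide-and-conquer strategy. The Python sketch below illustrates, on toy sequences, how these three ideas can fit together. It is not TQMD's implementation. The exact k-mer Jaccard distance, the threshold of 0.2, the `pack_size` parameter, keeping the first genome of each cluster as its representative, and doubling the pack size when a round removes nothing are all assumptions made for illustration. Every function name is hypothetical.

```python
from itertools import combinations


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of distinct k-mers (substrings of length k) in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - (shared k-mers / all k-mers): 0.0 if identical sets, 1.0 if disjoint."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def dereplicate(ids: list[str], profiles: dict[str, set[str]], threshold: float) -> list[str]:
    """Single-linkage clustering via union-find; keep the first genome of each cluster."""
    parent = {g: g for g in ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Link every pair whose distance is within the (hypothetical) threshold.
    for a, b in combinations(ids, 2):
        if jaccard_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    # Group members in input order; input order stands in for a quality ranking (assumption).
    clusters: dict[str, list[str]] = {}
    for g in ids:
        clusters.setdefault(find(g), []).append(g)
    return [members[0] for members in clusters.values()]


def divide_and_conquer(ids: list[str], profiles: dict[str, set[str]],
                       threshold: float, pack_size: int) -> list[str]:
    """Dereplicate packs of genomes independently, then repeat on the survivors."""
    current, size = list(ids), pack_size
    while True:
        survivors: list[str] = []
        for start in range(0, len(current), size):
            survivors.extend(dereplicate(current[start:start + size], profiles, threshold))
        if len(current) <= size:
            return survivors  # one pack held everything: no redundant pair remains
        if len(survivors) == len(current):
            size *= 2  # hypothetical stall rule: enlarge packs so cross-pack pairs meet
        current = survivors


genomes = {
    "g1": "ACGTACGTAGCTAGCTAGGA",
    "g2": "ACGTACGTAGCTAGCTAGGT",  # differs from g1 by its last base only
    "g3": "TTTTGGGGCCCCAAAATTTG",
}
profiles = {name: kmers(seq, 4) for name, seq in genomes.items()}
print(divide_and_conquer(["g1", "g3", "g2"], profiles, threshold=0.2, pack_size=2))
# prints ['g1', 'g3']
```

In this toy run, g1 and g2 share 10 of 12 distinct 4-mers (distance of about 0.17) and land in different packs in the first round, so they only meet after the pack size grows. Splitting into packs bounds the number of pairwise comparisons per round, at the cost of extra rounds for redundant genomes that start in different packs.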