Numt Parser: Automated identification and removal of nuclear mitochondrial pseudogenes (numts) for accurate mitochondrial genome reconstruction in Panthera

Abstract Nuclear mitochondrial pseudogenes (numts) may hinder the reconstruction of mtDNA genomes and affect the reliability of mtDNA datasets for phylogenetic and population genetic comparisons. Here, we present the program Numt Parser, which allows for the identification of DNA sequences that like...

Bibliographic details
Published in: The Journal of heredity 2023-04, Vol.114 (2), p.120-130
Main authors: de Flamingh, Alida; Rivera-Colón, Angel G; Gnoske, Tom P; Kerbis Peterhans, Julian C; Catchen, Julian; Malhi, Ripan S; Roca, Alfred L
Format: Article
Language: eng
Online access: Full text
container_end_page 130
container_issue 2
container_start_page 120
container_title The Journal of heredity
container_volume 114
creator de Flamingh, Alida
Rivera-Colón, Angel G
Gnoske, Tom P
Kerbis Peterhans, Julian C
Catchen, Julian
Malhi, Ripan S
Roca, Alfred L
description Abstract Nuclear mitochondrial pseudogenes (numts) may hinder the reconstruction of mtDNA genomes and affect the reliability of mtDNA datasets for phylogenetic and population genetic comparisons. Here, we present the program Numt Parser, which allows for the identification of DNA sequences that likely originate from numt pseudogene DNA. Sequencing reads are classified as originating from either numt or true cytoplasmic mitochondrial (cymt) DNA by direct comparison against cymt and numt reference sequences. Classified reads can then be parsed into cymt or numt datasets. We tested this program using whole genome shotgun-sequenced data from 2 ancient Cape lions (Panthera leo), because mtDNA is often the marker of choice for ancient DNA studies and the genus Panthera is known to have numt pseudogenes. Numt Parser decreased sequence disagreements that were likely due to numt pseudogene contamination and equalized read coverage across the mitogenome by removing reads that likely originated from numts. We compared the efficacy of Numt Parser to 2 other bioinformatic approaches that can be used to account for numt contamination. We found that Numt Parser outperformed approaches that rely only on read alignment or Basic Local Alignment Search Tool (BLAST) properties, and was effective at identifying sequences that likely originated from numts while having minimal impacts on the recovery of cymt reads. Numt Parser therefore improves the reconstruction of true mitogenomes, allowing for more accurate and robust biological inferences.
doi_str_mv 10.1093/jhered/esac065
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2755577887</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/jhered/esac065</oup_id><sourcerecordid>2755577887</sourcerecordid><originalsourceid>FETCH-LOGICAL-c369t-fef09a47bf74622e927c1c8bca71baa739ab99f9cecd645f66bb7c379568efec3</originalsourceid><addsrcrecordid>eNqFkT1vFDEQhi0EIpdAS4lcJsUm3vXZPtNFERCkCCigXs3OjomjXfvwBxK_hT-L4S4NDdVInsePZ_wy9qoXl72w8urhnhLNV5QBhVZP2KbfatUZKeVTthFiGLpeCXnCTnN-EEL0yorn7ERqNShl9Ib9-ljXwj9DypTe8Ota4gqFZu5nCsU7j1B8DBzCzBOt8QcsPDoeKi4Eia--RLyPYU6-NfaZ6hy_UaDMz0Pz5gvuYuKAWFOz_oM3MK7UtBhDLqni35d8aNOE0taCF-yZgyXTy2M9Y1_fvf1yc9vdfXr_4eb6rkOpbekcOWFhayZntnoYyA4Ge9xNCKafAIy0MFnrLBLOequc1tNkUBqr9I4coTxj5wfvPsXvlXIZV5-RlgUCxZrHwaj2WWa3Mw29PKCYYs6J3LhPfoX0c-zF-CeQ8RDIeAykXXh9dNdpbeeP-GMCDbg4ALHu_yf7DSdRnNk</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2755577887</pqid></control><display><type>article</type><title>Numt Parser: Automated identification and removal of nuclear mitochondrial pseudogenes (numts) for accurate mitochondrial genome reconstruction in Panthera</title><source>Oxford University Press Journals All Titles (1996-Current)</source><source>MEDLINE</source><source>EZB-FREE-00999 freely available EZB journals</source><source>Alma/SFX Local Collection</source><creator>de Flamingh, Alida ; Rivera-Colón, Angel G ; Gnoske, Tom P ; Kerbis Peterhans, Julian C ; Catchen, Julian ; Malhi, Ripan S ; Roca, Alfred L</creator><creatorcontrib>de Flamingh, Alida ; Rivera-Colón, Angel G ; Gnoske, Tom P ; Kerbis Peterhans, Julian C ; Catchen, Julian ; Malhi, Ripan S ; Roca, Alfred L</creatorcontrib><description>Abstract Nuclear mitochondrial pseudogenes (numts) may hinder the reconstruction of mtDNA genomes and affect the reliability of mtDNA datasets for phylogenetic and population genetic comparisons. 
Here, we present the program Numt Parser, which allows for the identification of DNA sequences that likely originate from numt pseudogene DNA. Sequencing reads are classified as originating from either numt or true cytoplasmic mitochondrial (cymt) DNA by direct comparison against cymt and numt reference sequences. Classified reads can then be parsed into cymt or numt datasets. We tested this program using whole genome shotgun-sequenced data from 2 ancient Cape lions (Panthera leo), because mtDNA is often the marker of choice for ancient DNA studies and the genus Panthera is known to have numt pseudogenes. Numt Parser decreased sequence disagreements that were likely due to numt pseudogene contamination and equalized read coverage across the mitogenome by removing reads that likely originated from numts. We compared the efficacy of Numt Parser to 2 other bioinformatic approaches that can be used to account for numt contamination. We found that Numt Parser outperformed approaches that rely only on read alignment or Basic Local Alignment Search Tool (BLAST) properties, and was effective at identifying sequences that likely originated from numts while having minimal impacts on the recovery of cymt reads. Numt Parser therefore improves the reconstruction of true mitogenomes, allowing for more accurate and robust biological inferences.</description><identifier>ISSN: 0022-1503</identifier><identifier>EISSN: 1465-7333</identifier><identifier>DOI: 10.1093/jhered/esac065</identifier><identifier>PMID: 36525576</identifier><language>eng</language><publisher>US: Oxford University Press</publisher><subject>Animals ; Cell Nucleus - genetics ; DNA, Mitochondrial - genetics ; Genome, Mitochondrial ; Panthera - genetics ; Phylogeny ; Pseudogenes ; Reproducibility of Results ; Sequence Analysis, DNA</subject><ispartof>The Journal of heredity, 2023-04, Vol.114 (2), p.120-130</ispartof><rights>The Author(s) 2022. 
Published by Oxford University Press on behalf of The American Genetic Association. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com 2022</rights><rights>The Author(s) 2022. Published by Oxford University Press on behalf of The American Genetic Association. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c369t-fef09a47bf74622e927c1c8bca71baa739ab99f9cecd645f66bb7c379568efec3</citedby><cites>FETCH-LOGICAL-c369t-fef09a47bf74622e927c1c8bca71baa739ab99f9cecd645f66bb7c379568efec3</cites><orcidid>0000-0003-1223-6654 ; 0000-0002-4798-660X ; 0000-0001-9217-5593 ; 0000-0001-9097-3241 ; 0000-0002-1484-0292</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,1578,27901,27902</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/36525576$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>de Flamingh, Alida</creatorcontrib><creatorcontrib>Rivera-Colón, Angel G</creatorcontrib><creatorcontrib>Gnoske, Tom P</creatorcontrib><creatorcontrib>Kerbis Peterhans, Julian C</creatorcontrib><creatorcontrib>Catchen, Julian</creatorcontrib><creatorcontrib>Malhi, Ripan S</creatorcontrib><creatorcontrib>Roca, Alfred L</creatorcontrib><title>Numt Parser: Automated identification and removal of nuclear mitochondrial pseudogenes (numts) for accurate mitochondrial genome reconstruction in Panthera</title><title>The Journal of heredity</title><addtitle>J Hered</addtitle><description>Abstract Nuclear mitochondrial pseudogenes (numts) may hinder the reconstruction of mtDNA genomes and affect the reliability of mtDNA datasets for phylogenetic and population genetic comparisons. 
Here, we present the program Numt Parser, which allows for the identification of DNA sequences that likely originate from numt pseudogene DNA. Sequencing reads are classified as originating from either numt or true cytoplasmic mitochondrial (cymt) DNA by direct comparison against cymt and numt reference sequences. Classified reads can then be parsed into cymt or numt datasets. We tested this program using whole genome shotgun-sequenced data from 2 ancient Cape lions (Panthera leo), because mtDNA is often the marker of choice for ancient DNA studies and the genus Panthera is known to have numt pseudogenes. Numt Parser decreased sequence disagreements that were likely due to numt pseudogene contamination and equalized read coverage across the mitogenome by removing reads that likely originated from numts. We compared the efficacy of Numt Parser to 2 other bioinformatic approaches that can be used to account for numt contamination. We found that Numt Parser outperformed approaches that rely only on read alignment or Basic Local Alignment Search Tool (BLAST) properties, and was effective at identifying sequences that likely originated from numts while having minimal impacts on the recovery of cymt reads. 
Numt Parser therefore improves the reconstruction of true mitogenomes, allowing for more accurate and robust biological inferences.</description><subject>Animals</subject><subject>Cell Nucleus - genetics</subject><subject>DNA, Mitochondrial - genetics</subject><subject>Genome, Mitochondrial</subject><subject>Panthera - genetics</subject><subject>Phylogeny</subject><subject>Pseudogenes</subject><subject>Reproducibility of Results</subject><subject>Sequence Analysis, DNA</subject><issn>0022-1503</issn><issn>1465-7333</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFkT1vFDEQhi0EIpdAS4lcJsUm3vXZPtNFERCkCCigXs3OjomjXfvwBxK_hT-L4S4NDdVInsePZ_wy9qoXl72w8urhnhLNV5QBhVZP2KbfatUZKeVTthFiGLpeCXnCTnN-EEL0yorn7ERqNShl9Ib9-ljXwj9DypTe8Ota4gqFZu5nCsU7j1B8DBzCzBOt8QcsPDoeKi4Eia--RLyPYU6-NfaZ6hy_UaDMz0Pz5gvuYuKAWFOz_oM3MK7UtBhDLqni35d8aNOE0taCF-yZgyXTy2M9Y1_fvf1yc9vdfXr_4eb6rkOpbekcOWFhayZntnoYyA4Ge9xNCKafAIy0MFnrLBLOequc1tNkUBqr9I4coTxj5wfvPsXvlXIZV5-RlgUCxZrHwaj2WWa3Mw29PKCYYs6J3LhPfoX0c-zF-CeQ8RDIeAykXXh9dNdpbeeP-GMCDbg4ALHu_yf7DSdRnNk</recordid><startdate>20230406</startdate><enddate>20230406</enddate><creator>de Flamingh, Alida</creator><creator>Rivera-Colón, Angel G</creator><creator>Gnoske, Tom P</creator><creator>Kerbis Peterhans, Julian C</creator><creator>Catchen, Julian</creator><creator>Malhi, Ripan S</creator><creator>Roca, Alfred L</creator><general>Oxford University 
Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0003-1223-6654</orcidid><orcidid>https://orcid.org/0000-0002-4798-660X</orcidid><orcidid>https://orcid.org/0000-0001-9217-5593</orcidid><orcidid>https://orcid.org/0000-0001-9097-3241</orcidid><orcidid>https://orcid.org/0000-0002-1484-0292</orcidid></search><sort><creationdate>20230406</creationdate><title>Numt Parser: Automated identification and removal of nuclear mitochondrial pseudogenes (numts) for accurate mitochondrial genome reconstruction in Panthera</title><author>de Flamingh, Alida ; Rivera-Colón, Angel G ; Gnoske, Tom P ; Kerbis Peterhans, Julian C ; Catchen, Julian ; Malhi, Ripan S ; Roca, Alfred L</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c369t-fef09a47bf74622e927c1c8bca71baa739ab99f9cecd645f66bb7c379568efec3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Animals</topic><topic>Cell Nucleus - genetics</topic><topic>DNA, Mitochondrial - genetics</topic><topic>Genome, Mitochondrial</topic><topic>Panthera - genetics</topic><topic>Phylogeny</topic><topic>Pseudogenes</topic><topic>Reproducibility of Results</topic><topic>Sequence Analysis, DNA</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>de Flamingh, Alida</creatorcontrib><creatorcontrib>Rivera-Colón, Angel G</creatorcontrib><creatorcontrib>Gnoske, Tom P</creatorcontrib><creatorcontrib>Kerbis Peterhans, Julian C</creatorcontrib><creatorcontrib>Catchen, Julian</creatorcontrib><creatorcontrib>Malhi, Ripan S</creatorcontrib><creatorcontrib>Roca, Alfred L</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE 
(Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>The Journal of heredity</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>de Flamingh, Alida</au><au>Rivera-Colón, Angel G</au><au>Gnoske, Tom P</au><au>Kerbis Peterhans, Julian C</au><au>Catchen, Julian</au><au>Malhi, Ripan S</au><au>Roca, Alfred L</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Numt Parser: Automated identification and removal of nuclear mitochondrial pseudogenes (numts) for accurate mitochondrial genome reconstruction in Panthera</atitle><jtitle>The Journal of heredity</jtitle><addtitle>J Hered</addtitle><date>2023-04-06</date><risdate>2023</risdate><volume>114</volume><issue>2</issue><spage>120</spage><epage>130</epage><pages>120-130</pages><issn>0022-1503</issn><eissn>1465-7333</eissn><abstract>Abstract Nuclear mitochondrial pseudogenes (numts) may hinder the reconstruction of mtDNA genomes and affect the reliability of mtDNA datasets for phylogenetic and population genetic comparisons. Here, we present the program Numt Parser, which allows for the identification of DNA sequences that likely originate from numt pseudogene DNA. Sequencing reads are classified as originating from either numt or true cytoplasmic mitochondrial (cymt) DNA by direct comparison against cymt and numt reference sequences. Classified reads can then be parsed into cymt or numt datasets. We tested this program using whole genome shotgun-sequenced data from 2 ancient Cape lions (Panthera leo), because mtDNA is often the marker of choice for ancient DNA studies and the genus Panthera is known to have numt pseudogenes. 
Numt Parser decreased sequence disagreements that were likely due to numt pseudogene contamination and equalized read coverage across the mitogenome by removing reads that likely originated from numts. We compared the efficacy of Numt Parser to 2 other bioinformatic approaches that can be used to account for numt contamination. We found that Numt Parser outperformed approaches that rely only on read alignment or Basic Local Alignment Search Tool (BLAST) properties, and was effective at identifying sequences that likely originated from numts while having minimal impacts on the recovery of cymt reads. Numt Parser therefore improves the reconstruction of true mitogenomes, allowing for more accurate and robust biological inferences.</abstract><cop>US</cop><pub>Oxford University Press</pub><pmid>36525576</pmid><doi>10.1093/jhered/esac065</doi><tpages>11</tpages><orcidid>https://orcid.org/0000-0003-1223-6654</orcidid><orcidid>https://orcid.org/0000-0002-4798-660X</orcidid><orcidid>https://orcid.org/0000-0001-9217-5593</orcidid><orcidid>https://orcid.org/0000-0001-9097-3241</orcidid><orcidid>https://orcid.org/0000-0002-1484-0292</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0022-1503
ispartof The Journal of heredity, 2023-04, Vol.114 (2), p.120-130
issn 0022-1503
1465-7333
language eng
recordid cdi_proquest_miscellaneous_2755577887
source Oxford University Press Journals All Titles (1996-Current); MEDLINE; EZB-FREE-00999 freely available EZB journals; Alma/SFX Local Collection
subjects Animals
Cell Nucleus - genetics
DNA, Mitochondrial - genetics
Genome, Mitochondrial
Panthera - genetics
Phylogeny
Pseudogenes
Reproducibility of Results
Sequence Analysis, DNA
title Numt Parser: Automated identification and removal of nuclear mitochondrial pseudogenes (numts) for accurate mitochondrial genome reconstruction in Panthera
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T23%3A05%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Numt%20Parser:%20Automated%20identification%20and%20removal%20of%20nuclear%20mitochondrial%20pseudogenes%20(numts)%20for%20accurate%20mitochondrial%20genome%20reconstruction%20in%20Panthera&rft.jtitle=The%20Journal%20of%20heredity&rft.au=de%20Flamingh,%20Alida&rft.date=2023-04-06&rft.volume=114&rft.issue=2&rft.spage=120&rft.epage=130&rft.pages=120-130&rft.issn=0022-1503&rft.eissn=1465-7333&rft_id=info:doi/10.1093/jhered/esac065&rft_dat=%3Cproquest_cross%3E2755577887%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2755577887&rft_id=info:pmid/36525576&rft_oup_id=10.1093/jhered/esac065&rfr_iscdi=true
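
The abstract describes classifying each sequencing read as numt or cymt "by direct comparison against cymt and numt reference sequences" and then parsing the classified reads into separate datasets. The record does not say how Numt Parser scores that comparison, so the following is only a toy sketch of the general idea, not the program's actual method. It assumes each read already has a known start position on both references and that counting mismatched bases is an acceptable stand-in for the comparison; the function names, the "undetermined" bin, and the example sequences are all hypothetical.

```python
from typing import Dict, List, Tuple


def mismatches(read: str, reference: str, start: int) -> int:
    """Count mismatched bases between a read and the reference window at `start`."""
    window = reference[start:start + len(read)]
    if len(window) < len(read):
        raise ValueError("read extends past the end of the reference")
    return sum(1 for a, b in zip(read, window) if a != b)


def classify_read(read: str, start: int, cymt_ref: str, numt_ref: str) -> str:
    """Label a read by whichever reference it matches with fewer mismatches.

    Assumption: a plain mismatch count is used here for illustration only;
    the catalog record does not describe the real scoring scheme.
    """
    cymt_mm = mismatches(read, cymt_ref, start)
    numt_mm = mismatches(read, numt_ref, start)
    if cymt_mm < numt_mm:
        return "cymt"
    if numt_mm < cymt_mm:
        return "numt"
    return "undetermined"  # hypothetical bin for ties


def parse_reads(
    reads: List[Tuple[str, int]], cymt_ref: str, numt_ref: str
) -> Dict[str, List[str]]:
    """Split (sequence, start) pairs into cymt, numt, and undetermined datasets."""
    bins: Dict[str, List[str]] = {"cymt": [], "numt": [], "undetermined": []}
    for read, start in reads:
        bins[classify_read(read, start, cymt_ref, numt_ref)].append(read)
    return bins


# Invented toy references: the numt copy differs from the cymt at two positions.
CYMT_REF = "ACGTACGTACGT"
NUMT_REF = "ACGTTCGTACCT"

# Invented toy reads with their start positions on the references.
READS = [("ACGTAC", 0), ("TTCGTA", 3), ("CGT", 1)]

result = parse_reads(READS, CYMT_REF, NUMT_REF)
print(result)
```

In this sketch a read that covers no position where the two references differ cannot be assigned to either source, which is why ties are kept in their own bin rather than forced into the cymt or numt dataset.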