Numt Parser: Automated identification and removal of nuclear mitochondrial pseudogenes (numts) for accurate mitochondrial genome reconstruction in Panthera
Abstract Nuclear mitochondrial pseudogenes (numts) may hinder the reconstruction of mtDNA genomes and affect the reliability of mtDNA datasets for phylogenetic and population genetic comparisons. Here, we present the program Numt Parser, which allows for the identification of DNA sequences that like...
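Published in: | The Journal of heredity 2023-04, Vol.114 (2), p.120-130 |
---|---|
Main authors: | de Flamingh, Alida; Rivera-Colón, Angel G; Gnoske, Tom P; Kerbis Peterhans, Julian C; Catchen, Julian; Malhi, Ripan S; Roca, Alfred L |
Format: | Article |
Language: | eng |
Subjects: | Animals; Cell Nucleus - genetics; DNA, Mitochondrial - genetics; Genome, Mitochondrial; Panthera - genetics; Phylogeny; Pseudogenes; Reproducibility of Results; Sequence Analysis, DNA |
Online access: | Full text |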
container_end_page | 130 |
---|---|
container_issue | 2 |
container_start_page | 120 |
container_title | The Journal of heredity |
container_volume | 114 |
creator | de Flamingh, Alida; Rivera-Colón, Angel G; Gnoske, Tom P; Kerbis Peterhans, Julian C; Catchen, Julian; Malhi, Ripan S; Roca, Alfred L |
description | Abstract
Nuclear mitochondrial pseudogenes (numts) may hinder the reconstruction of mtDNA genomes and affect the reliability of mtDNA datasets for phylogenetic and population genetic comparisons. Here, we present the program Numt Parser, which allows for the identification of DNA sequences that likely originate from numt pseudogene DNA. Sequencing reads are classified as originating from either numt or true cytoplasmic mitochondrial (cymt) DNA by direct comparison against cymt and numt reference sequences. Classified reads can then be parsed into cymt or numt datasets. We tested this program using whole genome shotgun-sequenced data from 2 ancient Cape lions (Panthera leo), because mtDNA is often the marker of choice for ancient DNA studies and the genus Panthera is known to have numt pseudogenes. Numt Parser decreased sequence disagreements that were likely due to numt pseudogene contamination and equalized read coverage across the mitogenome by removing reads that likely originated from numts. We compared the efficacy of Numt Parser to 2 other bioinformatic approaches that can be used to account for numt contamination. We found that Numt Parser outperformed approaches that rely only on read alignment or Basic Local Alignment Search Tool (BLAST) properties, and was effective at identifying sequences that likely originated from numts while having minimal impacts on the recovery of cymt reads. Numt Parser therefore improves the reconstruction of true mitogenomes, allowing for more accurate and robust biological inferences. |
doi_str_mv | 10.1093/jhered/esac065 |
format | Article |
publisher | US: Oxford University Press |
pmid | 36525576 |
eissn | 1465-7333 |
addtitle | J Hered |
date | 2023-04-06 |
tpages | 11 |
rights | The Author(s) 2022. Published by Oxford University Press on behalf of The American Genetic Association. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com |
toplevel | peer_reviewed; online_resources |
oa | free_for_read |
orcid | https://orcid.org/0000-0003-1223-6654; https://orcid.org/0000-0002-4798-660X; https://orcid.org/0000-0001-9217-5593; https://orcid.org/0000-0001-9097-3241; https://orcid.org/0000-0002-1484-0292 |
backlink | https://www.ncbi.nlm.nih.gov/pubmed/36525576 (View this record in MEDLINE/PubMed) |
fulltext | fulltext |
identifier | ISSN: 0022-1503 |
ispartof | The Journal of heredity, 2023-04, Vol.114 (2), p.120-130 |
issn | 0022-1503; 1465-7333 |
language | eng |
recordid | cdi_proquest_miscellaneous_2755577887 |
source | Oxford University Press Journals All Titles (1996-Current); MEDLINE; EZB-FREE-00999 freely available EZB journals; Alma/SFX Local Collection |
subjects | Animals; Cell Nucleus - genetics; DNA, Mitochondrial - genetics; Genome, Mitochondrial; Panthera - genetics; Phylogeny; Pseudogenes; Reproducibility of Results; Sequence Analysis, DNA |
title | Numt Parser: Automated identification and removal of nuclear mitochondrial pseudogenes (numts) for accurate mitochondrial genome reconstruction in Panthera |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T23%3A05%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Numt%20Parser:%20Automated%20identification%20and%20removal%20of%20nuclear%20mitochondrial%20pseudogenes%20(numts)%20for%20accurate%20mitochondrial%20genome%20reconstruction%20in%20Panthera&rft.jtitle=The%20Journal%20of%20heredity&rft.au=de%20Flamingh,%20Alida&rft.date=2023-04-06&rft.volume=114&rft.issue=2&rft.spage=120&rft.epage=130&rft.pages=120-130&rft.issn=0022-1503&rft.eissn=1465-7333&rft_id=info:doi/10.1093/jhered/esac065&rft_dat=%3Cproquest_cross%3E2755577887%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2755577887&rft_id=info:pmid/36525576&rft_oup_id=10.1093/jhered/esac065&rfr_iscdi=true |