Machine learning approaches outperform distance- and tree-based methods for DNA barcoding of Pterocarpus wood

DNA barcoding is a promising tool to combat illegal logging and associated trade, and the development of reliable and efficient analytical methods is essential for its extensive application in the trade of wood and in the forensics of natural materials more broadly. In this study, 120 DNA sequences...

Bibliographic details
Published in: Planta 2019-05, Vol.249 (5), p.1617-1625
Main authors: He, Tuo; Jiao, Lichao; Wiedenhoeft, Alex C.; Yin, Yafang
Format: Article
Language: eng
Online access: Full text
container_end_page 1625
container_issue 5
container_start_page 1617
container_title Planta
container_volume 249
creator He, Tuo
Jiao, Lichao
Wiedenhoeft, Alex C.
Yin, Yafang
description DNA barcoding is a promising tool to combat illegal logging and associated trade, and the development of reliable and efficient analytical methods is essential for its extensive application in the trade of wood and in the forensics of natural materials more broadly. In this study, 120 DNA sequences of four barcodes (ITS2, matK, ndhF-rpl32, and rbcL) generated in our previous study and 85 downloaded from National Center for Biotechnology Information (NCBI) were collected to establish a reference data set for six commercial Pterocarpus woods. MLAs (BLOG, BP-neural network, SMO and J48) were compared with distance- (TaxonDNA) and tree-based (NJ tree) methods based on identification accuracy and cost-effectiveness across these six species, and also were applied to discriminate the CITES-listed species Pterocarpus santalinus from its anatomically similar species P. tinctorius for forensic identification. MLAs provided higher identification accuracy (30.8–100%) than distance- (15.1–97.4%) and tree-based methods (11.1–87.5%), with SMO performing the best among the machine learning classifiers. The two-locus combination ITS2 + matK when using SMO classifier exhibited the highest resolution (100%) with the fewest barcodes for discriminating the six Pterocarpus species. The CITES-listed species P. santalinus was discriminated successfully from P. tinctorius using MLAs with a single barcode, ndhF-rpl32. This study shows that MLAs provided higher identification accuracy and cost-effectiveness for forensic application over other analytical methods in DNA barcoding of Pterocarpus wood.
doi_str_mv 10.1007/s00425-019-03116-3
format Article
publisher Berlin/Heidelberg: Springer Science + Business Media
pmid 30825008
rights Springer-Verlag GmbH Germany, part of Springer Nature 2019
toplevel peer_reviewed
orcidid https://orcid.org/0000-0001-9031-9928
tpages 9
fulltext fulltext
identifier ISSN: 0032-0935
ispartof Planta, 2019-05, Vol.249 (5), p.1617-1625
issn 0032-0935
1432-2048
language eng
recordid cdi_proquest_miscellaneous_2187534280
source Jstor Complete Legacy; MEDLINE; Springer Nature - Complete Springer Journals
subjects Accuracy
Agriculture
Analytical methods
Artificial intelligence
Bar codes
Biomedical and Life Sciences
Biotechnology
Classifiers
Cost analysis
Data processing
Deoxyribonucleic acid
DNA
DNA Barcoding, Taxonomic - methods
Ecology
Forensic science
Forestry
Gene sequencing
Identification methods
Learning algorithms
Life Sciences
Logging
Machine Learning
Neural networks
Nucleotide sequence
ORIGINAL ARTICLE
Plant Sciences
Pterocarpus
Pterocarpus - genetics
Sequence Analysis, DNA
Species
Witnesses
Wood
Wood - genetics
title Machine learning approaches outperform distance- and tree-based methods for DNA barcoding of Pterocarpus wood
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T15%3A03%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Machine%20learning%20approaches%20outperform%20distance-%20and%20tree-based%20methods%20for%20DNA%20barcoding%20of%20Pterocarpus%20wood&rft.jtitle=Planta&rft.au=He,%20Tuo&rft.date=2019-05-01&rft.volume=249&rft.issue=5&rft.spage=1617&rft.epage=1625&rft.pages=1617-1625&rft.issn=0032-0935&rft.eissn=1432-2048&rft_id=info:doi/10.1007/s00425-019-03116-3&rft_dat=%3Cjstor_proqu%3E48702082%3C/jstor_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2187118406&rft_id=info:pmid/30825008&rft_jstor_id=48702082&rfr_iscdi=true