Clair3-trio: high-performance Nanopore long-read variant calling in family trios with trio-to-trio deep neural networks

Abstract Accurate identification of genetic variants from family child–mother–father trio sequencing data is important in genomics. However, state-of-the-art approaches treat variant calling from trios as three independent tasks, which limits their calling accuracy for Nanopore long-read sequencing...

Bibliographic details
Published in: Briefings in bioinformatics 2022-09, Vol.23 (5)
Main authors: Su, Junhao; Zheng, Zhenxian; Ahmed, Syed Shakeel; Lam, Tak-Wah; Luo, Ruibang
Format: Article
Language: eng
Online access: Full text
container_end_page
container_issue 5
container_start_page
container_title Briefings in bioinformatics
container_volume 23
creator Su, Junhao
Zheng, Zhenxian
Ahmed, Syed Shakeel
Lam, Tak-Wah
Luo, Ruibang
description Abstract Accurate identification of genetic variants from family child–mother–father trio sequencing data is important in genomics. However, state-of-the-art approaches treat variant calling from trios as three independent tasks, which limits their calling accuracy for Nanopore long-read sequencing data. For better trio variant calling, we introduce Clair3-Trio, the first variant caller tailored for family trio data from Nanopore long-reads. Clair3-Trio employs a Trio-to-Trio deep neural network model, which allows it to input the trio sequencing information and output all of the trio’s predicted variants within a single model to improve variant calling. We also present MCVLoss, a novel loss function tailor-made for variant calling in trios, leveraging the explicit encoding of the Mendelian inheritance. Clair3-Trio showed comprehensive improvement in experiments. It predicted far fewer Mendelian inheritance violation variations than current state-of-the-art methods. We also demonstrated that our Trio-to-Trio model is more accurate than competing architectures. Clair3-Trio is accessible as a free, open-source project at https://github.com/HKU-BAL/Clair3-Trio.
doi_str_mv 10.1093/bib/bbac301
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9487642</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bib/bbac301</oup_id><sourcerecordid>2691460306</sourcerecordid><originalsourceid>FETCH-LOGICAL-c370t-2559013ba971518237ce644c28f34f1db790215911bacec259b0631fba3ad4a53</originalsourceid><addsrcrecordid>eNp9kc9rFDEUx4Motq6evEtAEEFi83My8SDIorVQ9KLnkGQyu6kzyZjMdOl_b7a7luqhp_cgHz55730BeEnwe4IVO7PBnllrHMPkETglXErEseCP930jkeANOwHPSrnCmGLZkqfghImWK4LZKditBxMyQ3MO6QPchs0WTT73KY8mOg-_mZimlD0cUtyg7E0Hr00OJs7QmWEIcQNDhL0Zw3AD944Cd2He3rZoTrda2Hk_weiXbIZa5l3Kv8pz8KQ3Q_EvjnUFfn75_GP9FV1-P79Yf7pEjkk8IyqEwoRZoyQRpKVMOt9w7mjbM96TzkqFKRGKkHoA76hQFjeM9NYw03Ej2Ap8PHinxY6-cz7OdQw95TCafKOTCfrflxi2epOuteKtbDitgrdHQU6_F19mPYbi_DCY6NNSNG1UPTNm9dsVeP0fepWWHOt6mkoiWSMlk5V6d6BcTqVk398NQ7DeB6proPoYaKVf3Z__jv2bYAXeHIC0TA-a_gB-z6sx</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2717367737</pqid></control><display><type>article</type><title>Clair3-trio: high-performance Nanopore long-read variant calling in family trios with trio-to-trio deep neural networks</title><source>MEDLINE</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>Business Source Complete</source><source>Oxford Journals Open Access Collection</source><source>PubMed Central</source><creator>Su, Junhao ; Zheng, Zhenxian ; Ahmed, Syed Shakeel ; Lam, Tak-Wah ; Luo, Ruibang</creator><creatorcontrib>Su, Junhao ; Zheng, Zhenxian ; Ahmed, Syed Shakeel ; Lam, Tak-Wah ; Luo, Ruibang</creatorcontrib><description>Abstract Accurate identification of genetic variants from family child–mother–father trio sequencing data is important in genomics. 
However, state-of-the-art approaches treat variant calling from trios as three independent tasks, which limits their calling accuracy for Nanopore long-read sequencing data. For better trio variant calling, we introduce Clair3-Trio, the first variant caller tailored for family trio data from Nanopore long-reads. Clair3-Trio employs a Trio-to-Trio deep neural network model, which allows it to input the trio sequencing information and output all of the trio’s predicted variants within a single model to improve variant calling. We also present MCVLoss, a novel loss function tailor-made for variant calling in trios, leveraging the explicit encoding of the Mendelian inheritance. Clair3-Trio showed comprehensive improvement in experiments. It predicted far fewer Mendelian inheritance violation variations than current state-of-the-art methods. We also demonstrated that our Trio-to-Trio model is more accurate than competing architectures. Clair3-Trio is accessible as a free, open-source project at https://github.com/HKU-BAL/Clair3-Trio.</description><identifier>ISSN: 1467-5463</identifier><identifier>ISSN: 1477-4054</identifier><identifier>EISSN: 1477-4054</identifier><identifier>DOI: 10.1093/bib/bbac301</identifier><identifier>PMID: 35849103</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Artificial neural networks ; Genetic diversity ; Genetic variance ; Genomics - methods ; Heredity ; High-Throughput Nucleotide Sequencing - methods ; Humans ; Inheritances ; Nanopores ; Neural networks ; Neural Networks, Computer ; Problem Solving Protocol ; Sequence Analysis, DNA ; Software</subject><ispartof>Briefings in bioinformatics, 2022-09, Vol.23 (5)</ispartof><rights>The Author(s) 2022. Published by Oxford University Press. 2022</rights><rights>The Author(s) 2022. 
Published by Oxford University Press.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c370t-2559013ba971518237ce644c28f34f1db790215911bacec259b0631fba3ad4a53</citedby><cites>FETCH-LOGICAL-c370t-2559013ba971518237ce644c28f34f1db790215911bacec259b0631fba3ad4a53</cites><orcidid>0000-0002-8560-3999 ; 0000-0002-6546-2324 ; 0000-0001-9711-6533 ; 0000-0003-4676-8587</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC9487642/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC9487642/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,1603,27923,27924,53790,53792</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/35849103$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Su, Junhao</creatorcontrib><creatorcontrib>Zheng, Zhenxian</creatorcontrib><creatorcontrib>Ahmed, Syed Shakeel</creatorcontrib><creatorcontrib>Lam, Tak-Wah</creatorcontrib><creatorcontrib>Luo, Ruibang</creatorcontrib><title>Clair3-trio: high-performance Nanopore long-read variant calling in family trios with trio-to-trio deep neural networks</title><title>Briefings in bioinformatics</title><addtitle>Brief Bioinform</addtitle><description>Abstract Accurate identification of genetic variants from family child–mother–father trio sequencing data is important in genomics. However, state-of-the-art approaches treat variant calling from trios as three independent tasks, which limits their calling accuracy for Nanopore long-read sequencing data. 
For better trio variant calling, we introduce Clair3-Trio, the first variant caller tailored for family trio data from Nanopore long-reads. Clair3-Trio employs a Trio-to-Trio deep neural network model, which allows it to input the trio sequencing information and output all of the trio’s predicted variants within a single model to improve variant calling. We also present MCVLoss, a novel loss function tailor-made for variant calling in trios, leveraging the explicit encoding of the Mendelian inheritance. Clair3-Trio showed comprehensive improvement in experiments. It predicted far fewer Mendelian inheritance violation variations than current state-of-the-art methods. We also demonstrated that our Trio-to-Trio model is more accurate than competing architectures. Clair3-Trio is accessible as a free, open-source project at https://github.com/HKU-BAL/Clair3-Trio.</description><subject>Artificial neural networks</subject><subject>Genetic diversity</subject><subject>Genetic variance</subject><subject>Genomics - methods</subject><subject>Heredity</subject><subject>High-Throughput Nucleotide Sequencing - methods</subject><subject>Humans</subject><subject>Inheritances</subject><subject>Nanopores</subject><subject>Neural networks</subject><subject>Neural Networks, Computer</subject><subject>Problem Solving Protocol</subject><subject>Sequence Analysis, 
DNA</subject><subject>Software</subject><issn>1467-5463</issn><issn>1477-4054</issn><issn>1477-4054</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>TOX</sourceid><sourceid>EIF</sourceid><recordid>eNp9kc9rFDEUx4Motq6evEtAEEFi83My8SDIorVQ9KLnkGQyu6kzyZjMdOl_b7a7luqhp_cgHz55730BeEnwe4IVO7PBnllrHMPkETglXErEseCP930jkeANOwHPSrnCmGLZkqfghImWK4LZKditBxMyQ3MO6QPchs0WTT73KY8mOg-_mZimlD0cUtyg7E0Hr00OJs7QmWEIcQNDhL0Zw3AD944Cd2He3rZoTrda2Hk_weiXbIZa5l3Kv8pz8KQ3Q_EvjnUFfn75_GP9FV1-P79Yf7pEjkk8IyqEwoRZoyQRpKVMOt9w7mjbM96TzkqFKRGKkHoA76hQFjeM9NYw03Ej2Ap8PHinxY6-cz7OdQw95TCafKOTCfrflxi2epOuteKtbDitgrdHQU6_F19mPYbi_DCY6NNSNG1UPTNm9dsVeP0fepWWHOt6mkoiWSMlk5V6d6BcTqVk398NQ7DeB6proPoYaKVf3Z__jv2bYAXeHIC0TA-a_gB-z6sx</recordid><startdate>20220920</startdate><enddate>20220920</enddate><creator>Su, Junhao</creator><creator>Zheng, Zhenxian</creator><creator>Ahmed, Syed Shakeel</creator><creator>Lam, Tak-Wah</creator><creator>Luo, Ruibang</creator><general>Oxford University Press</general><general>Oxford Publishing Limited (England)</general><scope>TOX</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QO</scope><scope>7SC</scope><scope>8FD</scope><scope>FR3</scope><scope>JQ2</scope><scope>K9.</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-8560-3999</orcidid><orcidid>https://orcid.org/0000-0002-6546-2324</orcidid><orcidid>https://orcid.org/0000-0001-9711-6533</orcidid><orcidid>https://orcid.org/0000-0003-4676-8587</orcidid></search><sort><creationdate>20220920</creationdate><title>Clair3-trio: high-performance Nanopore long-read variant calling in family trios with trio-to-trio deep neural networks</title><author>Su, Junhao ; Zheng, Zhenxian ; Ahmed, Syed Shakeel ; Lam, 
Tak-Wah ; Luo, Ruibang</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c370t-2559013ba971518237ce644c28f34f1db790215911bacec259b0631fba3ad4a53</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Artificial neural networks</topic><topic>Genetic diversity</topic><topic>Genetic variance</topic><topic>Genomics - methods</topic><topic>Heredity</topic><topic>High-Throughput Nucleotide Sequencing - methods</topic><topic>Humans</topic><topic>Inheritances</topic><topic>Nanopores</topic><topic>Neural networks</topic><topic>Neural Networks, Computer</topic><topic>Problem Solving Protocol</topic><topic>Sequence Analysis, DNA</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Su, Junhao</creatorcontrib><creatorcontrib>Zheng, Zhenxian</creatorcontrib><creatorcontrib>Ahmed, Syed Shakeel</creatorcontrib><creatorcontrib>Lam, Tak-Wah</creatorcontrib><creatorcontrib>Luo, Ruibang</creatorcontrib><collection>Oxford Journals Open Access Collection</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Biotechnology Research Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering 
Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Briefings in bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Su, Junhao</au><au>Zheng, Zhenxian</au><au>Ahmed, Syed Shakeel</au><au>Lam, Tak-Wah</au><au>Luo, Ruibang</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Clair3-trio: high-performance Nanopore long-read variant calling in family trios with trio-to-trio deep neural networks</atitle><jtitle>Briefings in bioinformatics</jtitle><addtitle>Brief Bioinform</addtitle><date>2022-09-20</date><risdate>2022</risdate><volume>23</volume><issue>5</issue><issn>1467-5463</issn><issn>1477-4054</issn><eissn>1477-4054</eissn><abstract>Abstract Accurate identification of genetic variants from family child–mother–father trio sequencing data is important in genomics. However, state-of-the-art approaches treat variant calling from trios as three independent tasks, which limits their calling accuracy for Nanopore long-read sequencing data. For better trio variant calling, we introduce Clair3-Trio, the first variant caller tailored for family trio data from Nanopore long-reads. Clair3-Trio employs a Trio-to-Trio deep neural network model, which allows it to input the trio sequencing information and output all of the trio’s predicted variants within a single model to improve variant calling. We also present MCVLoss, a novel loss function tailor-made for variant calling in trios, leveraging the explicit encoding of the Mendelian inheritance. Clair3-Trio showed comprehensive improvement in experiments. It predicted far fewer Mendelian inheritance violation variations than current state-of-the-art methods. We also demonstrated that our Trio-to-Trio model is more accurate than competing architectures. 
Clair3-Trio is accessible as a free, open-source project at https://github.com/HKU-BAL/Clair3-Trio.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>35849103</pmid><doi>10.1093/bib/bbac301</doi><orcidid>https://orcid.org/0000-0002-8560-3999</orcidid><orcidid>https://orcid.org/0000-0002-6546-2324</orcidid><orcidid>https://orcid.org/0000-0001-9711-6533</orcidid><orcidid>https://orcid.org/0000-0003-4676-8587</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1467-5463
ispartof Briefings in bioinformatics, 2022-09, Vol.23 (5)
issn 1467-5463
1477-4054
1477-4054
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9487642
source MEDLINE; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Business Source Complete; Oxford Journals Open Access Collection; PubMed Central
subjects Artificial neural networks
Genetic diversity
Genetic variance
Genomics - methods
Heredity
High-Throughput Nucleotide Sequencing - methods
Humans
Inheritances
Nanopores
Neural networks
Neural Networks, Computer
Problem Solving Protocol
Sequence Analysis, DNA
Software
title Clair3-trio: high-performance Nanopore long-read variant calling in family trios with trio-to-trio deep neural networks
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T16%3A46%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Clair3-trio:%20high-performance%20Nanopore%20long-read%20variant%20calling%20in%20family%20trios%20with%20trio-to-trio%20deep%20neural%20networks&rft.jtitle=Briefings%20in%20bioinformatics&rft.au=Su,%20Junhao&rft.date=2022-09-20&rft.volume=23&rft.issue=5&rft.issn=1467-5463&rft.eissn=1477-4054&rft_id=info:doi/10.1093/bib/bbac301&rft_dat=%3Cproquest_pubme%3E2691460306%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2717367737&rft_id=info:pmid/35849103&rft_oup_id=10.1093/bib/bbac301&rfr_iscdi=true
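
The description in this record evaluates callers by how many Mendelian inheritance violations they predict. As a rough illustration of what such a violation is, the sketch below checks whether a child's genotype can be formed from one allele of each parent. It is not Clair3-Trio's code and not the MCVLoss function; the function name `is_mendelian_consistent` and the tuple-of-allele-indices genotype encoding are assumptions made for this example, and it only covers diploid autosomal sites with no de novo mutations.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """Return True if the child's genotype can be built from one maternal
    and one paternal allele.

    Genotypes are hypothetical 2-tuples of allele indices, e.g. (0, 1) for a
    heterozygous call (0 = reference allele, 1 = first alternate allele).
    """
    child_sorted = tuple(sorted(child))
    # Try every pairing of one maternal allele with one paternal allele.
    return any(
        tuple(sorted((m, f))) == child_sorted
        for m, f in product(mother, father)
    )


# A heterozygous child of two opposite homozygous parents is consistent.
consistent = is_mendelian_consistent((0, 1), (0, 0), (1, 1))

# A homozygous-alternate child cannot arise if the mother carries no
# alternate allele, so this call set counts as a violation.
violation = not is_mendelian_consistent((1, 1), (0, 0), (0, 1))
```

Counting the sites where this check fails across a trio's call set gives a simple violation tally; real evaluations also have to handle missing genotypes, multi-allelic sites and sex chromosomes, which this sketch leaves out.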