SPDI: data model for variants and applications at NCBI

Abstract Motivation Normalizing sequence variants on a reference, projecting them across congruent sequences and aggregating their diverse representations are critical to the elucidation of the genetic basis of disease and biological function. Inconsistent representation of variants among variant ca...

Detailed description

Bibliographic details
Published in: BIOINFORMATICS 2020-03, Vol.36 (6), p.1902-1907
Main authors: Holmes, J Bradley, Moyer, Eric, Phan, Lon, Maglott, Donna, Kattman, Brandi
Format: Article
Language: eng
container_end_page 1907
container_issue 6
container_start_page 1902
container_title BIOINFORMATICS
container_volume 36
creator Holmes, J Bradley
Moyer, Eric
Phan, Lon
Maglott, Donna
Kattman, Brandi
description Abstract. Motivation: Normalizing sequence variants on a reference, projecting them across congruent sequences and aggregating their diverse representations are critical to the elucidation of the genetic basis of disease and biological function. Inconsistent representation of variants among variant callers, local databases and tools results in discrepancies that complicate analysis. NCBI’s genetic variation resources, dbSNP and ClinVar, require a robust, scalable set of principles to manage asserted sequence variants. Results: The SPDI data model defines variants as a sequence of four attributes: sequence, position, deletion and insertion, and can be applied to nucleotide and protein variants. NCBI web services convert representations among HGVS, VCF and SPDI and provide two functions to aggregate variants. One, based on the NCBI Variant Overprecision Correction Algorithm, returns a unique, normalized representation termed the ‘Contextual Allele’. The SPDI data model, with its four operations, defines exactly the reference subsequence affected by the variant, even in repeat regions, such as homopolymer and other sequence repeats. The second function projects variants across congruent sequences and depends on an alignment dataset of non-assembly NCBI RefSeq sequences (prefixed NM, NR and NG), as well as inter- and intra-assembly-associated genomic sequences (NCs, NTs and NWs), supporting robust projection of variants across congruent sequences and assembly versions. The variant is projected to all congruent Contextual Alleles. One of these Contextual Alleles, typically the allele based on the latest assembly version, represents the entire set, is designated the unique ‘Canonical Allele’ and is used directly to aggregate variants across congruent sequences. Availability and implementation: The SPDI services are available for open access at: https://api.ncbi.nlm.nih.gov/variation/v0.
Supplementary information: Supplementary data are available at Bioinformatics online.
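
The four-attribute model in the abstract can be illustrated with a minimal Python sketch. Several details here are assumptions rather than statements from this record: the colon-delimited string form, the 0-based position, the toy identifier `ref1`, and every function name. In particular, `expand_over_repeat` is not the NCBI Variant Overprecision Correction Algorithm; it is only a simple illustration of how a single-base deletion inside a homopolymer might be widened to cover the whole repeat.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spdi:
    """One variant as sequence, position, deletion, insertion."""
    sequence: str   # identifier of the reference sequence
    position: int   # ASSUMPTION: 0-based offset where the deletion starts
    deletion: str   # reference bases removed (may be empty)
    insertion: str  # bases put in their place (may be empty)

    def __str__(self) -> str:
        return f"{self.sequence}:{self.position}:{self.deletion}:{self.insertion}"


def parse_spdi(text: str) -> Spdi:
    """Parse the assumed form 'sequence:position:deletion:insertion'."""
    sequence, position, deletion, insertion = text.split(":")
    return Spdi(sequence, int(position), deletion, insertion)


def apply_variant(reference: str, v: Spdi) -> str:
    """Return the reference with the deleted bases replaced by the insertion."""
    end = v.position + len(v.deletion)
    if reference[v.position:end] != v.deletion:
        raise ValueError("deletion does not match the reference")
    return reference[:v.position] + v.insertion + reference[end:]


def expand_over_repeat(reference: str, v: Spdi) -> Spdi:
    """Hypothetical helper: widen a pure insertion or deletion so it spans
    every position it could be shifted to without changing the result."""
    deletion, insertion, start = v.deletion, v.insertion, v.position
    # Trim bases shared by both alleles (suffix first, then prefix).
    while deletion and insertion and deletion[-1] == insertion[-1]:
        deletion, insertion = deletion[:-1], insertion[:-1]
    while deletion and insertion and deletion[0] == insertion[0]:
        deletion, insertion, start = deletion[1:], insertion[1:], start + 1
    end = start + len(deletion)
    unit = deletion or insertion
    if unit and not (deletion and insertion):
        # Walk left while the preceding reference base continues the repeat.
        left = 0
        while (start - left > 0
               and reference[start - left - 1] == unit[-1 - left % len(unit)]):
            left += 1
        # Walk right while the following reference base continues the repeat.
        right = 0
        while (end + right < len(reference)
               and reference[end + right] == unit[right % len(unit)]):
            right += 1
        new_start, new_end = start - left, end + right
        insertion = reference[new_start:start] + insertion + reference[end:new_end]
        deletion = reference[new_start:new_end]
        start = new_start
    return Spdi(v.sequence, start, deletion, insertion)


reference = "GCAAAT"                    # toy reference with an AAA homopolymer
variant = parse_spdi("ref1:3:A:")       # delete the middle A
expanded = expand_over_repeat(reference, variant)
print(expanded)                         # ref1:2:AAA:AA
```

In this toy case, deleting any one of the three A bases yields the same expanded form, `ref1:2:AAA:AA`, which is the kind of shift-independent representation the abstract attributes to the Contextual Allele.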
doi_str_mv 10.1093/bioinformatics/btz856
format Article
fullrecord <record><control><sourceid>proquest_TOX</sourceid><recordid>TN_cdi_crossref_primary_10_1093_bioinformatics_btz856</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bioinformatics/btz856</oup_id><sourcerecordid>2315523976</sourcerecordid><originalsourceid>FETCH-LOGICAL-c452t-b8c84ba6f798aa4051ee67753579c356550dd27d5cd9e92cd22ad9addc98a97e3</originalsourceid><addsrcrecordid>eNqNkctu1jAQhS0Eohd4BFCWSCjUie8skCDl8ktVQQLW1sR2wCix09hpBU9fQ9pfdNeVR57vnBmdQehZg181WJGT3kcfhrhMkL1JJ33-Ixl_gA4bynHdYqYelppwUVOJyQE6SukXxqyhlD5GB6QRRFLcHCL-9cvp7nVlIUM1RevGqnhWl7B4CDlVEGwF8zx6U8bEUD5ydd692z1BjwYYk3t68x6j7x_ef-s-1WefP-66t2e1oazNdS-NpD3wQSgJQMt857gQjDChDGGcMWxtKywzVjnVGtu2YBVYawqvhCPH6M3mO6_95KxxIS8w6nnxEyy_dQSv73aC_6l_xEstWEs4lcXgxY3BEi9Wl7KefDJuHCG4uCbdkoYVVAleULahZokpLW7Yj2mw_pu5vpu53jIvuuf_77hX3YZcgJcbcOX6OCTjXTBuj-FyFiK54rJUhBZa3p_ufP53mC6uIRcp3qRxne-5_DXQebUw</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2315523976</pqid></control><display><type>article</type><title>SPDI: data model for variants and applications at NCBI</title><source>Access via Oxford University Press (Open Access Collection)</source><creator>Holmes, J Bradley ; Moyer, Eric ; Phan, Lon ; Maglott, Donna ; Kattman, Brandi</creator><contributor>Wren, Jonathan</contributor><creatorcontrib>Holmes, J Bradley ; Moyer, Eric ; Phan, Lon ; Maglott, Donna ; Kattman, Brandi ; Wren, Jonathan</creatorcontrib><description>Abstract Motivation Normalizing sequence variants on a reference, projecting them across congruent sequences and aggregating their diverse representations are critical to the elucidation of the genetic basis of disease and biological function. Inconsistent representation of variants among variant callers, local databases and tools result in discrepancies that complicate analysis. 
NCBI’s genetic variation resources, dbSNP and ClinVar, require a robust, scalable set of principles to manage asserted sequence variants. Results The SPDI data model defines variants as a sequence of four attributes: sequence, position, deletion and insertion, and can be applied to nucleotide and protein variants. NCBI web services convert representations among HGVS, VCF and SPDI and provide two functions to aggregate variants. One, based on the NCBI Variant Overprecision Correction Algorithm, returns a unique, normalized representation termed the ‘Contextual Allele’. The SPDI data model, with its four operations, defines exactly the reference subsequence affected by the variant, even in repeat regions, such as homopolymer and other sequence repeats. The second function projects variants across congruent sequences and depends on an alignment dataset of non-assembly NCBI RefSeq sequences (prefixed NM, NR and NG), as well as inter- and intra-assembly-associated genomic sequences (NCs, NTs and NWs), supporting robust projection of variants across congruent sequences and assembly versions. The variant is projected to all congruent Contextual Alleles. One of these Contextual Alleles, typically the allele based on the latest assembly version, represents the entire set, is designated the unique ‘Canonical Allele’ and is used directly to aggregate variants across congruent sequences. Availability and implementation The SPDI services are available for open access at: https://api.ncbi.nlm.nih.gov/variation/v0. 
Supplementary information Supplementary data are available at Bioinformatics online.</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1460-2059</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btz856</identifier><identifier>PMID: 31738401</identifier><language>eng</language><publisher>OXFORD: Oxford University Press</publisher><subject><![CDATA[Biochemical Research Methods ; Biochemistry & Molecular Biology ; Biotechnology & Applied Microbiology ; Computer Science ; Computer Science, Interdisciplinary Applications ; Life Sciences & Biomedicine ; Mathematical & Computational Biology ; Mathematics ; Original Papers ; Physical Sciences ; Science & Technology ; Statistics & Probability ; Technology]]></subject><ispartof>BIOINFORMATICS, 2020-03, Vol.36 (6), p.1902-1907</ispartof><rights>Published by Oxford University Press 2019. This work is written by US Government employees and is in the public domain in the US. 2019</rights><rights>Published by Oxford University Press 2019. 
This work is written by US Government employees and is in the public domain in the US.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>true</woscitedreferencessubscribed><woscitedreferencescount>23</woscitedreferencescount><woscitedreferencesoriginalsourcerecordid>wos000538696800034</woscitedreferencesoriginalsourcerecordid><citedby>FETCH-LOGICAL-c452t-b8c84ba6f798aa4051ee67753579c356550dd27d5cd9e92cd22ad9addc98a97e3</citedby><cites>FETCH-LOGICAL-c452t-b8c84ba6f798aa4051ee67753579c356550dd27d5cd9e92cd22ad9addc98a97e3</cites><orcidid>0000-0001-8354-5062</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7523648/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7523648/$$EHTML$$P50$$Gpubmedcentral$$H</linktohtml><link.rule.ids>230,315,728,781,785,886,1605,27929,27930,28253,53796,53798</link.rule.ids><linktorsrc>$$Uhttps://dx.doi.org/10.1093/bioinformatics/btz856$$EView_record_in_Oxford_University_Press$$FView_record_in_$$GOxford_University_Press</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/31738401$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Wren, Jonathan</contributor><creatorcontrib>Holmes, J Bradley</creatorcontrib><creatorcontrib>Moyer, Eric</creatorcontrib><creatorcontrib>Phan, Lon</creatorcontrib><creatorcontrib>Maglott, Donna</creatorcontrib><creatorcontrib>Kattman, Brandi</creatorcontrib><title>SPDI: data model for variants and applications at NCBI</title><title>BIOINFORMATICS</title><addtitle>BIOINFORMATICS</addtitle><addtitle>Bioinformatics</addtitle><description>Abstract Motivation Normalizing sequence variants on a reference, projecting them across congruent sequences and aggregating their diverse representations are critical 
to the elucidation of the genetic basis of disease and biological function. Inconsistent representation of variants among variant callers, local databases and tools result in discrepancies that complicate analysis. NCBI’s genetic variation resources, dbSNP and ClinVar, require a robust, scalable set of principles to manage asserted sequence variants. Results The SPDI data model defines variants as a sequence of four attributes: sequence, position, deletion and insertion, and can be applied to nucleotide and protein variants. NCBI web services convert representations among HGVS, VCF and SPDI and provide two functions to aggregate variants. One, based on the NCBI Variant Overprecision Correction Algorithm, returns a unique, normalized representation termed the ‘Contextual Allele’. The SPDI data model, with its four operations, defines exactly the reference subsequence affected by the variant, even in repeat regions, such as homopolymer and other sequence repeats. The second function projects variants across congruent sequences and depends on an alignment dataset of non-assembly NCBI RefSeq sequences (prefixed NM, NR and NG), as well as inter- and intra-assembly-associated genomic sequences (NCs, NTs and NWs), supporting robust projection of variants across congruent sequences and assembly versions. The variant is projected to all congruent Contextual Alleles. One of these Contextual Alleles, typically the allele based on the latest assembly version, represents the entire set, is designated the unique ‘Canonical Allele’ and is used directly to aggregate variants across congruent sequences. Availability and implementation The SPDI services are available for open access at: https://api.ncbi.nlm.nih.gov/variation/v0. 
Supplementary information Supplementary data are available at Bioinformatics online.</description><subject>Biochemical Research Methods</subject><subject>Biochemistry &amp; Molecular Biology</subject><subject>Biotechnology &amp; Applied Microbiology</subject><subject>Computer Science</subject><subject>Computer Science, Interdisciplinary Applications</subject><subject>Life Sciences &amp; Biomedicine</subject><subject>Mathematical &amp; Computational Biology</subject><subject>Mathematics</subject><subject>Original Papers</subject><subject>Physical Sciences</subject><subject>Science &amp; Technology</subject><subject>Statistics &amp; Probability</subject><subject>Technology</subject><issn>1367-4803</issn><issn>1460-2059</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>AOWDO</sourceid><recordid>eNqNkctu1jAQhS0Eohd4BFCWSCjUie8skCDl8ktVQQLW1sR2wCix09hpBU9fQ9pfdNeVR57vnBmdQehZg181WJGT3kcfhrhMkL1JJ33-Ixl_gA4bynHdYqYelppwUVOJyQE6SukXxqyhlD5GB6QRRFLcHCL-9cvp7nVlIUM1RevGqnhWl7B4CDlVEGwF8zx6U8bEUD5ydd692z1BjwYYk3t68x6j7x_ef-s-1WefP-66t2e1oazNdS-NpD3wQSgJQMt857gQjDChDGGcMWxtKywzVjnVGtu2YBVYawqvhCPH6M3mO6_95KxxIS8w6nnxEyy_dQSv73aC_6l_xEstWEs4lcXgxY3BEi9Wl7KefDJuHCG4uCbdkoYVVAleULahZokpLW7Yj2mw_pu5vpu53jIvuuf_77hX3YZcgJcbcOX6OCTjXTBuj-FyFiK54rJUhBZa3p_ufP53mC6uIRcp3qRxne-5_DXQebUw</recordid><startdate>20200301</startdate><enddate>20200301</enddate><creator>Holmes, J Bradley</creator><creator>Moyer, Eric</creator><creator>Phan, Lon</creator><creator>Maglott, Donna</creator><creator>Kattman, Brandi</creator><general>Oxford University Press</general><general>Oxford Univ Press</general><scope>AOWDO</scope><scope>BLEPL</scope><scope>DTL</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0001-8354-5062</orcidid></search><sort><creationdate>20200301</creationdate><title>SPDI: data model for 
variants and applications at NCBI</title><author>Holmes, J Bradley ; Moyer, Eric ; Phan, Lon ; Maglott, Donna ; Kattman, Brandi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c452t-b8c84ba6f798aa4051ee67753579c356550dd27d5cd9e92cd22ad9addc98a97e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Biochemical Research Methods</topic><topic>Biochemistry &amp; Molecular Biology</topic><topic>Biotechnology &amp; Applied Microbiology</topic><topic>Computer Science</topic><topic>Computer Science, Interdisciplinary Applications</topic><topic>Life Sciences &amp; Biomedicine</topic><topic>Mathematical &amp; Computational Biology</topic><topic>Mathematics</topic><topic>Original Papers</topic><topic>Physical Sciences</topic><topic>Science &amp; Technology</topic><topic>Statistics &amp; Probability</topic><topic>Technology</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Holmes, J Bradley</creatorcontrib><creatorcontrib>Moyer, Eric</creatorcontrib><creatorcontrib>Phan, Lon</creatorcontrib><creatorcontrib>Maglott, Donna</creatorcontrib><creatorcontrib>Kattman, Brandi</creatorcontrib><collection>Web of Science - Science Citation Index Expanded - 2020</collection><collection>Web of Science Core Collection</collection><collection>Science Citation Index Expanded</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>BIOINFORMATICS</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Holmes, J Bradley</au><au>Moyer, Eric</au><au>Phan, Lon</au><au>Maglott, Donna</au><au>Kattman, Brandi</au><au>Wren, Jonathan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>SPDI: data model for 
variants and applications at NCBI</atitle><jtitle>BIOINFORMATICS</jtitle><stitle>BIOINFORMATICS</stitle><addtitle>Bioinformatics</addtitle><date>2020-03-01</date><risdate>2020</risdate><volume>36</volume><issue>6</issue><spage>1902</spage><epage>1907</epage><pages>1902-1907</pages><issn>1367-4803</issn><eissn>1460-2059</eissn><eissn>1367-4811</eissn><abstract>Abstract Motivation Normalizing sequence variants on a reference, projecting them across congruent sequences and aggregating their diverse representations are critical to the elucidation of the genetic basis of disease and biological function. Inconsistent representation of variants among variant callers, local databases and tools result in discrepancies that complicate analysis. NCBI’s genetic variation resources, dbSNP and ClinVar, require a robust, scalable set of principles to manage asserted sequence variants. Results The SPDI data model defines variants as a sequence of four attributes: sequence, position, deletion and insertion, and can be applied to nucleotide and protein variants. NCBI web services convert representations among HGVS, VCF and SPDI and provide two functions to aggregate variants. One, based on the NCBI Variant Overprecision Correction Algorithm, returns a unique, normalized representation termed the ‘Contextual Allele’. The SPDI data model, with its four operations, defines exactly the reference subsequence affected by the variant, even in repeat regions, such as homopolymer and other sequence repeats. The second function projects variants across congruent sequences and depends on an alignment dataset of non-assembly NCBI RefSeq sequences (prefixed NM, NR and NG), as well as inter- and intra-assembly-associated genomic sequences (NCs, NTs and NWs), supporting robust projection of variants across congruent sequences and assembly versions. The variant is projected to all congruent Contextual Alleles. 
One of these Contextual Alleles, typically the allele based on the latest assembly version, represents the entire set, is designated the unique ‘Canonical Allele’ and is used directly to aggregate variants across congruent sequences. Availability and implementation The SPDI services are available for open access at: https://api.ncbi.nlm.nih.gov/variation/v0. Supplementary information Supplementary data are available at Bioinformatics online.</abstract><cop>OXFORD</cop><pub>Oxford University Press</pub><pmid>31738401</pmid><doi>10.1093/bioinformatics/btz856</doi><tpages>6</tpages><orcidid>https://orcid.org/0000-0001-8354-5062</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1367-4803
ispartof BIOINFORMATICS, 2020-03, Vol.36 (6), p.1902-1907
issn 1367-4803
1460-2059
1367-4811
language eng
recordid cdi_crossref_primary_10_1093_bioinformatics_btz856
source Access via Oxford University Press (Open Access Collection)
subjects Biochemical Research Methods
Biochemistry & Molecular Biology
Biotechnology & Applied Microbiology
Computer Science
Computer Science, Interdisciplinary Applications
Life Sciences & Biomedicine
Mathematical & Computational Biology
Mathematics
Original Papers
Physical Sciences
Science & Technology
Statistics & Probability
Technology
title SPDI: data model for variants and applications at NCBI
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-15T04%3A40%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_TOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SPDI:%20data%20model%20for%20variants%20and%20applications%20at%20NCBI&rft.jtitle=BIOINFORMATICS&rft.au=Holmes,%20J%20Bradley&rft.date=2020-03-01&rft.volume=36&rft.issue=6&rft.spage=1902&rft.epage=1907&rft.pages=1902-1907&rft.issn=1367-4803&rft.eissn=1460-2059&rft_id=info:doi/10.1093/bioinformatics/btz856&rft_dat=%3Cproquest_TOX%3E2315523976%3C/proquest_TOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2315523976&rft_id=info:pmid/31738401&rft_oup_id=10.1093/bioinformatics/btz856&rfr_iscdi=true