SPDI: data model for variants and applications at NCBI
Published in: | BIOINFORMATICS 2020-03, Vol.36 (6), p.1902-1907 |
---|---|
Main authors: | Holmes, J Bradley ; Moyer, Eric ; Phan, Lon ; Maglott, Donna ; Kattman, Brandi |
Format: | Article |
Language: | eng |
Subjects: | |
|
container_end_page | 1907 |
---|---|
container_issue | 6 |
container_start_page | 1902 |
container_title | BIOINFORMATICS |
container_volume | 36 |
creator | Holmes, J Bradley ; Moyer, Eric ; Phan, Lon ; Maglott, Donna ; Kattman, Brandi |
description | Abstract
Motivation
Normalizing sequence variants on a reference, projecting them across congruent sequences and aggregating their diverse representations are critical to the elucidation of the genetic basis of disease and biological function. Inconsistent representation of variants among variant callers, local databases and tools results in discrepancies that complicate analysis. NCBI’s genetic variation resources, dbSNP and ClinVar, require a robust, scalable set of principles to manage asserted sequence variants.
Results
The SPDI data model defines variants as a sequence of four attributes: sequence, position, deletion and insertion, and can be applied to nucleotide and protein variants. NCBI web services convert representations among HGVS, VCF and SPDI and provide two functions to aggregate variants. One, based on the NCBI Variant Overprecision Correction Algorithm, returns a unique, normalized representation termed the ‘Contextual Allele’. The SPDI data model, with its four operations, defines exactly the reference subsequence affected by the variant, even in repeat regions, such as homopolymer and other sequence repeats. The second function projects variants across congruent sequences and depends on an alignment dataset of non-assembly NCBI RefSeq sequences (prefixed NM, NR and NG), as well as inter- and intra-assembly-associated genomic sequences (NCs, NTs and NWs), supporting robust projection of variants across congruent sequences and assembly versions. The variant is projected to all congruent Contextual Alleles. One of these Contextual Alleles, typically the allele based on the latest assembly version, represents the entire set, is designated the unique ‘Canonical Allele’ and is used directly to aggregate variants across congruent sequences.
Availability and implementation
The SPDI services are available for open access at: https://api.ncbi.nlm.nih.gov/variation/v0.
Supplementary information
Supplementary data are available at Bioinformatics online. |
doi_str_mv | 10.1093/bioinformatics/btz856 |
format | Article |
fullrecord | <record><control><sourceid>proquest_TOX</sourceid><recordid>TN_cdi_crossref_primary_10_1093_bioinformatics_btz856</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bioinformatics/btz856</oup_id><sourcerecordid>2315523976</sourcerecordid><originalsourceid>FETCH-LOGICAL-c452t-b8c84ba6f798aa4051ee67753579c356550dd27d5cd9e92cd22ad9addc98a97e3</originalsourceid><addsrcrecordid>eNqNkctu1jAQhS0Eohd4BFCWSCjUie8skCDl8ktVQQLW1sR2wCix09hpBU9fQ9pfdNeVR57vnBmdQehZg181WJGT3kcfhrhMkL1JJ33-Ixl_gA4bynHdYqYelppwUVOJyQE6SukXxqyhlD5GB6QRRFLcHCL-9cvp7nVlIUM1RevGqnhWl7B4CDlVEGwF8zx6U8bEUD5ydd692z1BjwYYk3t68x6j7x_ef-s-1WefP-66t2e1oazNdS-NpD3wQSgJQMt857gQjDChDGGcMWxtKywzVjnVGtu2YBVYawqvhCPH6M3mO6_95KxxIS8w6nnxEyy_dQSv73aC_6l_xEstWEs4lcXgxY3BEi9Wl7KefDJuHCG4uCbdkoYVVAleULahZokpLW7Yj2mw_pu5vpu53jIvuuf_77hX3YZcgJcbcOX6OCTjXTBuj-FyFiK54rJUhBZa3p_ufP53mC6uIRcp3qRxne-5_DXQebUw</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2315523976</pqid></control><display><type>article</type><title>SPDI: data model for variants and applications at NCBI</title><source>Access via Oxford University Press (Open Access Collection)</source><creator>Holmes, J Bradley ; Moyer, Eric ; Phan, Lon ; Maglott, Donna ; Kattman, Brandi</creator><contributor>Wren, Jonathan</contributor><creatorcontrib>Holmes, J Bradley ; Moyer, Eric ; Phan, Lon ; Maglott, Donna ; Kattman, Brandi ; Wren, Jonathan</creatorcontrib><description>Abstract
Motivation
Normalizing sequence variants on a reference, projecting them across congruent sequences and aggregating their diverse representations are critical to the elucidation of the genetic basis of disease and biological function. Inconsistent representation of variants among variant callers, local databases and tools results in discrepancies that complicate analysis. NCBI’s genetic variation resources, dbSNP and ClinVar, require a robust, scalable set of principles to manage asserted sequence variants.
Results
The SPDI data model defines variants as a sequence of four attributes: sequence, position, deletion and insertion, and can be applied to nucleotide and protein variants. NCBI web services convert representations among HGVS, VCF and SPDI and provide two functions to aggregate variants. One, based on the NCBI Variant Overprecision Correction Algorithm, returns a unique, normalized representation termed the ‘Contextual Allele’. The SPDI data model, with its four operations, defines exactly the reference subsequence affected by the variant, even in repeat regions, such as homopolymer and other sequence repeats. The second function projects variants across congruent sequences and depends on an alignment dataset of non-assembly NCBI RefSeq sequences (prefixed NM, NR and NG), as well as inter- and intra-assembly-associated genomic sequences (NCs, NTs and NWs), supporting robust projection of variants across congruent sequences and assembly versions. The variant is projected to all congruent Contextual Alleles. One of these Contextual Alleles, typically the allele based on the latest assembly version, represents the entire set, is designated the unique ‘Canonical Allele’ and is used directly to aggregate variants across congruent sequences.
Availability and implementation
The SPDI services are available for open access at: https://api.ncbi.nlm.nih.gov/variation/v0.
Supplementary information
Supplementary data are available at Bioinformatics online.</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1460-2059</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btz856</identifier><identifier>PMID: 31738401</identifier><language>eng</language><publisher>OXFORD: Oxford University Press</publisher><subject><![CDATA[Biochemical Research Methods ; Biochemistry & Molecular Biology ; Biotechnology & Applied Microbiology ; Computer Science ; Computer Science, Interdisciplinary Applications ; Life Sciences & Biomedicine ; Mathematical & Computational Biology ; Mathematics ; Original Papers ; Physical Sciences ; Science & Technology ; Statistics & Probability ; Technology]]></subject><ispartof>BIOINFORMATICS, 2020-03, Vol.36 (6), p.1902-1907</ispartof><rights>Published by Oxford University Press 2019. This work is written by US Government employees and is in the public domain in the US. 2019</rights><rights>Published by Oxford University Press 2019. 
This work is written by US Government employees and is in the public domain in the US.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>true</woscitedreferencessubscribed><woscitedreferencescount>23</woscitedreferencescount><woscitedreferencesoriginalsourcerecordid>wos000538696800034</woscitedreferencesoriginalsourcerecordid><citedby>FETCH-LOGICAL-c452t-b8c84ba6f798aa4051ee67753579c356550dd27d5cd9e92cd22ad9addc98a97e3</citedby><cites>FETCH-LOGICAL-c452t-b8c84ba6f798aa4051ee67753579c356550dd27d5cd9e92cd22ad9addc98a97e3</cites><orcidid>0000-0001-8354-5062</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7523648/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7523648/$$EHTML$$P50$$Gpubmedcentral$$H</linktohtml><link.rule.ids>230,315,728,781,785,886,1605,27929,27930,28253,53796,53798</link.rule.ids><linktorsrc>$$Uhttps://dx.doi.org/10.1093/bioinformatics/btz856$$EView_record_in_Oxford_University_Press$$FView_record_in_$$GOxford_University_Press</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/31738401$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Wren, Jonathan</contributor><creatorcontrib>Holmes, J Bradley</creatorcontrib><creatorcontrib>Moyer, Eric</creatorcontrib><creatorcontrib>Phan, Lon</creatorcontrib><creatorcontrib>Maglott, Donna</creatorcontrib><creatorcontrib>Kattman, Brandi</creatorcontrib><title>SPDI: data model for variants and applications at NCBI</title><title>BIOINFORMATICS</title><addtitle>BIOINFORMATICS</addtitle><addtitle>Bioinformatics</addtitle><description>Abstract
Motivation
Normalizing sequence variants on a reference, projecting them across congruent sequences and aggregating their diverse representations are critical to the elucidation of the genetic basis of disease and biological function. Inconsistent representation of variants among variant callers, local databases and tools results in discrepancies that complicate analysis. NCBI’s genetic variation resources, dbSNP and ClinVar, require a robust, scalable set of principles to manage asserted sequence variants.
Results
The SPDI data model defines variants as a sequence of four attributes: sequence, position, deletion and insertion, and can be applied to nucleotide and protein variants. NCBI web services convert representations among HGVS, VCF and SPDI and provide two functions to aggregate variants. One, based on the NCBI Variant Overprecision Correction Algorithm, returns a unique, normalized representation termed the ‘Contextual Allele’. The SPDI data model, with its four operations, defines exactly the reference subsequence affected by the variant, even in repeat regions, such as homopolymer and other sequence repeats. The second function projects variants across congruent sequences and depends on an alignment dataset of non-assembly NCBI RefSeq sequences (prefixed NM, NR and NG), as well as inter- and intra-assembly-associated genomic sequences (NCs, NTs and NWs), supporting robust projection of variants across congruent sequences and assembly versions. The variant is projected to all congruent Contextual Alleles. One of these Contextual Alleles, typically the allele based on the latest assembly version, represents the entire set, is designated the unique ‘Canonical Allele’ and is used directly to aggregate variants across congruent sequences.
Availability and implementation
The SPDI services are available for open access at: https://api.ncbi.nlm.nih.gov/variation/v0.
Supplementary information
Supplementary data are available at Bioinformatics online.</description><subject>Biochemical Research Methods</subject><subject>Biochemistry & Molecular Biology</subject><subject>Biotechnology & Applied Microbiology</subject><subject>Computer Science</subject><subject>Computer Science, Interdisciplinary Applications</subject><subject>Life Sciences & Biomedicine</subject><subject>Mathematical & Computational Biology</subject><subject>Mathematics</subject><subject>Original Papers</subject><subject>Physical Sciences</subject><subject>Science & Technology</subject><subject>Statistics & Probability</subject><subject>Technology</subject><issn>1367-4803</issn><issn>1460-2059</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>AOWDO</sourceid><recordid>eNqNkctu1jAQhS0Eohd4BFCWSCjUie8skCDl8ktVQQLW1sR2wCix09hpBU9fQ9pfdNeVR57vnBmdQehZg181WJGT3kcfhrhMkL1JJ33-Ixl_gA4bynHdYqYelppwUVOJyQE6SukXxqyhlD5GB6QRRFLcHCL-9cvp7nVlIUM1RevGqnhWl7B4CDlVEGwF8zx6U8bEUD5ydd692z1BjwYYk3t68x6j7x_ef-s-1WefP-66t2e1oazNdS-NpD3wQSgJQMt857gQjDChDGGcMWxtKywzVjnVGtu2YBVYawqvhCPH6M3mO6_95KxxIS8w6nnxEyy_dQSv73aC_6l_xEstWEs4lcXgxY3BEi9Wl7KefDJuHCG4uCbdkoYVVAleULahZokpLW7Yj2mw_pu5vpu53jIvuuf_77hX3YZcgJcbcOX6OCTjXTBuj-FyFiK54rJUhBZa3p_ufP53mC6uIRcp3qRxne-5_DXQebUw</recordid><startdate>20200301</startdate><enddate>20200301</enddate><creator>Holmes, J Bradley</creator><creator>Moyer, Eric</creator><creator>Phan, Lon</creator><creator>Maglott, Donna</creator><creator>Kattman, Brandi</creator><general>Oxford University Press</general><general>Oxford Univ Press</general><scope>AOWDO</scope><scope>BLEPL</scope><scope>DTL</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0001-8354-5062</orcidid></search><sort><creationdate>20200301</creationdate><title>SPDI: data model for variants and applications at NCBI</title><author>Holmes, 
J Bradley ; Moyer, Eric ; Phan, Lon ; Maglott, Donna ; Kattman, Brandi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c452t-b8c84ba6f798aa4051ee67753579c356550dd27d5cd9e92cd22ad9addc98a97e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Biochemical Research Methods</topic><topic>Biochemistry & Molecular Biology</topic><topic>Biotechnology & Applied Microbiology</topic><topic>Computer Science</topic><topic>Computer Science, Interdisciplinary Applications</topic><topic>Life Sciences & Biomedicine</topic><topic>Mathematical & Computational Biology</topic><topic>Mathematics</topic><topic>Original Papers</topic><topic>Physical Sciences</topic><topic>Science & Technology</topic><topic>Statistics & Probability</topic><topic>Technology</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Holmes, J Bradley</creatorcontrib><creatorcontrib>Moyer, Eric</creatorcontrib><creatorcontrib>Phan, Lon</creatorcontrib><creatorcontrib>Maglott, Donna</creatorcontrib><creatorcontrib>Kattman, Brandi</creatorcontrib><collection>Web of Science - Science Citation Index Expanded - 2020</collection><collection>Web of Science Core Collection</collection><collection>Science Citation Index Expanded</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>BIOINFORMATICS</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Holmes, J Bradley</au><au>Moyer, Eric</au><au>Phan, Lon</au><au>Maglott, Donna</au><au>Kattman, Brandi</au><au>Wren, Jonathan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>SPDI: data model for variants and applications at 
NCBI</atitle><jtitle>BIOINFORMATICS</jtitle><stitle>BIOINFORMATICS</stitle><addtitle>Bioinformatics</addtitle><date>2020-03-01</date><risdate>2020</risdate><volume>36</volume><issue>6</issue><spage>1902</spage><epage>1907</epage><pages>1902-1907</pages><issn>1367-4803</issn><eissn>1460-2059</eissn><eissn>1367-4811</eissn><abstract>Abstract
Motivation
Normalizing sequence variants on a reference, projecting them across congruent sequences and aggregating their diverse representations are critical to the elucidation of the genetic basis of disease and biological function. Inconsistent representation of variants among variant callers, local databases and tools results in discrepancies that complicate analysis. NCBI’s genetic variation resources, dbSNP and ClinVar, require a robust, scalable set of principles to manage asserted sequence variants.
Results
The SPDI data model defines variants as a sequence of four attributes: sequence, position, deletion and insertion, and can be applied to nucleotide and protein variants. NCBI web services convert representations among HGVS, VCF and SPDI and provide two functions to aggregate variants. One, based on the NCBI Variant Overprecision Correction Algorithm, returns a unique, normalized representation termed the ‘Contextual Allele’. The SPDI data model, with its four operations, defines exactly the reference subsequence affected by the variant, even in repeat regions, such as homopolymer and other sequence repeats. The second function projects variants across congruent sequences and depends on an alignment dataset of non-assembly NCBI RefSeq sequences (prefixed NM, NR and NG), as well as inter- and intra-assembly-associated genomic sequences (NCs, NTs and NWs), supporting robust projection of variants across congruent sequences and assembly versions. The variant is projected to all congruent Contextual Alleles. One of these Contextual Alleles, typically the allele based on the latest assembly version, represents the entire set, is designated the unique ‘Canonical Allele’ and is used directly to aggregate variants across congruent sequences.
Availability and implementation
The SPDI services are available for open access at: https://api.ncbi.nlm.nih.gov/variation/v0.
Supplementary information
Supplementary data are available at Bioinformatics online.</abstract><cop>OXFORD</cop><pub>Oxford University Press</pub><pmid>31738401</pmid><doi>10.1093/bioinformatics/btz856</doi><tpages>6</tpages><orcidid>https://orcid.org/0000-0001-8354-5062</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1367-4803 |
ispartof | BIOINFORMATICS, 2020-03, Vol.36 (6), p.1902-1907 |
issn | 1367-4803 ; 1460-2059 ; 1367-4811 |
language | eng |
recordid | cdi_crossref_primary_10_1093_bioinformatics_btz856 |
source | Access via Oxford University Press (Open Access Collection) |
subjects | Biochemical Research Methods ; Biochemistry & Molecular Biology ; Biotechnology & Applied Microbiology ; Computer Science ; Computer Science, Interdisciplinary Applications ; Life Sciences & Biomedicine ; Mathematical & Computational Biology ; Mathematics ; Original Papers ; Physical Sciences ; Science & Technology ; Statistics & Probability ; Technology |
title | SPDI: data model for variants and applications at NCBI |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-15T04%3A40%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_TOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SPDI:%20data%20model%20for%20variants%20and%20applications%20at%20NCBI&rft.jtitle=BIOINFORMATICS&rft.au=Holmes,%20J%20Bradley&rft.date=2020-03-01&rft.volume=36&rft.issue=6&rft.spage=1902&rft.epage=1907&rft.pages=1902-1907&rft.issn=1367-4803&rft.eissn=1460-2059&rft_id=info:doi/10.1093/bioinformatics/btz856&rft_dat=%3Cproquest_TOX%3E2315523976%3C/proquest_TOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2315523976&rft_id=info:pmid/31738401&rft_oup_id=10.1093/bioinformatics/btz856&rfr_iscdi=true |
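
The four SPDI attributes named in the abstract (sequence, position, deletion, insertion) can be illustrated with a minimal Python sketch. It is not the NCBI implementation: the names `Spdi`, `parse_spdi`, `apply_spdi` and `expand_over_repeats` are hypothetical, the position is assumed to be a 0-based offset, and the deletion is assumed to be spelled out as bases. The last function is only a simplified stand-in for the idea behind the overprecision correction: a pure insertion or deletion inside a repeat is widened until it covers every equivalent placement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spdi:
    sequence: str   # reference sequence identifier
    position: int   # assumed 0-based offset where the deletion starts
    deletion: str   # reference bases removed (assumed explicit, not a length)
    insertion: str  # bases written in their place


def parse_spdi(text: str) -> Spdi:
    """Split 'sequence:position:deletion:insertion' into its four fields."""
    seq, pos, deleted, inserted = text.split(":")
    return Spdi(seq, int(pos), deleted, inserted)


def apply_spdi(reference: str, v: Spdi) -> str:
    """Return the variant sequence obtained by applying v to reference."""
    end = v.position + len(v.deletion)
    if reference[v.position:end] != v.deletion:
        raise ValueError("deletion does not match the reference")
    return reference[:v.position] + v.insertion + reference[end:]


def expand_over_repeats(reference: str, v: Spdi) -> Spdi:
    """Simplified sketch: widen a pure indel over the repeat it sits in."""
    pos, deleted, inserted = v.position, v.deletion, v.insertion
    # Trim bases shared by both alleles (suffix first, then prefix).
    while deleted and inserted and deleted[-1] == inserted[-1]:
        deleted, inserted = deleted[:-1], inserted[:-1]
    while deleted and inserted and deleted[0] == inserted[0]:
        deleted, inserted = deleted[1:], inserted[1:]
        pos += 1
    # Substitutions, delins and identities have a single placement.
    if bool(deleted) == bool(inserted):
        return Spdi(v.sequence, pos, deleted, inserted)
    unit = deleted or inserted
    n = len(unit)
    start, end = pos, pos + len(deleted)
    # Count how far the indel can slide left and right without changing
    # the resulting sequence.
    left = 0
    while start - left > 0 and reference[start - left - 1] == unit[n - 1 - left % n]:
        left += 1
    right = 0
    while end + right < len(reference) and reference[end + right] == unit[right % n]:
        right += 1
    lo, hi = start - left, end + right
    return Spdi(
        v.sequence,
        lo,
        reference[lo:hi],
        reference[lo:start] + inserted + reference[end:hi],
    )
```

In this sketch, deleting any one of the three A bases in the toy reference `GCAAAT` yields the same widened record (position 2, deletion `AAA`, insertion `AA`), which is the kind of unique representation the abstract attributes to the Contextual Allele.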