hgvs: A Python package for manipulating sequence variants using HGVS nomenclature: 2018 Update
Published in: | Human mutation 2018-12, Vol. 39 (12), p. 1803-1813 |
---|---|
Main authors: | , , , , , , , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | The Human Genome Variation Society (HGVS) nomenclature guidelines encourage the accurate and standard description of DNA, RNA, and protein sequence variants in public variant databases and the scientific literature. Inconsistent application of the HGVS guidelines can lead to misinterpretation of variants in clinical settings. Reliable software tools are essential to ensure consistent application of the HGVS guidelines when reporting and interpreting variants. We present the hgvs Python package, a comprehensive tool for manipulating sequence variants according to the HGVS nomenclature guidelines. Distinguishing features of the hgvs package include: (1) parsing, formatting, validating, and normalizing variants on genome, transcript, and protein sequences; (2) projecting variants between aligned sequences, including those with gapped alignments; (3) flexible installation using remote or local data (fully local installations eliminate network dependencies); (4) extensive automated tests; and (5) open source development by a community from eight organizations worldwide. This report summarizes recent and significant updates to the hgvs package since its original release in 2014, and presents results of extensive validation using clinically relevant variants from ClinVar and HGMD.
After the one-line initialization of hgvs, NC_000017.11:g.43091687delC (rs397509113) is parsed into a structured object, var_g, and normalized (3' shifted) according to HGVS recommendations. For each of the six relevant transcripts that span the genomic region, the genomic variant is projected onto the transcript sequence. Inferred protein variants are generated for each of the five coding transcripts; the genomic variant is within an exon of three of the coding transcripts and within an intron of the other two. Note that transcript and protein accession pairs are consistent with the correspondence defined by NCBI. |
---|---|
ISSN: | 1059-7794, 1098-1004 |
DOI: | 10.1002/humu.23615 |
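
The abstract and caption describe parsing a variant into a structured object and normalizing it by 3' shifting. The following is a minimal, self-contained sketch of that idea for simple deletions only. It does not use the hgvs package and is not its implementation: the names `Deletion`, `parse_deletion`, `shift_3prime`, and `format_deletion`, the simplified regular expression, the toy accession `XX_000001.1`, and the toy reference sequence are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical, heavily simplified pattern for deletions such as
# "XX_000001.1:g.4delC" or "XX_000001.1:g.2_3del". Real HGVS syntax is far richer.
_DEL_RE = re.compile(
    r"^(?P<ac>[A-Z]{2}_\d+\.\d+):(?P<type>[gcn])\."
    r"(?P<start>\d+)(?:_(?P<end>\d+))?del(?P<ref>[ACGT]*)$"
)


@dataclass(frozen=True)
class Deletion:
    ac: str      # sequence accession
    type: str    # coordinate type, e.g. "g"
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def parse_deletion(text: str) -> Deletion:
    """Parse a simple deletion string into a structured object."""
    m = _DEL_RE.match(text)
    if m is None:
        raise ValueError(f"unsupported variant string: {text!r}")
    start = int(m["start"])
    end = int(m["end"]) if m["end"] else start
    if end < start:
        raise ValueError("end position precedes start position")
    return Deletion(m["ac"], m["type"], start, end)


def shift_3prime(var: Deletion, seq: str) -> Deletion:
    """Move a deletion as far 3' as possible without changing the result.

    `seq` is the reference sequence; 1-based position i is seq[i - 1].
    Shifting by one base is safe when the base just after the deleted
    interval equals the first deleted base.
    """
    start, end = var.start, var.end
    while end < len(seq) and seq[end] == seq[start - 1]:
        start += 1
        end += 1
    return Deletion(var.ac, var.type, start, end)


def format_deletion(var: Deletion) -> str:
    """Format a Deletion back into a string (deleted bases omitted)."""
    pos = str(var.start) if var.start == var.end else f"{var.start}_{var.end}"
    return f"{var.ac}:{var.type}.{pos}del"


# Toy reference: positions 4-6 are a run of C, so deleting any one C is equivalent.
toy_seq = "GATCCCTA"
var_g = parse_deletion("XX_000001.1:g.4delC")
print(format_deletion(shift_3prime(var_g, toy_seq)))
```

Projection onto transcripts and protein inference, as described in the caption, additionally require transcript alignments and reference sequences from a data source, which this sketch deliberately leaves out.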