MACHINE LEARNING MODEL FOR RECALIBRATING GENOTYPE CALLS FROM EXISTING SEQUENCING DATA FILES


Bibliographic details
Main authors: Mehio, Rami; Visvanath, Arun; Parnaby, Gavin Derek; De Beer, Jacobus; Huang, Zhuoyi
Format: Patent
Language: English
Description
Summary: This disclosure describes methods, non-transitory computer readable media, and systems that can utilize a machine learning model to recalibrate genotype calls (e.g., variant calls) of existing sequencing data files. For instance, the disclosed systems can access one or more existing sequencing data files for a genomic sample, where the files include nucleotide-read data and genotype calls at particular genomic coordinates. From the one or more existing sequencing data files, the disclosed systems extract sequencing metrics for nucleotide reads or a particular genotype call at a particular genomic coordinate. By processing the extracted sequencing metrics, the systems further utilize a call-recalibration-machine-learning model to generate variant-call classifications indicating an accuracy of the particular genotype call. In some cases, the systems update or recalibrate the genotype call or quality-measuring sequencing metrics for the genotype call based on the variant-call classifications.
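
The flow in the summary (extract sequencing metrics for an existing call, classify the call's accuracy, then update its quality) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the `GenotypeCall` fields, the three metrics, the hand-picked logistic weights standing in for a trained call-recalibration model, and the reduction of the variant-call classifications to a single probability that the call is correct.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GenotypeCall:
    """Hypothetical record for one call read from an existing file."""
    chrom: str
    pos: int
    genotype: str  # e.g. "0/1"
    qual: float    # Phred-scaled quality reported in the existing file
    depth: int     # read depth at the coordinate
    mapq: float    # mean mapping quality of overlapping reads


# Hypothetical weights standing in for a trained model (assumption).
WEIGHTS = {"qual": 0.05, "depth": 0.04, "mapq": 0.03}
BIAS = -3.0


def extract_metrics(call: GenotypeCall) -> dict:
    """Collect the sequencing metrics used as model inputs."""
    return {"qual": call.qual, "depth": float(call.depth), "mapq": call.mapq}


def classify(metrics: dict) -> float:
    """Return a probability in (0, 1) that the call is correct (logistic score)."""
    z = BIAS + sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def recalibrate(call: GenotypeCall, max_qual: float = 60.0) -> GenotypeCall:
    """Replace the reported quality with a Phred score derived from the classification."""
    p_correct = classify(extract_metrics(call))
    # Floor the error probability so the Phred score never exceeds max_qual.
    p_error = max(1.0 - p_correct, 10 ** (-max_qual / 10))
    new_qual = -10.0 * math.log10(p_error)
    return replace(call, qual=round(new_qual, 2))


strong = GenotypeCall("chr1", 12345, "0/1", qual=50.0, depth=40, mapq=60.0)
weak = GenotypeCall("chr1", 67890, "0/1", qual=10.0, depth=5, mapq=20.0)
recal_strong = recalibrate(strong)
recal_weak = recalibrate(weak)
```

In this sketch only the quality value changes; the genotype itself is left untouched. A system along the lines of the summary could also revise the genotype call when the classification suggests an error, which this example does not attempt.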