Evaluation of parameters affecting performance and reliability of machine learning-based antibiotic susceptibility testing from whole genome sequencing data

Bibliographic Details
Published in:	PLoS computational biology 2019-09, Vol.15 (9), p.e1007349-e1007349
Main authors:	Hicks, Allison L; Wheeler, Nicole; Sánchez-Busó, Leonor; Rakeman, Jennifer L; Harris, Simon R; Grad, Yonatan H
Format:	Article
Language:	English
Subjects:	Accuracy; Algorithms; Anti-Bacterial Agents - pharmacology; Antibiotic resistance; Antibiotics; Antimicrobial agents; Artificial intelligence; Bacteria - drug effects; Bacteria - genetics; Bacterial Infections - microbiology; Bioinformatics; Biology and Life Sciences; Classification; Computational Biology; Computer and Information Sciences; Databases, Genetic; Datasets; DNA sequencing; Drug resistance; Drug therapy; Epidemiology; Evaluation; Gene sequencing; Genetic aspects; Genome, Bacterial - genetics; Genomes; Genomics; Genotype & phenotype; Genotypes; Gonorrhea; Health surveillance; Humans; Immunology; Infectious diseases; Klebsiella; Laboratories; Learning algorithms; Logos; Machine Learning; Medicine and Health Sciences; Meta-analysis; Methods; Microbial drug resistance; Microbial Sensitivity Tests - methods; Nucleotide sequence; Pathogenic microorganisms; People and places; Phenotypes; Physical Sciences; Pneumonia; Populations; Public health; Regression analysis; Regression models; Reliability analysis; Reproducibility of Results; Research and Analysis Methods; Sustainability; Sustainable development; Whole Genome Sequencing
Online access:	Full text

Description
Summary:	Prediction of antibiotic resistance phenotypes from whole genome sequencing data by machine learning methods has been proposed as a promising platform for the development of sequence-based diagnostics. However, there has been no systematic evaluation of factors that may influence performance of such models, how they might apply to and vary across clinical populations, and what the implications might be in the clinical setting. Here, we performed a meta-analysis of seven large Neisseria gonorrhoeae datasets, as well as Klebsiella pneumoniae and Acinetobacter baumannii datasets, with whole genome sequence data and antibiotic susceptibility phenotypes using set covering machine classification, random forest classification, and random forest regression models to predict resistance phenotypes from genotype. We demonstrate how model performance varies by drug, dataset, resistance metric, and species, reflecting the complexities of generating clinically relevant conclusions from machine learning-derived models. Our findings underscore the importance of incorporating relevant biological and epidemiological knowledge into model design and assessment and suggest that doing so can inform tailored modeling for individual drugs, pathogens, and clinical populations. We further suggest that continued comprehensive sampling and incorporation of up-to-date whole genome sequence data, resistance phenotypes, and treatment outcome data into model training will be crucial to the clinical utility and sustainability of machine learning-based molecular diagnostics.
ISSN:	1553-734X, 1553-7358
DOI:	10.1371/journal.pcbi.1007349