Improved Prediction of CYP-Mediated Metabolism with Chemical Fingerprints

Molecule and atom fingerprints, similar to path-based Daylight fingerprints, can substantially improve the accuracy of P450 site-of-metabolism prediction models. Only two chemical fingerprints have been used in metabolism prediction, so little is known about the importance of fingerprint parameters...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Journal of chemical information and modeling 2015-05, Vol.55 (5), p.972-982
Hauptverfasser: Zaretzki, Jed, Boehm, Kevin M, Swamidass, S. Joshua
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Molecule and atom fingerprints, similar to path-based Daylight fingerprints, can substantially improve the accuracy of P450 site-of-metabolism prediction models. Only two chemical fingerprints have been used in metabolism prediction, so little is known about the importance of fingerprint parameters on site of metabolism predictions. It is possible that different fingerprints might yield more accurate models. Here, we study if tuning fingerprints to specific site of metabolism data sets can lead to improved models. We measure the impact of 484 specific chemical fingerprints on the accuracy of P450 site-of-metabolism prediction models on nine P450 isoform site of metabolism data sets. Using a range of search depths, we study path, circular, and subgraph fingerprints. Two different labelings, also, are considered, both standard SMILES labels and also a labeling that marks ring bonds differently than nonring bonds, enabling ortho, para, and meta positioning of substituents to be more clearly encoded. Optimal fingerprint models chosen by cross-validation performance on the full training data are, on average, 3.8% (Top-2; percent of molecules with a site of metabolism in the top two predictions) and 1.4% (AUC; area under the ROC curve) more accurate than base fingerprint models. These gains represent, respectively, a 25.6% and 16.7% reduction in error. A more rigorous assessment selects fingerprints within each cross-validation fold, sometimes selecting different fingerprints for different folds, but yielding a more reliable estimate of generalization error. In this assessment, averaging the scores from the top few fingerprints yields performances improvements of, on average, 3.0% (Top-2) and 0.7% (AUC). These gains are statistically significant and represent, respectively, a 20.1% and 8.8% reduction in error. Between different isoforms, not many consistencies were observed among the top performing fingerprints, with different fingerprints working best for different isoforms. These results suggest that there are important gains achievable in site of metabolism modeling by including and optimizing atom and molecule fingerprints. The optimal site of metabolism models determined by this approach are available for use at http://swami.wustl.edu/.
ISSN:1549-9596
1549-960X
DOI:10.1021/ci5005652