vEXP: A virtual enhanced cross screen panel for off-target pharmacology alerts
Published in: | Computational toxicology 2024-09, Vol.31, p.100324, Article 100324 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | We describe the development of the GSK vEXP (virtual enhanced cross screen panel) for off-target pharmacology alerts. Deriving a panel of machine learning classification models, or QSAR (Quantitative Structure-Activity Relationship) models, for off-target safety assessment allows early alerting to risk factors in candidate drugs. The models are matched to a previously described internal in-vitro biochemical screening panel, with some updates reported here. When potency thresholds relevant to in-vitro screening are considered, some internal GSK datasets and most of the related external ChEMBL datasets prove to be extremely imbalanced. Their small size and bias towards the active class make many ChEMBL datasets unmodellable at such thresholds. Although larger, many GSK datasets remain too imbalanced to give a performant model. Merging internal and external data helps to rebalance datasets and improve the domain of applicability, and model performance frequently improves on the merged data. Efforts to collate public datasets with a far better balance of the missing inactives would likely do more to improve open-source models than simply increasing dataset size. We investigate moving the probability threshold and applying imbalanced learners to help overcome the imbalance problem. Both methods can produce models with improved performance when applied to imbalanced datasets. Datasets with class imbalance 95:5 % or with |
---|---|
ISSN: | 2468-1113 |
DOI: | 10.1016/j.comtox.2024.100324 |
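
The abstract mentions moving the probability threshold as one way to handle class imbalance. The sketch below illustrates the general idea only and is not the authors' implementation: the function names, the choice of balanced accuracy as the selection metric, the threshold grid, and the toy 95:5 data are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = active)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / pos + tn / neg)


def classify(probs: Sequence[float], threshold: float) -> List[int]:
    """Label a compound active when its predicted probability >= threshold."""
    return [1 if p >= threshold else 0 for p in probs]


def best_threshold(
    y_true: Sequence[int], probs: Sequence[float], n_steps: int = 99
) -> Tuple[float, float]:
    """Scan an evenly spaced grid in (0, 1) and return (threshold, score).

    Hypothetical helper: keeps the first threshold with the highest
    balanced accuracy. In practice the scan should use held-out data.
    """
    best_t, best_score = 0.5, -1.0
    for i in range(1, n_steps + 1):
        t = i / (n_steps + 1)
        score = balanced_accuracy(y_true, classify(probs, t))
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Toy data (assumed): 19 inactives and 1 active, i.e. a 95:5 imbalance.
# The model ranks the active highest, but its probability is below 0.5.
y_true = [0] * 19 + [1]
probs = [0.05] * 15 + [0.125] * 4 + [0.35]

default_score = balanced_accuracy(y_true, classify(probs, 0.5))
moved_t, moved_score = best_threshold(y_true, probs)
print(f"default threshold 0.5: balanced accuracy = {default_score:.2f}")
print(f"moved threshold {moved_t:.2f}: balanced accuracy = {moved_score:.2f}")
```

On data like this, a classifier can rank the minority class correctly and still never predict it at the default 0.5 cut-off, which is the situation threshold moving is meant to address. Selecting the threshold on the same data used for evaluation would overstate performance, so a separate validation split is assumed in any real use.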