Prediction of human pharmacokinetic parameters incorporating SMILES information

Bibliographic Details
Published in: Archives of pharmacal research 2024, 47(12), pp. 914-923
Main authors: Kwon, Jae-Hee; Han, Ja-Young; Kim, Minjung; Kim, Seong Kyung; Lee, Dong-Kyu; Kim, Myeong Gyu
Format: Article
Language: English
Description
Summary: This study aimed to develop a model that applies natural language processing to simplified molecular-input line-entry system (SMILES) strings in order to predict clearance (CL) and volume of distribution at steady state (Vd,ss) in humans. The CL and Vd,ss prediction models were built on data from 435 and 439 compounds, respectively. The machine learning features comprised animal pharmacokinetic data, in vitro experimental data, molecular descriptors, and SMILES, and XGBoost was used as the algorithm. The ChemBERTa model was used to analyze compound SMILES, and the last hidden layer embedding of ChemBERTa was examined as a feature. The models were evaluated using geometric mean fold error (GMFE), r², root mean squared error (RMSE), and accuracy within 2-fold and 3-fold error. CL prediction performed best when animal pharmacokinetic data, in vitro experimental data, and SMILES were combined as features, yielding a GMFE of 1.768, an r² of 0.528, and an RMSE of 0.788, with accuracies within 2-fold and 3-fold error of 75.8% and 81.8%, respectively. Vd,ss prediction performed best with animal pharmacokinetic data and in vitro experimental data as features, yielding a GMFE of 1.401, an r² of 0.902, and an RMSE of 0.413, with accuracies within 2-fold and 3-fold error of 93.8% and 100%, respectively. The study thus developed a highly predictive model for CL and Vd,ss. In particular, incorporating SMILES information into the model contributed predictive power for CL.
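The summary reports GMFE and accuracy within 2-fold and 3-fold error without giving formulas. The following is a minimal sketch of how these metrics are commonly computed in pharmacokinetic prediction work, assuming GMFE = 10^(mean |log10(predicted/observed)|) and fold error = max(predicted/observed, observed/predicted). These definitions, the function names, and the sample values are illustrative assumptions, not taken from the article.

```python
import math


def fold_errors(predicted, observed):
    """Fold error per compound: max(pred/obs, obs/pred), so always >= 1."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = []
    for p, o in zip(predicted, observed):
        if p <= 0 or o <= 0:
            raise ValueError("values must be strictly positive")
        errors.append(max(p / o, o / p))
    return errors


def gmfe(predicted, observed):
    """Geometric mean fold error (assumed definition, see lead-in)."""
    logs = [math.log10(fe) for fe in fold_errors(predicted, observed)]
    return 10 ** (sum(logs) / len(logs))


def fraction_within_fold(predicted, observed, fold):
    """Share of compounds whose fold error is at most `fold`."""
    errs = fold_errors(predicted, observed)
    return sum(fe <= fold for fe in errs) / len(errs)


# Hypothetical values for illustration only (not data from the study).
observed_cl = [1.0, 2.0, 4.0, 10.0]
predicted_cl = [2.0, 2.0, 1.0, 10.0]

print(gmfe(predicted_cl, observed_cl))
print(fraction_within_fold(predicted_cl, observed_cl, 2))
print(fraction_within_fold(predicted_cl, observed_cl, 3))
```

Under these definitions a GMFE of 1 would mean perfect agreement, and a prediction counts as "within 2-fold" when it lies between half and twice the observed value.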
ISSN:0253-6269
1976-3786
DOI:10.1007/s12272-024-01520-2