HiMolformer: Integrating graph and sequence representations for predicting liver microsome stability with SMILES

Bibliographic details
Published in: Computational biology and chemistry, 2024-12, Vol. 113, p. 108263, Article 108263
Main authors: Yun, Seokwoo; Nam, Gibeom; Koo, Jahwan
Format: Article
Language: English
Description
Abstract: In the initial stages of drug discovery or pre-clinical studies, understanding the metabolic stability of new molecules is crucial. Recently, research on pre-trained deep learning for molecular property prediction has been actively progressing, with various models being made open-source. However, most of these models rely on either a 2D graph or a 1D sequence for training, and the learned representation varies depending on the data format used. Consequently, combining multiple representations can broaden the scope of learning and may be a manageable and effective way to enhance performance. Therefore, we propose a novel hybrid model for predicting metabolic stability, which integrates representations from both graph-based and sequence-based models pre-trained on molecular features. This approach utilizes the combined strengths of the 2D topological and 1D sequential information of molecules. HiMol, a graph neural network (GNN) model, and Molformer, a sequence-based Transformer model, were selected for integration, and we therefore named the resulting model HiMolformer. HiMolformer demonstrated superior performance compared to other models. We also focus on the regression task, using an empirical dataset from Korea Chemical Bank (KCB) comprising 3,498 molecules with mouse liver microsome (MLM) and human liver microsome (HLM) data obtained from actual metabolic reaction experiments. To the best of our knowledge, this is the first attempt to develop MLM and HLM prediction models using regression with a single SMILES input. The source code of this model is available at https://github.com/YUNSEOKWOO/HiMolformer.
• Introduces HiMolformer, integrating 2D graph and 1D sequence data for molecular analysis.
• Combines GNN-based HiMol and Transformer-based Molformer architectures.
• Enhances analysis of molecular topological and sequential information.
• Improves liver microsome stability prediction accuracy.
• Utilizes real liver metabolism data from Korea Chemical Bank (KCB).
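The abstract describes integrating a graph-based and a sequence-based representation of one molecule and regressing MLM and HLM stability from the result. The following is only a minimal sketch of that general idea in plain Python, not the authors' implementation. It assumes fusion by simple concatenation and a single linear head with two outputs; neither detail is stated in this record. The function names `fuse_embeddings` and `linear_head`, the toy vectors, and the weights are all hypothetical.

```python
from typing import List, Sequence


def fuse_embeddings(graph_emb: Sequence[float], seq_emb: Sequence[float]) -> List[float]:
    """Concatenate a graph-based and a sequence-based embedding.

    Concatenation is an assumed fusion strategy for illustration only.
    """
    return list(graph_emb) + list(seq_emb)


def linear_head(
    features: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Apply one linear layer; each row of `weights` yields one output."""
    if len(weights) != len(bias):
        raise ValueError("need exactly one bias term per output row")
    if any(len(row) != len(features) for row in weights):
        raise ValueError("each weight row must match the feature length")
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


# Toy stand-ins for the outputs of a pre-trained graph encoder and a
# pre-trained sequence encoder applied to the same molecule (hypothetical values).
graph_emb = [0.5, -1.0]
seq_emb = [2.0, 0.0, 1.0]

fused = fuse_embeddings(graph_emb, seq_emb)

# Hypothetical regression weights: row 0 -> MLM output, row 1 -> HLM output.
weights = [
    [1.0, 0.0, 0.5, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 2.0],
]
bias = [0.0, 1.0]

mlm_pred, hlm_pred = linear_head(fused, weights, bias)
print(f"fused length: {len(fused)}, MLM: {mlm_pred}, HLM: {hlm_pred}")
```

In a real pipeline the two toy vectors would be replaced by embeddings computed from the same SMILES string, and the head would be trained on measured stability values. The linked repository is the place to check how the published model actually combines the two encoders.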
ISSN:1476-9271
1476-928X
DOI:10.1016/j.compbiolchem.2024.108263