A pre-trained multi-representation fusion network for molecular property prediction

Bibliographic details
Published in: Information Fusion 2024-03, Vol. 103, p. 102092, Article 102092
Main authors: Zhang, Haohui; Wu, Juntong; Liu, Shichao; Han, Shen
Format: Article
Language: English
Description
Summary: Predicting molecular properties is an important task in machine learning and cheminformatics. Molecules can be represented in various formats, including 1D SMILES strings, 2D graphs, and 3D conformations. Numerous models have been proposed for the different representations to accomplish molecular property prediction. However, most recent works have focused on only one or two representations, or have combined embedding vectors from different views in an unsophisticated manner. To address this issue, we present PremuNet, a novel pre-trained multi-representation fusion network for molecular property prediction. PremuNet extracts comprehensive molecular information from multiple views and combines them interactively through pre-training and fine-tuning. The framework consists of two branches: a Transformer-GNN branch that extracts SMILES and graph information, and a Fusion Net branch that extracts topology and geometry information, called PremuNet-L and PremuNet-H respectively. We employ masked self-supervised methods to enable the model to learn information fusion and achieve enhanced performance in downstream tasks. The proposed model has been evaluated on eight molecular property prediction tasks, comprising five classification and three regression tasks, and attained state-of-the-art performance in most cases. Additionally, we conduct ablation studies to demonstrate the effect of each view and of the branch combination approaches.
• A novel molecular property prediction framework with multi-model fusion network.
• We introduce new information fusion schemes with graph neural network structure.
• Our model outperforms the baselines in eight molecular property prediction tasks.
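The summary mentions masked self-supervised pre-training on SMILES input but does not describe the masking scheme. As an illustration only, the following is a generic sketch of how masked-token corruption of a SMILES string might look. It is not PremuNet's actual procedure: the tokenizer regex, the `[MASK]` symbol, the 15% ratio, and the function names `tokenize` and `mask_tokens` are all assumptions made for this example.

```python
import random
import re

# Hypothetical mask symbol; the record does not say what PremuNet uses.
MASK = "[MASK]"

# Simplified SMILES tokenizer (an assumption): bracket atoms, the two-letter
# halogens Br and Cl, and otherwise single characters.
_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return _TOKEN.findall(smiles)


def mask_tokens(tokens, ratio=0.15, seed=None):
    """Replace a random fraction of tokens with MASK.

    Returns the corrupted token list and a dict mapping each masked
    position to its original token (the reconstruction targets).
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    # Mask at least one token, and never more than the sequence length.
    n_mask = min(len(tokens), max(1, round(len(tokens) * ratio)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    corrupted = list(tokens)
    targets = {}
    for i in positions:
        targets[i] = corrupted[i]
        corrupted[i] = MASK
    return corrupted, targets


if __name__ == "__main__":
    toks = tokenize("CC(=O)Cl")  # acetyl chloride
    corrupted, targets = mask_tokens(toks, ratio=0.3, seed=0)
    print(toks)
    print(corrupted)
    print(targets)
```

During pre-training, a model would receive the corrupted sequence and be trained to predict the original tokens at the masked positions; the same idea can be applied to atoms or bonds of a molecular graph.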
ISSN: 1566-2535, 1872-6305
DOI: 10.1016/j.inffus.2023.102092