Self-Supervised Graph Information Bottleneck for Multi-View Molecular Embedding Learning

Bibliographic details
Published in:	IEEE Transactions on Artificial Intelligence, 2023-07, pp. 1-9
Authors:	Li, Changsheng; Mao, Kaihang; Wang, Shiye; Yuan, Ye; Wang, Guoren
Format:	Article
Language:	English
Keywords:	Chemicals; Feature extraction; Geometry; Information bottleneck; molecular embedding learning; molecular property prediction; multi-view learning; Self-supervised learning; Solid modeling; Task analysis; Three-dimensional displays

Description
Abstract:	In the field of computer-aided drug discovery (CADD), identifying promising drug candidates from small molecule libraries requires meaningful molecular embeddings for downstream tasks such as property prediction. However, obtaining experimentally determined molecular property measurements is often expensive and time-consuming, making it challenging to train molecular encoders with limited supervision. Additionally, molecules can be represented in two ways: as 2D chemical-bond structures or as 3D geometric structures. Learning molecular embeddings from only one of these representations can result in information loss, and effective fusion of the two views has not been fully explored. To address these challenges, we propose a new approach called the Self-supervised Multi-View Graph Neural Network (SMV-GNN) for molecular embedding learning. Our approach uses self-supervised tasks that strengthen the representation ability of the molecular encoder without requiring additional human-annotated data. Specifically, from the 2D view we use chemical-bond-based graph structures as inputs to predict inter-atom distances, and from the 3D view we randomly shuffle a fraction of the atoms in the 3D-coordinate-based graphs and predict atom rationality. We further improve the molecular embedding by using the information bottleneck principle to learn the essential shared feature representation, discarding superfluous information from the 2D and 3D views for downstream tasks. We evaluate SMV-GNN on seven benchmark datasets for molecular property prediction and show that it outperforms current state-of-the-art methods. The source code is available at: https://github.com/myuanxiao/SMVGNN .
ISSN:	2691-4581
DOI:	10.1109/TAI.2023.3297576
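The two self-supervised pretext tasks summarized in the abstract can be illustrated with a minimal NumPy sketch of how their training targets might be built for a single molecule. This is not the SMV-GNN code: the function names `pairwise_distance_targets` and `shuffle_atoms`, the use of all atom pairs as distance targets, the reading of "atom rationality" as a per-atom binary label (1 = coordinates were shuffled), and the cyclic-shift shuffling scheme are all assumptions made for illustration.

```python
import numpy as np


def pairwise_distance_targets(coords: np.ndarray) -> np.ndarray:
    """2D-view targets (assumed): Euclidean distance between every atom pair.

    coords: (n_atoms, 3) coordinates of one conformer. A 2D encoder would see
    only the bond graph and be trained to regress these values.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def shuffle_atoms(coords: np.ndarray, ratio: float, rng: np.random.Generator):
    """3D-view corruption (hypothetical): move a fraction of atoms to other atoms' positions.

    Returns the corrupted coordinates and a per-atom label
    (1 = position was shuffled, 0 = original position kept).
    """
    n = coords.shape[0]
    if n < 2:
        raise ValueError("need at least two atoms to shuffle")
    # At least two atoms must be chosen for a swap to change anything.
    k = min(n, max(2, int(round(ratio * n))))
    chosen = rng.choice(n, size=k, replace=False)
    corrupted = coords.copy()
    # Cyclic shift within the chosen subset: each chosen atom receives the
    # coordinates of a different chosen atom, so none stays in place.
    corrupted[chosen] = coords[np.roll(chosen, 1)]
    labels = np.zeros(n, dtype=np.int64)
    labels[chosen] = 1
    return corrupted, labels


rng = np.random.default_rng(0)
coords = rng.normal(size=(6, 3))          # toy conformer with 6 atoms
dist_targets = pairwise_distance_targets(coords)
corrupted, labels = shuffle_atoms(coords, ratio=0.3, rng=rng)
print(dist_targets.shape, labels.sum())   # (6, 6) 2
```

The cyclic shift over the selected atoms guarantees that every atom labelled 1 really ends up at another atom's position; a plain random permutation could leave some selected atoms where they were and produce noisy labels.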
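The information-bottleneck step described in the abstract can be pictured with the sketch below. It uses a generic variational IB formulation rather than the paper's published loss (the actual objective is in the linked repository). It assumes each view's encoder outputs a diagonal Gaussian (mean and log-variance). The compression term is the standard VIB KL-to-prior penalty, and the agreement term is a symmetrized KL between the 2D and 3D posteriors, an idea borrowed from multi-view IB methods. The function names and the weights `beta` and `gamma` are hypothetical.

```python
import numpy as np


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between diagonal Gaussians, summed over the last axis."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(
        logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0, axis=-1
    )


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps (in an autodiff framework this keeps
    gradients flowing through mu and logvar)."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)


def bottleneck_regularizer(mu_2d, logvar_2d, mu_3d, logvar_3d, beta=1e-3, gamma=1.0):
    """Hypothetical IB-style penalty on per-view Gaussian embeddings of shape (batch, dim).

    compression: KL to a standard normal prior for each view (VIB-style),
                 which limits how much input information each embedding keeps.
    agreement:   symmetrized KL between the two views' posteriors, which
                 penalizes information present in only one view's embedding.
    """
    zeros = np.zeros_like(mu_2d)
    compression = (gaussian_kl(mu_2d, logvar_2d, zeros, zeros)
                   + gaussian_kl(mu_3d, logvar_3d, zeros, zeros))
    agreement = 0.5 * (gaussian_kl(mu_2d, logvar_2d, mu_3d, logvar_3d)
                       + gaussian_kl(mu_3d, logvar_3d, mu_2d, logvar_2d))
    return float(np.mean(beta * compression + gamma * agreement))


mu, lv = np.ones((1, 2)), np.zeros((1, 2))
print(bottleneck_regularizer(mu, lv, mu, lv, beta=1.0, gamma=1.0))  # 2.0
```

In a full training loop this penalty would be added to the pretext-task and downstream losses; with identical 2D and 3D posteriors only the compression term remains, as the usage line shows.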