Federated learning of molecular properties with graph neural networks in a heterogeneous setting

Bibliographic details
Published in: Patterns (New York, N.Y.), 2022-06, Vol. 3 (6), p. 100521, Article 100521
Authors: Zhu, Wei; Luo, Jiebo; White, Andrew D.
Format: Article
Language: English
Description
Abstract: Experiments in chemistry research carry high material and computational costs. Institutions are interested in differing classes of molecules, creating heterogeneous data that cannot be easily joined by conventional methods. This work introduces federated heterogeneous molecular learning. Federated learning allows end users to build a global model collaboratively while keeping their training data isolated. We first simulate a heterogeneous federated-learning benchmark (FedChem) by jointly performing scaffold splitting and latent Dirichlet allocation on existing datasets. Our results on FedChem show that significant learning challenges arise when working with heterogeneous molecules across clients. We then propose a method to alleviate the problem: Federated Learning by Instance reweighTing (FLIT(+)). FLIT(+) can align local training across clients. Experiments conducted on FedChem validate the advantages of this method. This work should enable a new type of collaboration for improving artificial intelligence (AI) in chemistry that mitigates concerns about sharing valuable chemical data.

• FedChem employs scaffold splitting and latent Dirichlet allocation (LDA) for heterogeneous settings
• We propose FLIT(+) algorithms to alleviate the heterogeneity problem
• We conduct experiments to benchmark the proposed and existing methods on FedChem

Generating datasets with thousands of molecules for machine learning in chemistry is cost-prohibitive due to the high material and/or computational costs. Additionally, the intrinsic value of chemical data makes institutions reluctant to contribute to a centralized dataset. Recent studies suggest that deep learning has the potential to accelerate molecule discovery, but there are few large datasets for chemistry. Instead, individual institutions gather their data privately, which leads to under-trained models with poor generalization performance.
Even worse, the local models can be biased because institutions often focus on the regions of chemical space that match their interests and expertise. We propose a federated-learning method with graph neural networks that can handle this heterogeneity and enable accurate federated learning for molecular-property prediction. We also propose a heterogeneous federated-learning benchmark and show that our method achieves state-of-the-art performance on it.

This work presented FedChem, a federated heterogeneous molecular learning benchmark based on MoleculeNet. Several federated-learning methods are benchmarked on the pr...
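
The abstract names two mechanics without spelling them out: splitting molecules across clients by scaffold with Dirichlet-distributed proportions, and building a global model from isolated local training. The Python sketch that follows illustrates one common way to do each. It is not the FedChem or FLIT(+) code. Assumptions: scaffold strings are precomputed (for example, Bemis-Murcko scaffolds from a cheminformatics toolkit); each scaffold group is divided with a plain Dirichlet draw rather than a fitted latent Dirichlet allocation topic model; FedAvg, the standard size-weighted averaging rule, stands in for the aggregation step; models are flat lists of floats; and `dirichlet`, `scaffold_dirichlet_split`, and `fedavg` are hypothetical helper names.

```python
import random
from collections import defaultdict


def dirichlet(alpha: float, k: int, rng: random.Random) -> list[float]:
    """Sample a symmetric Dirichlet(alpha) vector of length k via normalized gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def scaffold_dirichlet_split(
    scaffolds: list[str], n_clients: int, alpha: float = 0.5, seed: int = 0
) -> list[list[int]]:
    """Assign molecule indices to clients, scaffold group by scaffold group.

    scaffolds[i] is a precomputed scaffold string for molecule i (assumption).
    Assumption: each scaffold group is divided among clients with proportions
    from a plain Dirichlet draw, a simplification of the LDA step in the
    abstract. Smaller alpha gives more skewed, i.e. more heterogeneous, clients.
    """
    rng = random.Random(seed)
    groups: dict[str, list[int]] = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    clients: list[list[int]] = [[] for _ in range(n_clients)]
    for scaffold in sorted(groups):  # sorted for reproducibility
        members = groups[scaffold]
        rng.shuffle(members)
        props = dirichlet(alpha, n_clients, rng)
        # Turn cumulative proportions into cut points over this group.
        cuts, acc = [], 0.0
        for p in props[:-1]:
            acc += p
            cuts.append(int(round(acc * len(members))))
        bounds = [0] + cuts + [len(members)]
        for c in range(n_clients):
            clients[c].extend(members[bounds[c]:bounds[c + 1]])
    return clients


def fedavg(client_weights: list[list[float]], client_sizes: list[int]) -> list[float]:
    """FedAvg aggregation: average parameter vectors weighted by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(n_params)
    ]


if __name__ == "__main__":
    scaffolds = ["c1ccccc1", "c1ccccc1", "C1CCCCC1", "C1CCCCC1", "C1CCCCC1", "c1ccncc1"]
    print(scaffold_dirichlet_split(scaffolds, n_clients=3, alpha=0.5))
    print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # → [2.5, 3.5]
```

Lowering `alpha` concentrates each scaffold group on fewer clients, which produces the kind of cross-client heterogeneity the abstract describes. Only parameters, never molecules, pass through `fedavg`, which is what keeps each client's training data isolated.
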
ISSN: 2666-3899
DOI: 10.1016/j.patter.2022.100521