Toward Predictive Chemical Deformulation Enabled by Deep Generative Neural Networks

The design of chemical formulations is a challenging, high-dimensional problem. In typical formulations, tens of thousands of ingredients are available for use, yet only a tiny fraction end up in a given formulation. Deformulation, the problem of reverse engineering the precise amounts of each ingre...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Industrial & engineering chemistry research 2021-10, Vol.60 (39), p.14176-14184
Hauptverfasser: Sevgen, Emre, Kim, Edward, Folie, Brendan, Rivera, Ventura, Koeller, Jason, Rosenthal, Emily, Jacobs, Andrea, Ling, Julia
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The design of chemical formulations is a challenging, high-dimensional problem. In typical formulations, tens of thousands of ingredients are available for use, yet only a tiny fraction end up in a given formulation. Deformulation, the problem of reverse engineering the precise amounts of each ingredient starting from just a list of ingredients, is similarly challenging but is a key capability for staying up-to-date with industry competitors. Here, we take advantage of a large, curated formulations dataset from CAS, a division of the American Chemical Society, which offers a consistent and highly structured representation of the formulations and the chemical identities of their components to show that a variational autoencoder neural network learns meaningful representations of formulations in various product classes such as antiperspirants and oral care. Furthermore, it can be used in conjunction with a two-step sampling algorithm to generate accurate ingredient amount suggestions for deformulation. Deformulation using a variational autoencoder produces estimates that are significantly more accurate than nearest neighbor methods, extrapolates better to formulations that are significantly different than previously seen formulations, and provides a way to leverage large datasets for industrially relevant capabilities.
ISSN:0888-5885
1520-5045
DOI:10.1021/acs.iecr.1c00634