Prediction of diffusion coefficients in aqueous systems by machine learning models

Bibliographic details
Published in: Journal of Molecular Liquids, 2024-07, Vol. 405, p. 125009, Article 125009
Main authors: Aniceto, José P.S.; Zêzere, Bruno; Silva, Carlos M.
Format: Article
Language: English
Description
Abstract: Currently there are no accurate models for predicting diffusion coefficients at infinite dilution in aqueous systems. Models that work well for polar solvents frequently perform worse for water. At the same time, experimental tracer diffusion coefficients are scarce, and measuring them can be impractical when this important transport property is needed. In this work, machine learning models were developed to predict the tracer diffusion coefficient of any solute in water at atmospheric pressure. Several approaches were used to construct the model, with different types of input parameters: pure component properties and theoretical molecular descriptors (such as atom counts, structural fragments and fingerprints) computed from different sources. A database of 126 systems (1192 data points) was used for training, and the best model achieved a global average absolute relative deviation (AARD) of 3.92% on the test set, with a maximum deviation of 24.27%. This model takes as inputs the temperature and 195 molecular descriptors computed with the RDKit cheminformatics package; because these descriptors can be calculated automatically from a molecular identifier, the model is very simple to use. For comparison, the well-known Wilke-Chang equation gave an AARD of 13.03% on the same test set, demonstrating the improved accuracy of the proposed approach. The models developed in this work are available at github.com/EgiChem/ml-D12-water-app.

Highlights:
• Machine learning models were developed to predict the binary diffusion coefficients of solutes in water.
• Models were trained on a database of experimental data for 126 systems (1192 data points).
• Different types of molecular descriptors were tested to construct the models.
• All new models performed significantly better than the classic Wilke-Chang equation.
• The best machine learning model had an average deviation of 3.92% on the test set.
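The two figures of merit quoted in the abstract, the AARD and the Wilke-Chang baseline, can be made concrete with a short Python sketch. It assumes the usual definition AARD = (100/N) Σ |D_calc − D_exp| / D_exp and the standard form of the Wilke-Chang equation with the association factor φ = 2.6 commonly used for water. The function names, parameter names and all numeric inputs (water viscosity, solute molar volume, the "experimental" and "predicted" values) are hypothetical illustrations; they are not taken from the paper or from the authors' repository, and the machine learning model itself is not reproduced here.

```python
import math


def _relative_deviations(experimental, predicted):
    """Absolute relative deviations |D_calc - D_exp| / D_exp for paired values."""
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("need two non-empty sequences of equal length")
    return [abs(p - e) / e for e, p in zip(experimental, predicted)]


def aard_percent(experimental, predicted):
    """Average absolute relative deviation (AARD), in percent."""
    devs = _relative_deviations(experimental, predicted)
    return 100.0 * sum(devs) / len(devs)


def max_ard_percent(experimental, predicted):
    """Maximum absolute relative deviation, in percent."""
    return 100.0 * max(_relative_deviations(experimental, predicted))


def wilke_chang(temperature_k, solvent_viscosity_cp, solvent_molar_mass,
                solute_molar_volume, association_factor=2.6):
    """Wilke-Chang estimate of the tracer diffusivity D12 in cm^2/s.

    D12 = 7.4e-8 * sqrt(phi * M2) * T / (eta2 * V1**0.6), with T in K,
    eta2 (solvent viscosity) in cP, M2 (solvent molar mass) in g/mol and
    V1 (solute molar volume at its normal boiling point) in cm^3/mol.
    phi = 2.6 is the association factor commonly used for water.
    """
    return (7.4e-8 * math.sqrt(association_factor * solvent_molar_mass)
            * temperature_k
            / (solvent_viscosity_cp * solute_molar_volume ** 0.6))


if __name__ == "__main__":
    # Illustrative inputs only: water at 298.15 K (viscosity ~0.890 cP,
    # M = 18.015 g/mol) and a hypothetical solute with V1 = 60 cm^3/mol.
    d_wc = wilke_chang(298.15, 0.890, 18.015, 60.0)
    print(f"Wilke-Chang D12 = {d_wc:.3e} cm^2/s")

    # Hypothetical experimental vs. predicted values, to show the metrics.
    d_exp = [1.00e-5, 2.00e-5]
    d_pred = [1.10e-5, 1.80e-5]
    print(f"AARD = {aard_percent(d_exp, d_pred):.2f} %")
    print(f"max ARD = {max_ard_percent(d_exp, d_pred):.2f} %")
```

The same `aard_percent` and `max_ard_percent` functions can score any model's predictions, whether from the Wilke-Chang correlation or a trained regressor, against a held-out test set, which is how the 3.92% and 13.03% figures above are framed.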
ISSN: 0167-7322, 1873-3166
DOI: 10.1016/j.molliq.2024.125009