molecular solubility datasets

Bibliographic details
Main author:	Fan, Ziyu Fan
Format:	Dataset
Language:	English

Description
Summary:	We collated small-molecule solubility data from a range of databases and literature sources. Some sources gave only molecule names without SMILES notation; for these, we obtained the SMILES strings and molecular weights from PubChem, DrugBank, https://www.wikiarabic.org/, and https://www.sigmaaldrich.com/US/en. Where both Canonical SMILES and Isomeric SMILES were available and differed, we preferred the Canonical SMILES. We retained only experimental results measured at 20-25 degrees Celsius and standardized the units to mol/L. Data cleaning involved several steps. We removed entries containing non-numerical data or other errors, as well as drug molecules without identifiable SMILES. We then resolved duplicate entries for the same molecule, whether recorded with identical SMILES or with different SMILES formats: entries whose solubility values differed by more than 0.03 were eliminated, and when the difference was 0.03 or less, the mean was taken as the final solubility. Finally, we removed molecules containing only two atoms, molecules with a molecular weight exceeding 500, and molecules with logS below -8.
DOI:	10.21227/21jn-7z98
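
The cleaning rules in the summary can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the author's code: the record fields (`smiles`, `logS`, `temp_c`, `n_atoms`, `mol_weight`) and the function names are hypothetical, the 0.03 tolerance is assumed to apply to logS values, and matching different SMILES formats of the same molecule is delegated to a pluggable `canonical_key` function (in practice a cheminformatics toolkit would supply it; the default here only matches identical strings).

```python
from collections import defaultdict
from statistics import mean

MAX_SPREAD = 0.03      # tolerance from the summary; assumed to be in logS units
MAX_MOL_WEIGHT = 500   # upper molecular-weight limit from the summary
MIN_LOGS = -8          # lower logS limit from the summary


def mg_per_ml_to_mol_per_l(value_mg_per_ml, mol_weight):
    """Convert mg/mL (equal to g/L) to mol/L using the molecular weight in g/mol.

    The source units are not stated in the summary; mg/mL is an assumed example.
    """
    return value_mg_per_ml / mol_weight


def clean(records, canonical_key=lambda smiles: smiles):
    """Apply the filtering and de-duplication rules to a list of dict records."""
    groups = defaultdict(list)
    for rec in records:
        smiles, logs, temp = rec.get("smiles"), rec.get("logS"), rec.get("temp_c")
        # Drop entries without SMILES or with non-numerical solubility.
        if not smiles or isinstance(logs, bool) or not isinstance(logs, (int, float)):
            continue
        # Keep only measurements taken at 20-25 degrees Celsius.
        if temp is None or not (20 <= temp <= 25):
            continue
        groups[canonical_key(smiles)].append(rec)

    cleaned = []
    for key, group in groups.items():
        values = [r["logS"] for r in group]
        # Duplicates that disagree by more than the tolerance are eliminated.
        if max(values) - min(values) > MAX_SPREAD:
            continue
        first = group[0]
        # Remove two-atom molecules and molecules above the weight limit.
        if first["n_atoms"] == 2 or first["mol_weight"] > MAX_MOL_WEIGHT:
            continue
        final_logs = mean(values)  # agreeing duplicates are averaged
        if final_logs < MIN_LOGS:
            continue
        cleaned.append(
            {"smiles": key, "logS": final_logs, "mol_weight": first["mol_weight"]}
        )
    return cleaned
```

Grouping before filtering on size and logS keeps the duplicate rule independent of the other cut-offs; whether the original pipeline applied the steps in this order is not stated in the summary.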