Statistic significant feature importance regressor based estimation of compound aqueous solubility

Bibliographic details
Main authors: Devi, M. Shyamala; Aruna, R.; Pravallika, T. Puja; Balaji, Chella; Kumar, P. Santhosh
Format: Conference proceedings
Language: English
Description
Abstract: The chemical characteristics of a compound are important for determining the solubility of its molecules. A chemical can be used in various applications only after its aqueous solubility has been validated. Machine learning can therefore be used to determine this property. This paper uses an aqueous solubility dataset with 21 features and 9,982 compounds to predict the aqueous solubility of each compound. The dataset is first analyzed with an ANOVA test. The p-value of the F-statistic for the feature “Group” is 0.935708, which is greater than 0.05. This indicates no significant influence on the target “Aqueous Solubility”, so “Group” is removed from the dataset. The ANOVA-reduced dataset is then fitted with the ensemble regressors AdaBoost, Random Forest, Gradient Boosting, and Extra Trees to extract feature importances. Each dataset reduced according to an ensemble regressor's feature importances is then used to train the regressors, and their performance is analyzed with the intercept, explained variance score (EVS), mean absolute error (MAE), mean squared error (MSE), and RScore. Experimental results show that the ElasticNet regressor trained on the Extra Trees-reduced dataset achieves an RScore of 0.9715, close to 1.
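The three-stage pipeline in the abstract (ANOVA filtering, ensemble feature importance, then regression with the listed metrics) can be sketched as follows. This is a minimal illustration assuming scikit-learn, SciPy and pandas, not the authors' code. Several details are assumptions: “Group” is treated as a categorical column, so the ANOVA is a one-way test of the target across its levels. Only Extra Trees is shown as the importance model. The helper names, the synthetic columns, k=3 and the ElasticNet hyperparameters are hypothetical. `r2_score` stands in for the abstract's RScore.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import ElasticNet
from sklearn.metrics import (explained_variance_score, mean_absolute_error,
                             mean_squared_error, r2_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def anova_drop_categorical(df, target, categorical, alpha=0.05):
    """Hypothetical helper: drop each categorical column whose one-way ANOVA
    p-value against the target exceeds alpha; return reduced frame and p-values."""
    dropped = {}
    for col in categorical:
        # One sample of target values per category level.
        samples = [grp[target].to_numpy() for _, grp in df.groupby(col)]
        _, p_value = f_oneway(*samples)
        if p_value > alpha:
            dropped[col] = float(p_value)
    return df.drop(columns=list(dropped)), dropped


def top_k_by_importance(X, y, k, seed=0):
    """Hypothetical helper: rank features by Extra Trees impurity-based
    importance and keep the names of the top k."""
    model = ExtraTreesRegressor(n_estimators=200, random_state=seed)
    model.fit(X, y)
    order = np.argsort(model.feature_importances_)[::-1]
    return [X.columns[i] for i in order[:k]]


def evaluate_elasticnet(X, y, seed=0):
    """Fit ElasticNet on a hold-out split and report the abstract's metrics."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              random_state=seed)
    # Assumed hyperparameters; the abstract does not state them.
    model = make_pipeline(StandardScaler(), ElasticNet(alpha=0.01, l1_ratio=0.5))
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    return {
        "intercept": float(model[-1].intercept_),
        "EVS": explained_variance_score(y_te, pred),
        "MAE": mean_absolute_error(y_te, pred),
        "MSE": mean_squared_error(y_te, pred),
        "R2": r2_score(y_te, pred),
    }


# Synthetic stand-in for the solubility table (hypothetical columns and values).
rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "MolWt": rng.normal(300, 50, n),
    "MolLogP": rng.normal(2, 1, n),
    "TPSA": rng.normal(60, 20, n),
    "Noise": rng.normal(0, 1, n),
    "Group": rng.choice(["G1", "G2", "G3"], n),  # generated independently of the target
})
df["Solubility"] = (-0.01 * df["MolWt"] - 0.8 * df["MolLogP"]
                    + 0.01 * df["TPSA"] + rng.normal(0, 0.1, n))

reduced, dropped = anova_drop_categorical(df, "Solubility", ["Group"])
# Keep numeric predictors only; a surviving categorical would need encoding.
X = reduced.drop(columns="Solubility").select_dtypes("number")
y = reduced["Solubility"]
selected = top_k_by_importance(X, y, k=3)
metrics = evaluate_elasticnet(X[selected], y)
print("dropped:", dropped)
print("selected:", selected)
print(metrics)
```

Because the data are random, the printed p-value for “Group” and the metrics will not reproduce the paper's 0.935708 or 0.9715. The scaler is included because ElasticNet's penalty is sensitive to feature scale. Swapping `ExtraTreesRegressor` for `AdaBoostRegressor`, `RandomForestRegressor` or `GradientBoostingRegressor` (all of which expose `feature_importances_`) would give the other reduced datasets the abstract mentions.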
ISSN: 0094-243X; 1551-7616
DOI: 10.1063/5.0154392