Comparative study of machine learning algorithms applied to predictive toxicology data mining

Bibliographic details
Published in:	Alternatives to laboratory animals 2007-03, Vol.35 (1), p.25-32
Main authors:	Neagu, D.C, Guo, G, Trundle, P.R, Cronin, M.T.D
Format:	Article
Language:	English
Subjects:	Algorithms; Animals; Artificial Intelligence; Bees; chemical substances; computer analysis; Daphnia; data analysis; Data Interpretation, Statistical; Databases, Factual; Phenols - toxicity; Predictive Value of Tests; Quail; Reproducibility of Results; toxicity; Toxicity Tests - methods; Trout

Description
Abstract:	This paper reports results of a comparative study of widely used machine learning algorithms applied to predictive toxicology data mining. The machine learning algorithms involved were chosen in terms of their representability and diversity, and were extensively evaluated with seven toxicity data sets which were taken from real-world applications. Some results based on visual analysis of the correlations of different descriptors to the class values of chemical compounds, and on the relationships of the range of chosen descriptors to the performance of machine learning algorithms, are emphasised from our experiments. Some interesting findings relating to the data and the quality of the models are presented - for example, that no specific algorithm appears best for all seven toxicity data sets, and that up to five descriptors are sufficient for creating classification models for each toxicity data set with good accuracy. We suggest that, for a specific data set, model accuracy is affected by the feature selection method and model development technique. Models built with too many or too few descriptors are undesirable, and finding the optimal feature subset appears at least as important as selecting appropriate algorithms with which to build a final model.
ISSN:	0261-1929 2632-3559
DOI:	10.1177/026119290703500119