On the accuracy of linear regression routines in some data mining packages
Published in: | Wiley interdisciplinary reviews. Data mining and knowledge discovery 2019-05, Vol.9 (3), p.e1279-n/a |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | While articles assessing the accuracy of traditional statistical packages are fairly commonplace, data mining software has escaped this important scrutiny. We apply the National Institute of Standards and Technology Statistical Reference Datasets tests for the numerical accuracy of statistical packages to 7 data mining packages: IBM Modeler, KNIME, Orange, Python, RapidMiner, Weka, and XLMiner. We find that one package has an unstable algorithm for the calculation of the sample variance and only two have reliable linear regression routines. Of these two packages that offer analysis of variance, one has a bad algorithm. The accuracy of statistical calculations in data mining packages cannot be taken for granted.
This article is categorized under:
Technologies > Statistical Fundamentals
Algorithmic Development > Statistics
Application Areas > Data Mining Software Tools
Accuracy of Packages for Big Longley Regressions. |
---|---|
ISSN: | 1942-4787 1942-4795 |
DOI: | 10.1002/widm.1279 |
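
The abstract's remark about an unstable sample-variance algorithm can be illustrated with a small sketch. The record does not say which formula the affected package uses, so the one-pass "sum of squares" formula below is only an assumed example of a commonly cited unstable method, and Welford's update is shown as one well-known stable alternative. The function names and the four-value dataset are hypothetical and are not taken from the article or from the NIST reference datasets.

```python
def variance_one_pass(data):
    """Textbook one-pass formula: (sum(x^2) - (sum(x))^2 / n) / (n - 1).

    Assumed here as an example of an unstable method: when the mean is
    large relative to the spread, the two terms nearly cancel and the
    rounding error in them dominates the result.
    """
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in data:
        n += 1
        total += x
        total_sq += x * x
    return (total_sq - total * total / n) / (n - 1)


def variance_welford(data):
    """Welford's running update, which avoids subtracting two huge sums."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in data:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1)


# Hypothetical data: small spread around a large offset.
# The exact sample variance of 4, 7, 13, 16 (and of any shifted copy) is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

naive = variance_one_pass(data)
stable = variance_welford(data)
print("one-pass formula:", naive)
print("Welford update:  ", stable)
```

In double precision the one-pass result for this input is far from 30, while the Welford result matches it; shifting the data by a large constant is a simple way to probe a package for this kind of cancellation error.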