Data chaos: An entropy based MapReduce framework for scalable learning
Main authors: | , , , , |
---|---|
Format: | Conference proceedings |
Language: | eng |
Summary: | The chaos of data is the total unpredictability of all the data elements and can be quantified by Shannon entropy. In this paper, we first propose an entropy-based theoretical framework for machine learning, which states that chaos in the sample data decreases and rules advance as learning progresses. However, applying the theoretical framework is usually time-consuming, because groups of rules must be trained iteratively and the data chaos must be recalculated in each iteration. To implement the theoretical framework for scalable learning, we propose a MapReduce-based distributed computational framework. In a classification case study, the framework trains multiple classifiers in parallel and calculates the chaos of the sample set during each iteration, then resamples a small subset of the samples with the highest entropy to train the next iteration, reducing chaos in the sample data as quickly as possible. On typical classification benchmarks, our experiments report the entropy in the sample data and show that the theoretical framework is reasonable and can help improve the accuracy of machine learning. Meanwhile, the computational framework shows high performance, including high efficiency and scalability, for large-scale learning on a Hadoop cluster. |
---|---|
DOI: | 10.1109/BigData.2013.6691736 |
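
The summary describes an iteration that measures chaos with Shannon entropy, H = -Σ p_i log2(p_i), and then resamples the highest-entropy samples. A minimal single-process sketch of that selection step follows. It is not the paper's implementation: it assumes that a sample's chaos is the entropy of the labels voted by several classifiers, and the names `shannon_entropy`, `select_highest_entropy`, and the vote data are hypothetical. The MapReduce distribution is left out entirely.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of `labels`."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_highest_entropy(votes_per_sample, k):
    """Return the indices of the k samples whose votes have the highest entropy."""
    scored = [(shannon_entropy(votes), i) for i, votes in enumerate(votes_per_sample)]
    # Sort by entropy, highest first; ties are broken by the lower index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


# Hypothetical votes from four classifiers on three samples (assumed data).
votes = [
    ["a", "a", "a", "a"],  # full agreement
    ["a", "b", "a", "b"],  # even split
    ["a", "a", "a", "b"],  # mild disagreement
]

# Assumption: total chaos of the sample set is the sum of per-sample entropies.
total_chaos = sum(shannon_entropy(v) for v in votes)

# Indices of the samples that would be used to train the next iteration.
next_subset = select_highest_entropy(votes, k=2)
```

In a distributed setting, the per-sample entropy computation would be a natural candidate for a map step and the top-k selection for a reduce step, but that mapping is a guess based on the summary alone.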