Amplifying Inter-Message Distance: On Information Divergence Measures in Big Data

Bibliographic Details
Published in: IEEE Access, 2017-01, Vol. 5, pp. 24105-24119
Main authors: She, Rui; Liu, Shanyun; Fan, Pingyi
Format: Article
Language: English
Online access: Full text
Description
Abstract: Message identification (M-I) divergence is an important measure of the information distance between probability distributions, similar to Kullback-Leibler (K-L) and Rényi divergence. In fact, M-I divergence with a variable parameter can sharpen the characterization of the distinction between two distributions. Furthermore, by choosing an appropriate parameter of M-I divergence, it is possible to amplify the information distance between adjacent distributions while maintaining a sufficient gap between two non-adjacent ones. Therefore, M-I divergence can play a vital role in distinguishing distributions more clearly. In this paper, we first define a parametric M-I divergence from the viewpoint of information theory and then present its major properties. In addition, we design an M-I divergence estimation algorithm by means of the ensemble estimator of the proposed weighted kernel estimators, which improves the convergence of the mean squared error from O(Γ^(-j/d)) to O(Γ^(-1)), where j ∈ (0, d]. We also discuss decision making with M-I divergence for clustering or classification, and investigate its performance in a statistical sequence model of big data for the outlier detection problem.
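
The abstract does not reproduce the definition of M-I divergence, so only the general idea can be illustrated here: a divergence with a tunable parameter changes how strongly it separates nearby versus clearly different distributions. The following Python sketch uses Rényi divergence of varying order (named in the abstract as a related measure) as a stand-in parametric divergence alongside K-L divergence; the discrete distributions and parameter values are hypothetical, chosen only for illustration.

import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def renyi_divergence(p, q, alpha):
    """Renyi divergence of order alpha (alpha > 0, alpha != 1)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.log(np.sum(p**alpha * q**(1.0 - alpha))) / (alpha - 1.0))

# Hypothetical example distributions: q_near is "adjacent" to p (a small
# perturbation), q_far is clearly different. Values are illustrative only.
p      = np.array([0.25, 0.25, 0.25, 0.25])
q_near = np.array([0.26, 0.24, 0.25, 0.25])
q_far  = np.array([0.70, 0.10, 0.10, 0.10])

for alpha in (0.5, 2.0, 5.0, 10.0):
    d_near = renyi_divergence(p, q_near, alpha)
    d_far  = renyi_divergence(p, q_far, alpha)
    # Print how both distances scale with the order parameter, i.e. how the
    # parameter controls the contrast between the two distribution pairs.
    print(f"alpha={alpha:5.1f}  near={d_near:.6f}  far={d_far:.6f}")

print("KL  near:", kl_divergence(p, q_near), " far:", kl_divergence(p, q_far))

Running the sketch prints the divergence to the nearby and to the distant distribution for several orders, showing how a single tunable parameter governs the trade-off the abstract describes: amplifying small information distances while keeping a gap to dissimilar distributions.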
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2017.2768385