Generating histograms of population data by scaling from sample data

Histograms formed based on samples of a population, such as histograms created from random page-level samples of a data store, are intelligently scaled to histograms estimating distribution of the entire population of the data store. As an optional optimization, where a threshold number of duplicate...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Fraser, Campbell Bryce, Jose, Ian, Zabback, Peter Alfred
Format: Patent
Sprache:eng
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Histograms formed based on samples of a population, such as histograms created from random page-level samples of a data store, are intelligently scaled to histograms estimating distribution of the entire population of the data store. As an optional optimization, where a threshold number of duplicate samples are observed during page-level sampling, the number of distinct values in the overall population data is presumed to be the number of distinct values in the sample data. Also, during estimation of distinct values of an overall population, a "Chao" estimator can optionally be utilized as a lower bound of the estimate. The resulting estimate is then used when scaling, which can take domain knowledge of the data being scaled into account in order to prevent scaled estimates from exceeding the limits of the domain. Also, a "sum of the parts" mathematical relationship can be taken into account during scaling that the sum of the scaled distinct values for each bin of an estimate histogram should total an estimate for the total distinct values of the entire population.