Generating histograms of population data by scaling from sample data

Histograms formed based on samples of a population, such as histograms created from random page-level samples of a data store, are intelligently scaled to histograms estimating distribution of the entire population of the data store. As an optional optimization, where a threshold number of duplicate...

Main authors: Fraser, Campbell Bryce; Jose, Ian; Zabback, Peter Alfred
Format: Patent
Language: eng
creator Fraser, Campbell Bryce
Jose, Ian
Zabback, Peter Alfred
description Histograms formed based on samples of a population, such as histograms created from random page-level samples of a data store, are intelligently scaled to histograms estimating distribution of the entire population of the data store. As an optional optimization, where a threshold number of duplicate samples are observed during page-level sampling, the number of distinct values in the overall population data is presumed to be the number of distinct values in the sample data. Also, during estimation of distinct values of an overall population, a "Chao" estimator can optionally be utilized as a lower bound of the estimate. The resulting estimate is then used when scaling, which can take domain knowledge of the data being scaled into account in order to prevent scaled estimates from exceeding the limits of the domain. Also, a "sum of the parts" mathematical relationship can be taken into account during scaling that the sum of the scaled distinct values for each bin of an estimate histogram should total an estimate for the total distinct values of the entire population.
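The description above names three ideas: a "Chao" estimator used as a lower bound on the distinct-value estimate, scaling that respects domain limits, and a "sum of the parts" constraint across histogram bins. The following Python sketch illustrates how these could fit together. It is not the patented method: the abstract does not give formulas, so the classic Chao (1984) form d + f1^2 / (2 * f2) is assumed here, and every function name, parameter, and the redistribution loop are hypothetical choices made for illustration.

```python
from collections import Counter


def chao_lower_bound(sample):
    """Assumed classic Chao (1984) estimate: d + f1^2 / (2 * f2).

    d  = distinct values seen in the sample
    f1 = values seen exactly once, f2 = values seen exactly twice
    """
    counts = Counter(sample)
    d = len(counts)
    freq_of_freq = Counter(counts.values())
    f1 = freq_of_freq.get(1, 0)
    f2 = freq_of_freq.get(2, 0)
    if f2 == 0:
        # Bias-corrected variant, used here only to avoid dividing by zero.
        return d + f1 * (f1 - 1) / 2.0
    return d + f1 * f1 / (2.0 * f2)


def estimate_total_distinct(sample, other_estimate):
    """Use the Chao value as a floor under some other (hypothetical) estimate."""
    return max(other_estimate, chao_lower_bound(sample))


def scale_bin_distincts(sample_distincts, total_estimate, caps):
    """Scale per-bin distinct counts so that they sum to total_estimate.

    caps[i] is the largest distinct count bin i can hold (domain knowledge,
    e.g. an integer bin covering [10, 20) has at most 10 distinct values).
    Bins that hit their cap pass the surplus on to the remaining bins.
    """
    n = len(sample_distincts)
    scaled = [0.0] * n
    open_bins = [i for i in range(n) if sample_distincts[i] > 0]
    # The total can never exceed what the domain allows overall.
    remaining = float(min(total_estimate, sum(caps[i] for i in open_bins)))
    while open_bins and remaining > 1e-9:
        weight = sum(sample_distincts[i] for i in open_bins)
        still_open = []
        distributed = 0.0
        for i in open_bins:
            share = remaining * sample_distincts[i] / weight
            room = caps[i] - scaled[i]
            take = min(share, room)
            scaled[i] += take
            distributed += take
            if room - take > 1e-9:
                still_open.append(i)
        remaining -= distributed
        if len(still_open) == len(open_bins):
            break  # no bin was capped, so everything has been distributed
        open_bins = still_open
    return scaled


if __name__ == "__main__":
    sample = [1, 1, 2, 2, 3, 4, 5]
    print(chao_lower_bound(sample))
    print(scale_bin_distincts([2, 3, 5], 20, [100, 4, 100]))
```

In this sketch the second bin is capped at 4 by its domain limit, and the surplus is redistributed to the other bins in proportion to their sample counts so that the per-bin values still sum to the overall estimate. The duplicate-threshold shortcut mentioned in the description is not modeled here.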
format Patent
fullrecord <record><control><sourceid>uspatents_EFH</sourceid><recordid>TN_cdi_uspatents_grants_08316009</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>08316009</sourcerecordid><originalsourceid>FETCH-uspatents_grants_083160093</originalsourceid><addsrcrecordid>eNrjZHBxT81LLUosycxLV8jILC7JTy9KzC1WyE9TKMgvKM0BSuTnKaQkliQqJFUqFCcn5oAUphXl5yoUJ-YW5KSC5XgYWNMSc4pTeaE0N4OCm2uIs4duaXFBYklqXklxPNBYEGVgYWxoZmBgaUyEEgCOqzO6</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>Generating histograms of population data by scaling from sample data</title><source>USPTO Issued Patents</source><creator>Fraser, Campbell Bryce ; Jose, Ian ; Zabback, Peter Alfred</creator><creatorcontrib>Fraser, Campbell Bryce ; Jose, Ian ; Zabback, Peter Alfred ; Microsoft Corporation</creatorcontrib><description>Histograms formed based on samples of a population, such as histograms created from random page-level samples of a data store, are intelligently scaled to histograms estimating distribution of the entire population of the data store. As an optional optimization, where a threshold number of duplicate samples are observed during page-level sampling, the number of distinct values in the overall population data is presumed to be the number of distinct values in the sample data. Also, during estimation of distinct values of an overall population, a "Chao" estimator can optionally be utilized as a lower bound of the estimate. The resulting estimate is then used when scaling, which can take domain knowledge of the data being scaled into account in order to prevent scaled estimates from exceeding the limits of the domain. 
Also, a "sum of the parts" mathematical relationship can be taken into account during scaling that the sum of the scaled distinct values for each bin of an estimate histogram should total an estimate for the total distinct values of the entire population.</description><language>eng</language><creationdate>2012</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://image-ppubs.uspto.gov/dirsearch-public/print/downloadPdf/8316009$$EPDF$$P50$$Guspatents$$Hfree_for_read</linktopdf><link.rule.ids>230,308,780,802,885,64039</link.rule.ids><linktorsrc>$$Uhttps://image-ppubs.uspto.gov/dirsearch-public/print/downloadPdf/8316009$$EView_record_in_USPTO$$FView_record_in_$$GUSPTO$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Fraser, Campbell Bryce</creatorcontrib><creatorcontrib>Jose, Ian</creatorcontrib><creatorcontrib>Zabback, Peter Alfred</creatorcontrib><creatorcontrib>Microsoft Corporation</creatorcontrib><title>Generating histograms of population data by scaling from sample data</title><description>Histograms formed based on samples of a population, such as histograms created from random page-level samples of a data store, are intelligently scaled to histograms estimating distribution of the entire population of the data store. As an optional optimization, where a threshold number of duplicate samples are observed during page-level sampling, the number of distinct values in the overall population data is presumed to be the number of distinct values in the sample data. Also, during estimation of distinct values of an overall population, a "Chao" estimator can optionally be utilized as a lower bound of the estimate. 
The resulting estimate is then used when scaling, which can take domain knowledge of the data being scaled into account in order to prevent scaled estimates from exceeding the limits of the domain. Also, a "sum of the parts" mathematical relationship can be taken into account during scaling that the sum of the scaled distinct values for each bin of an estimate histogram should total an estimate for the total distinct values of the entire population.</description><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2012</creationdate><recordtype>patent</recordtype><sourceid>EFH</sourceid><recordid>eNrjZHBxT81LLUosycxLV8jILC7JTy9KzC1WyE9TKMgvKM0BSuTnKaQkliQqJFUqFCcn5oAUphXl5yoUJ-YW5KSC5XgYWNMSc4pTeaE0N4OCm2uIs4duaXFBYklqXklxPNBYEGVgYWxoZmBgaUyEEgCOqzO6</recordid><startdate>20121120</startdate><enddate>20121120</enddate><creator>Fraser, Campbell Bryce</creator><creator>Jose, Ian</creator><creator>Zabback, Peter Alfred</creator><scope>EFH</scope></search><sort><creationdate>20121120</creationdate><title>Generating histograms of population data by scaling from sample data</title><author>Fraser, Campbell Bryce ; Jose, Ian ; Zabback, Peter Alfred</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-uspatents_grants_083160093</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>eng</language><creationdate>2012</creationdate><toplevel>online_resources</toplevel><creatorcontrib>Fraser, Campbell Bryce</creatorcontrib><creatorcontrib>Jose, Ian</creatorcontrib><creatorcontrib>Zabback, Peter Alfred</creatorcontrib><creatorcontrib>Microsoft Corporation</creatorcontrib><collection>USPTO Issued Patents</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Fraser, Campbell Bryce</au><au>Jose, Ian</au><au>Zabback, Peter Alfred</au><aucorp>Microsoft 
Corporation</aucorp><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>Generating histograms of population data by scaling from sample data</title><date>2012-11-20</date><risdate>2012</risdate><abstract>Histograms formed based on samples of a population, such as histograms created from random page-level samples of a data store, are intelligently scaled to histograms estimating distribution of the entire population of the data store. As an optional optimization, where a threshold number of duplicate samples are observed during page-level sampling, the number of distinct values in the overall population data is presumed to be the number of distinct values in the sample data. Also, during estimation of distinct values of an overall population, a "Chao" estimator can optionally be utilized as a lower bound of the estimate. The resulting estimate is then used when scaling, which can take domain knowledge of the data being scaled into account in order to prevent scaled estimates from exceeding the limits of the domain. Also, a "sum of the parts" mathematical relationship can be taken into account during scaling that the sum of the scaled distinct values for each bin of an estimate histogram should total an estimate for the total distinct values of the entire population.</abstract><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
language eng
recordid cdi_uspatents_grants_08316009
source USPTO Issued Patents
title Generating histograms of population data by scaling from sample data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-23T06%3A57%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-uspatents_EFH&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=Fraser,%20Campbell%20Bryce&rft.aucorp=Microsoft%20Corporation&rft.date=2012-11-20&rft_id=info:doi/&rft_dat=%3Cuspatents_EFH%3E08316009%3C/uspatents_EFH%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true