Generating histograms of population data by scaling from sample data
Histograms formed based on samples of a population, such as histograms created from random page-level samples of a data store, are intelligently scaled to histograms estimating distribution of the entire population of the data store. As an optional optimization, where a threshold number of duplicate...
Main authors: | Fraser, Campbell Bryce; Jose, Ian; Zabback, Peter Alfred |
---|---|
Format: | Patent |
Language: | eng |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Fraser, Campbell Bryce; Jose, Ian; Zabback, Peter Alfred |
description | Histograms built from samples of a population, such as histograms created from random page-level samples of a data store, are scaled to histograms that estimate the distribution of the entire population of the data store. As an optional optimization, where a threshold number of duplicate samples is observed during page-level sampling, the number of distinct values in the overall population is presumed to equal the number of distinct values in the sample. In addition, when estimating the distinct values of the overall population, a "Chao" estimator can optionally be used as a lower bound on the estimate. The resulting estimate is then used when scaling, which can take domain knowledge of the data into account to prevent scaled estimates from exceeding the limits of the domain. Scaling can also take a "sum of the parts" relationship into account: the scaled distinct values for the bins of an estimated histogram should sum to the estimate of the total distinct values of the entire population. |
format | Patent |
fullrecord | <record><control><sourceid>uspatents_EFH</sourceid><recordid>TN_cdi_uspatents_grants_08316009</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>08316009</sourcerecordid><originalsourceid>FETCH-uspatents_grants_083160093</originalsourceid><addsrcrecordid>eNrjZHBxT81LLUosycxLV8jILC7JTy9KzC1WyE9TKMgvKM0BSuTnKaQkliQqJFUqFCcn5oAUphXl5yoUJ-YW5KSC5XgYWNMSc4pTeaE0N4OCm2uIs4duaXFBYklqXklxPNBYEGVgYWxoZmBgaUyEEgCOqzO6</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>Generating histograms of population data by scaling from sample data</title><source>USPTO Issued Patents</source><creator>Fraser, Campbell Bryce ; Jose, Ian ; Zabback, Peter Alfred</creator><creatorcontrib>Fraser, Campbell Bryce ; Jose, Ian ; Zabback, Peter Alfred ; Microsoft Corporation</creatorcontrib><description>Histograms formed based on samples of a population, such as histograms created from random page-level samples of a data store, are intelligently scaled to histograms estimating distribution of the entire population of the data store. As an optional optimization, where a threshold number of duplicate samples are observed during page-level sampling, the number of distinct values in the overall population data is presumed to be the number of distinct values in the sample data. Also, during estimation of distinct values of an overall population, a "Chao" estimator can optionally be utilized as a lower bound of the estimate. The resulting estimate is then used when scaling, which can take domain knowledge of the data being scaled into account in order to prevent scaled estimates from exceeding the limits of the domain. 
Also, a "sum of the parts" mathematical relationship can be taken into account during scaling that the sum of the scaled distinct values for each bin of an estimate histogram should total an estimate for the total distinct values of the entire population.</description><language>eng</language><creationdate>2012</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://image-ppubs.uspto.gov/dirsearch-public/print/downloadPdf/8316009$$EPDF$$P50$$Guspatents$$Hfree_for_read</linktopdf><link.rule.ids>230,308,780,802,885,64039</link.rule.ids><linktorsrc>$$Uhttps://image-ppubs.uspto.gov/dirsearch-public/print/downloadPdf/8316009$$EView_record_in_USPTO$$FView_record_in_$$GUSPTO$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Fraser, Campbell Bryce</creatorcontrib><creatorcontrib>Jose, Ian</creatorcontrib><creatorcontrib>Zabback, Peter Alfred</creatorcontrib><creatorcontrib>Microsoft Corporation</creatorcontrib><title>Generating histograms of population data by scaling from sample data</title><description>Histograms formed based on samples of a population, such as histograms created from random page-level samples of a data store, are intelligently scaled to histograms estimating distribution of the entire population of the data store. As an optional optimization, where a threshold number of duplicate samples are observed during page-level sampling, the number of distinct values in the overall population data is presumed to be the number of distinct values in the sample data. Also, during estimation of distinct values of an overall population, a "Chao" estimator can optionally be utilized as a lower bound of the estimate. 
The resulting estimate is then used when scaling, which can take domain knowledge of the data being scaled into account in order to prevent scaled estimates from exceeding the limits of the domain. Also, a "sum of the parts" mathematical relationship can be taken into account during scaling that the sum of the scaled distinct values for each bin of an estimate histogram should total an estimate for the total distinct values of the entire population.</description><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2012</creationdate><recordtype>patent</recordtype><sourceid>EFH</sourceid><recordid>eNrjZHBxT81LLUosycxLV8jILC7JTy9KzC1WyE9TKMgvKM0BSuTnKaQkliQqJFUqFCcn5oAUphXl5yoUJ-YW5KSC5XgYWNMSc4pTeaE0N4OCm2uIs4duaXFBYklqXklxPNBYEGVgYWxoZmBgaUyEEgCOqzO6</recordid><startdate>20121120</startdate><enddate>20121120</enddate><creator>Fraser, Campbell Bryce</creator><creator>Jose, Ian</creator><creator>Zabback, Peter Alfred</creator><scope>EFH</scope></search><sort><creationdate>20121120</creationdate><title>Generating histograms of population data by scaling from sample data</title><author>Fraser, Campbell Bryce ; Jose, Ian ; Zabback, Peter Alfred</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-uspatents_grants_083160093</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>eng</language><creationdate>2012</creationdate><toplevel>online_resources</toplevel><creatorcontrib>Fraser, Campbell Bryce</creatorcontrib><creatorcontrib>Jose, Ian</creatorcontrib><creatorcontrib>Zabback, Peter Alfred</creatorcontrib><creatorcontrib>Microsoft Corporation</creatorcontrib><collection>USPTO Issued Patents</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Fraser, Campbell Bryce</au><au>Jose, Ian</au><au>Zabback, Peter Alfred</au><aucorp>Microsoft 
Corporation</aucorp><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>Generating histograms of population data by scaling from sample data</title><date>2012-11-20</date><risdate>2012</risdate><abstract>Histograms formed based on samples of a population, such as histograms created from random page-level samples of a data store, are intelligently scaled to histograms estimating distribution of the entire population of the data store. As an optional optimization, where a threshold number of duplicate samples are observed during page-level sampling, the number of distinct values in the overall population data is presumed to be the number of distinct values in the sample data. Also, during estimation of distinct values of an overall population, a "Chao" estimator can optionally be utilized as a lower bound of the estimate. The resulting estimate is then used when scaling, which can take domain knowledge of the data being scaled into account in order to prevent scaled estimates from exceeding the limits of the domain. Also, a "sum of the parts" mathematical relationship can be taken into account during scaling that the sum of the scaled distinct values for each bin of an estimate histogram should total an estimate for the total distinct values of the entire population.</abstract><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | |
ispartof | |
issn | |
language | eng |
recordid | cdi_uspatents_grants_08316009 |
source | USPTO Issued Patents |
title | Generating histograms of population data by scaling from sample data |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-23T06%3A57%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-uspatents_EFH&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=Fraser,%20Campbell%20Bryce&rft.aucorp=Microsoft%20Corporation&rft.date=2012-11-20&rft_id=info:doi/&rft_dat=%3Cuspatents_EFH%3E08316009%3C/uspatents_EFH%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
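
The description field above names several steps: a duplicate-count shortcut, a "Chao" lower bound, a domain cap, and a "sum of the parts" constraint across bins. The following is a minimal sketch of how those steps might fit together, not the patented method itself. It assumes the classic Chao (1984) form `d + f1^2 / (2 * f2)` for the lower bound, and it assumes simple proportional rescaling of the bins; the function names, the `base_estimate` input, and the `duplicate_threshold` parameter are all hypothetical and do not come from the record.

```python
from collections import Counter


def chao_estimate(sample):
    """Chao (1984) estimate of distinct values, assumed here as the lower bound."""
    counts = Counter(sample)                  # value -> occurrences in the sample
    freq_of_freq = Counter(counts.values())   # occurrences -> number of values
    d = len(counts)                           # distinct values seen in the sample
    f1 = freq_of_freq.get(1, 0)               # values seen exactly once
    f2 = freq_of_freq.get(2, 0)               # values seen exactly twice
    if f2 == 0:
        # Bias-corrected form, used to avoid dividing by zero.
        return d + f1 * (f1 - 1) / 2.0
    return d + (f1 * f1) / (2.0 * f2)


def estimate_total_distinct(sample, base_estimate, population_size,
                            domain_size=None, duplicate_threshold=None):
    """Combine the steps named in the description (hypothetical wiring).

    base_estimate is whatever estimator the caller already uses; the record
    does not say which one, so it is passed in rather than invented here.
    """
    d = len(set(sample))
    duplicates = len(sample) - d
    # Shortcut: enough duplicates seen, so presume the sample saw every value.
    if duplicate_threshold is not None and duplicates >= duplicate_threshold:
        return float(d)
    # The Chao estimate acts as a lower bound on the estimate.
    estimate = max(float(base_estimate), chao_estimate(sample))
    # Domain knowledge: never exceed the row count or the size of the domain.
    estimate = min(estimate, float(population_size))
    if domain_size is not None:
        estimate = min(estimate, float(domain_size))
    return estimate


def scale_bin_distincts(sample_bin_distincts, total_estimate):
    """Scale per-bin distinct counts so that they sum to total_estimate.

    Proportional scaling is an assumption made for illustration only.
    """
    sample_total = sum(sample_bin_distincts)
    if sample_total == 0:
        return [0.0 for _ in sample_bin_distincts]
    factor = total_estimate / sample_total
    return [count * factor for count in sample_bin_distincts]
```

For example, calling `estimate_total_distinct(sample, base_estimate, population_size)` and passing the result to `scale_bin_distincts` together with the per-bin distinct counts from the sample histogram yields scaled bins whose total matches the population-level estimate, which is the "sum of the parts" property the description refers to.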