Identifying Excessively Rounded or Truncated Data

COMPSTAT 2006, Rome, Italy, Physica-Verlag, Heidelberg, pp. 313-324, 2006. All data are digitized and hence are essentially integers rather than true real numbers. Ordinarily this causes no difficulties, since the truncation or rounding usually occurs below the noise level. However, in some instances,...

Bibliographic Details
Main authors: Knuth, Kevin H; Castle, J. Patrick; Wheeler, Kevin R
Format: Article
Language: English
Subjects: Physics - Data Analysis, Statistics and Probability
Online access: Order full text
creator Knuth, Kevin H
Castle, J. Patrick
Wheeler, Kevin R
description COMPSTAT 2006, Rome, Italy, Physica-Verlag, Heidelberg, pp. 313-324, 2006. All data are digitized and hence are essentially integers rather than true real numbers. Ordinarily this causes no difficulties, since the truncation or rounding usually occurs below the noise level. However, when instruments or data delivery and storage systems are designed with less-than-optimal regard for the data or the subsequent data analysis, the effects of digitization may be comparable to important features contained within the data. In these cases, information has been irrevocably lost in the truncation process. While techniques exist for dealing with truncated data, we propose a straightforward method that detects this problem before the data analysis stage. It is based on an optimal histogram binning algorithm that can identify when the statistical structure of the digitization is on the order of the statistical structure of the data set itself.
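The abstract names its tool, an optimal histogram binning algorithm, without giving details. As a rough illustration only, the sketch below assumes Knuth's Bayesian optimal-binning posterior for M equal-width bins with counts n_k, log p(M|d) = N log M + log Γ(M/2) - M log Γ(1/2) - log Γ(N + M/2) + Σ_k log Γ(n_k + 1/2) + const. Once every distinct value sits alone in its own bin, this tends to the limit N log 2 + Σ_j log Γ(c_j + 1/2) - L log Γ(1/2), where the data take L distinct values with counts c_j. The decision rule (flag the data when that limit exceeds the peak over M = 1..m_max) and the names `log_posterior`, `asymptote`, `looks_excessively_rounded` and `m_max` are hypothetical choices for this sketch, not taken from the paper.

```python
import math
import random
from collections import Counter


def log_posterior(data, m):
    """Relative log posterior of an m-bin, equal-width histogram of `data`
    (Knuth's optimal-binning formula, up to an additive constant)."""
    n = len(data)
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("data must span a nonzero range")
    width = (hi - lo) / m
    counts = [0] * m
    for x in data:
        # Clamp the index so the maximum value lands in the last bin.
        counts[min(int((x - lo) / width), m - 1)] += 1
    return (n * math.log(m)
            + math.lgamma(m / 2)
            - m * math.lgamma(0.5)
            - math.lgamma(n + m / 2)
            + sum(math.lgamma(c + 0.5) for c in counts))


def asymptote(data):
    """Limit of log_posterior as m grows, reached once every distinct
    value occupies its own bin (empty bins cancel out of the formula)."""
    n = len(data)
    level_counts = Counter(data).values()
    return (n * math.log(2)
            + sum(math.lgamma(c + 0.5) for c in level_counts)
            - len(level_counts) * math.lgamma(0.5))


def looks_excessively_rounded(data, m_max=100):
    """Hypothetical rule: flag the data when the asymptote exceeds the
    best log posterior found over 1..m_max bins."""
    peak = max(log_posterior(data, m) for m in range(1, m_max + 1))
    return asymptote(data) > peak


# Usage: the same Gaussian sample, before and after rounding to integers.
random.seed(0)
raw = [random.gauss(0.0, 1.0) for _ in range(1000)]
rounded = [round(x) for x in raw]
print(looks_excessively_rounded(raw))      # False
print(looks_excessively_rounded(rounded))  # True
```

For untied continuous data the limit is exactly zero (since Γ(3/2) = Γ(1/2)/2), while the M = 1 posterior is also zero, so such data are never flagged. For coarsely quantized data the posterior keeps climbing toward a limit far above any peak at moderate M, which is the "digitization on the order of the data's own structure" situation the abstract describes.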
doi_str_mv 10.48550/arxiv.1602.04292
format Article
creationdate 2016-02-13
rights http://arxiv.org/licenses/nonexclusive-distrib/1.0
linktorsrc https://arxiv.org/abs/1602.04292
identifier DOI: 10.48550/arxiv.1602.04292
language eng
recordid cdi_arxiv_primary_1602_04292
source arXiv.org
subjects Physics - Data Analysis, Statistics and Probability
title Identifying Excessively Rounded or Truncated Data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T14%3A23%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Identifying%20Excessively%20Rounded%20or%20Truncated%20Data&rft.au=Knuth,%20Kevin%20H&rft.date=2016-02-13&rft_id=info:doi/10.48550/arxiv.1602.04292&rft_dat=%3Carxiv_GOX%3E1602_04292%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true