Identifying Excessively Rounded or Truncated Data

COMPSTAT 2006, Rome, Italy, Physica-Verlag, Heidelberg, pp. 313-324, 2006. All data are digitized, and hence are essentially integers rather than true real numbers. Ordinarily this causes no difficulties since the truncation or rounding usually occurs below the noise level. However, in some instances,...

Bibliographic Details
Main authors:	Knuth, Kevin H; Castle, J. Patrick; Wheeler, Kevin R
Format:	Article
Language:	English
Subjects:	Physics - Data Analysis, Statistics and Probability

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Knuth, Kevin H ; Castle, J. Patrick ; Wheeler, Kevin R
description	COMPSTAT 2006, Rome, Italy, Physica-Verlag, Heidelberg, pp. 313-324, 2006. All data are digitized, and hence are essentially integers rather than true real numbers. Ordinarily this causes no difficulties, since the truncation or rounding usually occurs below the noise level. However, in some instances, when the instruments or the data delivery and storage systems are designed with less than optimal regard for the data or the subsequent data analysis, the effects of digitization may be comparable to important features contained within the data. In these cases, information has been irrevocably lost in the truncation process. While techniques exist for dealing with truncated data, we propose a straightforward method that allows this problem to be detected before the data analysis stage. It is based on an optimal histogram binning algorithm that can identify when the statistical structure of the digitization is on the order of the statistical structure of the data set itself.
doi_str_mv	10.48550/arxiv.1602.04292
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_1602_04292</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1602_04292</sourcerecordid><originalsourceid>FETCH-LOGICAL-a672-519d9738f845ae7e7bd360cbd2b93433888260287320d2ad62167f1ac5fc71563</originalsourceid><addsrcrecordid>eNotzs1qwkAYheHZuCjRC-jK3EDizDeZnywlaisIgmQfvsyPDGgikyjm7mttV4d3c3gI-WQ0L7QQdIXxGR45kxRyWkAJH4TtrevG4KfQndPt07hhCA93mdJTf--ss2kf0zreO4PjKzY44pzMPF4Gt_jfhNS7bV19Z4fj175aHzKUCjLBSlsqrr0uBDrlVGu5pKa10Ja84FxrDS-GVhyoBbQSmFSeoRHeKCYkT8jy7_Ztbm4xXDFOza-9edv5D0zaPU8</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Identifying Excessively Rounded or Truncated Data</title><source>arXiv.org</source><creator>Knuth, Kevin H ; Castle, J. Patrick ; Wheeler, Kevin R</creator><creatorcontrib>Knuth, Kevin H ; Castle, J. Patrick ; Wheeler, Kevin R</creatorcontrib><description>COMPSTAT 2006, Rome Italy, Physica-Verlag, Heidelberg, pp. 313-324, 2006 All data are digitized, and hence are essentially integers rather than true real numbers. Ordinarily this causes no difficulties since the truncation or rounding usually occurs below the noise level. However, in some instances, when the instruments or data delivery and storage systems are designed with less than optimal regard for the data or the subsequent data analysis, the effects of digitization may be comparable to important features contained within the data. In these cases, information has been irrevocably lost in the truncation process. While there exist techniques for dealing with truncated data, we propose a straightforward method that will allow us to detect this problem before the data analysis stage. 
It is based on an optimal histogram binning algorithm that can identify when the statistical structure of the digitization is on the order of the statistical structure of the data set itself.</description><identifier>DOI: 10.48550/arxiv.1602.04292</identifier><language>eng</language><subject>Physics - Data Analysis, Statistics and Probability</subject><creationdate>2016-02</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/1602.04292$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.1602.04292$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Knuth, Kevin H</creatorcontrib><creatorcontrib>Castle, J. Patrick</creatorcontrib><creatorcontrib>Wheeler, Kevin R</creatorcontrib><title>Identifying Excessively Rounded or Truncated Data</title><description>COMPSTAT 2006, Rome Italy, Physica-Verlag, Heidelberg, pp. 313-324, 2006 All data are digitized, and hence are essentially integers rather than true real numbers. Ordinarily this causes no difficulties since the truncation or rounding usually occurs below the noise level. However, in some instances, when the instruments or data delivery and storage systems are designed with less than optimal regard for the data or the subsequent data analysis, the effects of digitization may be comparable to important features contained within the data. In these cases, information has been irrevocably lost in the truncation process. 
While there exist techniques for dealing with truncated data, we propose a straightforward method that will allow us to detect this problem before the data analysis stage. It is based on an optimal histogram binning algorithm that can identify when the statistical structure of the digitization is on the order of the statistical structure of the data set itself.</description><subject>Physics - Data Analysis, Statistics and Probability</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotzs1qwkAYheHZuCjRC-jK3EDizDeZnywlaisIgmQfvsyPDGgikyjm7mttV4d3c3gI-WQ0L7QQdIXxGR45kxRyWkAJH4TtrevG4KfQndPt07hhCA93mdJTf--ss2kf0zreO4PjKzY44pzMPF4Gt_jfhNS7bV19Z4fj175aHzKUCjLBSlsqrr0uBDrlVGu5pKa10Ja84FxrDS-GVhyoBbQSmFSeoRHeKCYkT8jy7_Ztbm4xXDFOza-9edv5D0zaPU8</recordid><startdate>20160213</startdate><enddate>20160213</enddate><creator>Knuth, Kevin H</creator><creator>Castle, J. Patrick</creator><creator>Wheeler, Kevin R</creator><scope>GOX</scope></search><sort><creationdate>20160213</creationdate><title>Identifying Excessively Rounded or Truncated Data</title><author>Knuth, Kevin H ; Castle, J. Patrick ; Wheeler, Kevin R</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a672-519d9738f845ae7e7bd360cbd2b93433888260287320d2ad62167f1ac5fc71563</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Physics - Data Analysis, Statistics and Probability</topic><toplevel>online_resources</toplevel><creatorcontrib>Knuth, Kevin H</creatorcontrib><creatorcontrib>Castle, J. Patrick</creatorcontrib><creatorcontrib>Wheeler, Kevin R</creatorcontrib><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Knuth, Kevin H</au><au>Castle, J. 
Patrick</au><au>Wheeler, Kevin R</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Identifying Excessively Rounded or Truncated Data</atitle><date>2016-02-13</date><risdate>2016</risdate><abstract>COMPSTAT 2006, Rome Italy, Physica-Verlag, Heidelberg, pp. 313-324, 2006 All data are digitized, and hence are essentially integers rather than true real numbers. Ordinarily this causes no difficulties since the truncation or rounding usually occurs below the noise level. However, in some instances, when the instruments or data delivery and storage systems are designed with less than optimal regard for the data or the subsequent data analysis, the effects of digitization may be comparable to important features contained within the data. In these cases, information has been irrevocably lost in the truncation process. While there exist techniques for dealing with truncated data, we propose a straightforward method that will allow us to detect this problem before the data analysis stage. It is based on an optimal histogram binning algorithm that can identify when the statistical structure of the digitization is on the order of the statistical structure of the data set itself.</abstract><doi>10.48550/arxiv.1602.04292</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.1602.04292
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_1602_04292
source	arXiv.org
subjects	Physics - Data Analysis, Statistics and Probability
title	Identifying Excessively Rounded or Truncated Data
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T14%3A23%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Identifying%20Excessively%20Rounded%20or%20Truncated%20Data&rft.au=Knuth,%20Kevin%20H&rft.date=2016-02-13&rft_id=info:doi/10.48550/arxiv.1602.04292&rft_dat=%3Carxiv_GOX%3E1602_04292%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
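
The description field says the detection method rests on an optimal histogram binning algorithm, but the record does not spell the algorithm out. The sketch below is therefore an illustration under stated assumptions, not the authors' implementation. It assumes the binning model is the Bayesian piecewise-constant histogram, whose relative log posterior for M equal-width bins and N points is N log M + log Γ(M/2) − M log Γ(1/2) − log Γ(N + M/2) + Σ log Γ(n_k + 1/2). The function names (`log_posterior`, `best_bin_count`, `looks_over_rounded`), the search limit `max_bins`, and the decision rule (flag the data when the best bin count sits at the search limit) are hypothetical choices made for this example.

```python
import math
import random


def log_posterior(data, m):
    """Relative log posterior of m equal-width bins (piecewise-constant model)."""
    n = len(data)
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("data must span a non-zero range")
    width = (hi - lo) / m
    counts = [0] * m
    for x in data:
        # Clamp the index so the maximum value lands in the last bin.
        counts[min(int((x - lo) / width), m - 1)] += 1
    return (
        n * math.log(m)
        + math.lgamma(m / 2)
        - m * math.lgamma(0.5)
        - math.lgamma(n + m / 2)
        + sum(math.lgamma(c + 0.5) for c in counts)
    )


def best_bin_count(data, max_bins):
    """Bin count in 1..max_bins with the highest relative log posterior."""
    return max(range(1, max_bins + 1), key=lambda m: log_posterior(data, m))


def looks_over_rounded(data, max_bins=200):
    # Assumed heuristic (not taken from the record): if the posterior is still
    # maximal at the search limit, the bins are resolving the rounding grid
    # rather than the shape of the underlying density.
    return best_bin_count(data, max_bins) == max_bins


random.seed(0)
smooth = [random.gauss(0.0, 1.0) for _ in range(1000)]
coarse = [round(2 * x) / 2 for x in smooth]  # rounded to the nearest 0.5
print(looks_over_rounded(smooth), looks_over_rounded(coarse))
```

With finely resolved samples the posterior should peak at a modest bin count, whereas samples rounded to a grid comparable to the spread of the data keep rewarding narrower bins, because each distinct value ends up isolated in its own bin. The search limit of 200 is arbitrary and would need tuning to the sample size.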