Indexing and Self-indexing sequences of IEEE 754 double precision numbers

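Succinct data structures were designed to store and/or index data with a relatively small alphabet, a rather skewed distribution, and/or a considerable amount of repetitiveness. Although many of them were developed to handle text, they have also been used with other data types, such as biological collections or source code. However, succinct data structures have not been applied to floating point data, the obvious reason being that this data type does not usually fulfill the aforementioned requirements. In this work, we present four solutions to store and index floating point data that take advantage of the latest developments in succinct data structures. The first is based on the well-known inverted index. It consumes space close to the size of the source data and provides appealing search times. The other three solutions are based on self-indexing structures. The first of these uses a binary Huffman-shaped wavelet tree. It never wins in our experiments, but still yields a good balance between space and search performance. The second is based on wavelet trees on bytecodes and obtains the best space/time trade-off in most scenarios. The last is based on Sadakane’s Compressed Suffix Array. It excels in space at the expense of slower searches. Including a representation of the original data, our indexes occupy from around 70% to 115% of the size of the original collection and permit fast indexed searches within it.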
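The first solution in this article is built on an inverted index. Purely as an illustration, the Python sketch below shows one simple way an inverted index over a sequence of doubles could work. Each distinct value is keyed by its IEEE 754 binary64 bit pattern, remapped so that unsigned integer order matches numeric order. That key is then mapped to the sorted list of positions where the value occurs. The names `ordered_key` and `FloatInvertedIndex` are hypothetical, and so is support for range queries, which is an assumption of this sketch. It is not the authors' structure: it uses plain Python lists with no compression and does not index NaNs.

```python
import struct
from bisect import bisect_left, bisect_right


def ordered_key(x):
    """Map a double to a 64-bit unsigned int whose order matches numeric order.

    Assumptions of this sketch: NaNs are not indexed, and -0.0 / +0.0 get
    different (adjacent) keys, so they are treated as distinct values.
    """
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))  # raw IEEE 754 bit pattern
    if bits >> 63:
        # Negative: flip all bits, so larger magnitudes get smaller keys.
        return bits ^ 0xFFFFFFFFFFFFFFFF
    # Non-negative: set the sign bit, so these sort after every negative key.
    return bits | (1 << 63)


class FloatInvertedIndex:
    """Hypothetical inverted index: distinct value -> sorted list of positions."""

    def __init__(self, values):
        postings = {}
        for pos, v in enumerate(values):
            # Positions arrive in increasing order, so each list stays sorted.
            postings.setdefault(ordered_key(v), []).append(pos)
        self.vocab = sorted(postings)  # sorted keys allow binary search
        self.lists = [postings[k] for k in self.vocab]

    def locate(self, x):
        """Return the positions where exactly x occurs."""
        k = ordered_key(x)
        i = bisect_left(self.vocab, k)
        if i < len(self.vocab) and self.vocab[i] == k:
            return list(self.lists[i])
        return []

    def locate_range(self, lo, hi):
        """Return the positions of all values v with lo <= v <= hi, in increasing order."""
        i = bisect_left(self.vocab, ordered_key(lo))
        j = bisect_right(self.vocab, ordered_key(hi))
        return sorted(p for plist in self.lists[i:j] for p in plist)
```

For example, after indexing `[3.5, -1.25, 3.5, 0.1, 2.0, -7.0]`, `locate(3.5)` returns `[0, 2]` and `locate_range(-2.0, 2.0)` returns `[1, 3, 4]`. Real-valued collections often contain many distinct values, so the vocabulary here can grow nearly as large as the input. This is the practical face of the point above: floating point data rarely offer the small alphabet that succinct structures exploit.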

Bibliographic details
Published in:	Information Processing & Management, November 2014, Vol. 50, No. 6, pp. 857-875
Main authors:	Fariña, Antonio; Ordóñez, Alberto; Paramá, José R.
Format:	Article
Language:	English
Subjects:	Compact structures; Data; Electronic media; Exact sciences and technology; Indexes; Indexing; Information and communication sciences; Information processing and retrieval; Information retrieval systems. Information and document management system; Information retrieval. Man machine relationship; Information science; Information science. Documentation; Information storage; Mathematical problems; Real numbers; Research process. Evaluation; Sciences and techniques of general use; Searches; Storage; Studies
Online access:	Full text

container_end_page	875
container_issue	6
container_start_page	857
container_title	Information processing & management
container_volume	50
creator	FARINA, Antonio ORDONEZ, Alberto PARAMA, José R
description	Succinct data structures were designed to store and/or index data with a relatively small alphabet, a rather skewed distribution, and/or a considerable amount of repetitiveness. Although many of them were developed to handle text, they have also been used with other data types, such as biological collections or source code. However, succinct data structures have not been applied to floating point data, the obvious reason being that this data type does not usually fulfill the aforementioned requirements. In this work, we present four solutions to store and index floating point data that take advantage of the latest developments in succinct data structures. The first is based on the well-known inverted index. It consumes space close to the size of the source data and provides appealing search times. The other three solutions are based on self-indexing structures. The first of these uses a binary Huffman-shaped wavelet tree. It never wins in our experiments, but still yields a good balance between space and search performance. The second is based on wavelet trees on bytecodes and obtains the best space/time trade-off in most scenarios. The last is based on Sadakane’s Compressed Suffix Array. It excels in space at the expense of slower searches. Including a representation of the original data, our indexes occupy from around 70% to 115% of the size of the original collection and permit fast indexed searches within it.
doi_str_mv	10.1016/j.ipm.2014.07.002
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1680141209</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0306457314000545</els_id><sourcerecordid>3430001811</sourcerecordid><originalsourceid>FETCH-LOGICAL-c431t-6f28e839581d617cb9077a6612088285c8d976419913c1f67c764222646e883f3</originalsourceid><addsrcrecordid>eNp9kE1LxDAQhoMouK7-AG8FEby0ZvJdPIlULSx4UM-hm6aS0k3XZCv6782yqwcPnoYZnnlneBA6B1wABnHdF269KggGVmBZYEwO0AyUpDmnEg7RDFMscsYlPUYnMfYYY8aBzFBd-9Z-Ov-WNb7Nnu3Q5e5nEu37ZL2xMRu7rK6qKpOcZe04LQebrYM1LrrRZ35aLW2Ip-ioa4Zoz_Z1jl7vq5e7x3zx9FDf3S5ywyhsctERZRUtuYJWgDTLEkvZCAEEK0UUN6otpWBQlkANdEKa1BFCBBNWKdrRObra5a7DmP6LG71y0dhhaLwdp6hBqGQhxZUJvfiD9uMUfPpOAxfAFAjBEwU7yoQxxmA7vQ5u1YQvDVhv5epeJ7l6K1djqZPctHO5T26iaYYuND7Z-F0kSkrFGSTuZsfZZOTD2aCjcVunrUv-Nrod3T9XvgF_zoqh</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1561481665</pqid></control><display><type>article</type><title>Indexing and Self-indexing sequences of IEEE 754 double precision numbers</title><source>Access via ScienceDirect (Elsevier)</source><creator>FARINA, Antonio ; ORDONEZ, Alberto ; PARAMA, José R</creator><creatorcontrib>FARINA, Antonio ; ORDONEZ, Alberto ; PARAMA, José R</creatorcontrib><description>Succinct data structures were designed to store and/or index data with a relatively small alphabet size, a rather skewed distribution and/or, a considerable amount of repetitiveness. Although many of them were developed to handle text, they have been used with other data types, like biological collections or source code. However, there are no applications of succinct data structures in the case of floating point data, the obvious reason is that this data type does not usually fulfill the aforementioned requirements. 
In this work, we present four solutions to store and index floating point data that take advantage of the latest developments in succinct data structures. The first one is based on the well-known inverted index. It consumes space around the size of the source data, providing appealing search times. The other three solutions are based on self-indexing structures. The first one uses a binary Huffman-shaped wavelet tree. It is never the winner in our experiments, but still yields a good balance between space and search performance. The second one is based on wavelet trees on bytecodes, and obtains the best space/time trade-off in most scenarios. The last one is based on Sadakane’s Compressed Suffix Array. It excels in space at the expense of less performance at searches. Including a representation of the original data, our indexes occupy from around 70% to 115% of the size of the original collection, and permit fast indexed searches within it.</description><identifier>ISSN: 0306-4573</identifier><identifier>EISSN: 1873-5371</identifier><identifier>DOI: 10.1016/j.ipm.2014.07.002</identifier><identifier>CODEN: IPMADK</identifier><language>eng</language><publisher>Kidlington: Elsevier Ltd</publisher><subject>Compact structures ; Data ; Electronic media ; Exact sciences and technology ; Indexes ; Indexing ; Information and communication sciences ; Information processing and retrieval ; Information retrieval systems. Information and document management system ; Information retrieval. Man machine relationship ; Information science ; Information science. Documentation ; Information storage ; Mathematical problems ; Real numbers ; Research process. Evaluation ; Sciences and techniques of general use ; Searches ; Storage ; Studies</subject><ispartof>Information processing & management, 2014-11, Vol.50 (6), p.857-875</ispartof><rights>2014 Elsevier Ltd</rights><rights>2015 INIST-CNRS</rights><rights>Copyright Pergamon Press Inc. 
Nov 2014</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c431t-6f28e839581d617cb9077a6612088285c8d976419913c1f67c764222646e883f3</citedby><cites>FETCH-LOGICAL-c431t-6f28e839581d617cb9077a6612088285c8d976419913c1f67c764222646e883f3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.ipm.2014.07.002$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,780,784,3550,27924,27925,45995</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=28778541$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>FARINA, Antonio</creatorcontrib><creatorcontrib>ORDONEZ, Alberto</creatorcontrib><creatorcontrib>PARAMA, José R</creatorcontrib><title>Indexing and Self-indexing sequences of IEEE 754 double precision numbers</title><title>Information processing & management</title><description>Succinct data structures were designed to store and/or index data with a relatively small alphabet size, a rather skewed distribution and/or, a considerable amount of repetitiveness. Although many of them were developed to handle text, they have been used with other data types, like biological collections or source code. However, there are no applications of succinct data structures in the case of floating point data, the obvious reason is that this data type does not usually fulfill the aforementioned requirements. In this work, we present four solutions to store and index floating point data that take advantage of the latest developments in succinct data structures. The first one is based on the well-known inverted index. It consumes space around the size of the source data, providing appealing search times. 
The other three solutions are based on self-indexing structures. The first one uses a binary Huffman-shaped wavelet tree. It is never the winner in our experiments, but still yields a good balance between space and search performance. The second one is based on wavelet trees on bytecodes, and obtains the best space/time trade-off in most scenarios. The last one is based on Sadakane’s Compressed Suffix Array. It excels in space at the expense of less performance at searches. Including a representation of the original data, our indexes occupy from around 70% to 115% of the size of the original collection, and permit fast indexed searches within it.</description><subject>Compact structures</subject><subject>Data</subject><subject>Electronic media</subject><subject>Exact sciences and technology</subject><subject>Indexes</subject><subject>Indexing</subject><subject>Information and communication sciences</subject><subject>Information processing and retrieval</subject><subject>Information retrieval systems. Information and document management system</subject><subject>Information retrieval. Man machine relationship</subject><subject>Information science</subject><subject>Information science. Documentation</subject><subject>Information storage</subject><subject>Mathematical problems</subject><subject>Real numbers</subject><subject>Research process. 
Evaluation</subject><subject>Sciences and techniques of general use</subject><subject>Searches</subject><subject>Storage</subject><subject>Studies</subject><issn>0306-4573</issn><issn>1873-5371</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2014</creationdate><recordtype>article</recordtype><recordid>eNp9kE1LxDAQhoMouK7-AG8FEby0ZvJdPIlULSx4UM-hm6aS0k3XZCv6782yqwcPnoYZnnlneBA6B1wABnHdF269KggGVmBZYEwO0AyUpDmnEg7RDFMscsYlPUYnMfYYY8aBzFBd-9Z-Ov-WNb7Nnu3Q5e5nEu37ZL2xMRu7rK6qKpOcZe04LQebrYM1LrrRZ35aLW2Ip-ioa4Zoz_Z1jl7vq5e7x3zx9FDf3S5ywyhsctERZRUtuYJWgDTLEkvZCAEEK0UUN6otpWBQlkANdEKa1BFCBBNWKdrRObra5a7DmP6LG71y0dhhaLwdp6hBqGQhxZUJvfiD9uMUfPpOAxfAFAjBEwU7yoQxxmA7vQ5u1YQvDVhv5epeJ7l6K1djqZPctHO5T26iaYYuND7Z-F0kSkrFGSTuZsfZZOTD2aCjcVunrUv-Nrod3T9XvgF_zoqh</recordid><startdate>20141101</startdate><enddate>20141101</enddate><creator>FARINA, Antonio</creator><creator>ORDONEZ, Alberto</creator><creator>PARAMA, José R</creator><general>Elsevier Ltd</general><general>Elsevier</general><general>Elsevier Science Ltd</general><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>E3H</scope><scope>F2A</scope><scope>8BP</scope></search><sort><creationdate>20141101</creationdate><title>Indexing and Self-indexing sequences of IEEE 754 double precision numbers</title><author>FARINA, Antonio ; ORDONEZ, Alberto ; PARAMA, José R</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c431t-6f28e839581d617cb9077a6612088285c8d976419913c1f67c764222646e883f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2014</creationdate><topic>Compact structures</topic><topic>Data</topic><topic>Electronic media</topic><topic>Exact sciences and technology</topic><topic>Indexes</topic><topic>Indexing</topic><topic>Information and communication sciences</topic><topic>Information processing and retrieval</topic><topic>Information retrieval systems. 
Information and document management system</topic><topic>Information retrieval. Man machine relationship</topic><topic>Information science</topic><topic>Information science. Documentation</topic><topic>Information storage</topic><topic>Mathematical problems</topic><topic>Real numbers</topic><topic>Research process. Evaluation</topic><topic>Sciences and techniques of general use</topic><topic>Searches</topic><topic>Storage</topic><topic>Studies</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>FARINA, Antonio</creatorcontrib><creatorcontrib>ORDONEZ, Alberto</creatorcontrib><creatorcontrib>PARAMA, José R</creatorcontrib><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Library & Information Sciences Abstracts (LISA)</collection><collection>Library & Information Science Abstracts (LISA)</collection><collection>Library & Information Sciences Abstracts (LISA) - CILIP Edition</collection><jtitle>Information processing & management</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>FARINA, Antonio</au><au>ORDONEZ, Alberto</au><au>PARAMA, José R</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Indexing and Self-indexing sequences of IEEE 754 double precision numbers</atitle><jtitle>Information processing & management</jtitle><date>2014-11-01</date><risdate>2014</risdate><volume>50</volume><issue>6</issue><spage>857</spage><epage>875</epage><pages>857-875</pages><issn>0306-4573</issn><eissn>1873-5371</eissn><coden>IPMADK</coden><abstract>Succinct data structures were designed to store and/or index data with a relatively small alphabet size, a rather skewed distribution and/or, a considerable amount of repetitiveness. Although many of them were developed to handle text, they have been used with other data types, like biological collections or source code. 
However, there are no applications of succinct data structures in the case of floating point data, the obvious reason is that this data type does not usually fulfill the aforementioned requirements. In this work, we present four solutions to store and index floating point data that take advantage of the latest developments in succinct data structures. The first one is based on the well-known inverted index. It consumes space around the size of the source data, providing appealing search times. The other three solutions are based on self-indexing structures. The first one uses a binary Huffman-shaped wavelet tree. It is never the winner in our experiments, but still yields a good balance between space and search performance. The second one is based on wavelet trees on bytecodes, and obtains the best space/time trade-off in most scenarios. The last one is based on Sadakane’s Compressed Suffix Array. It excels in space at the expense of less performance at searches. Including a representation of the original data, our indexes occupy from around 70% to 115% of the size of the original collection, and permit fast indexed searches within it.</abstract><cop>Kidlington</cop><pub>Elsevier Ltd</pub><doi>10.1016/j.ipm.2014.07.002</doi><tpages>19</tpages><oa>free_for_read</oa></addata></record>
identifier	ISSN: 0306-4573
issn	0306-4573 1873-5371
language	eng
recordid	cdi_proquest_miscellaneous_1680141209
source	Access via ScienceDirect (Elsevier)
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-23T09%3A22%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Indexing%20and%20Self-indexing%20sequences%20of%20IEEE%20754%20double%20precision%20numbers&rft.jtitle=Information%20processing%20&%20management&rft.au=FARINA,%20Antonio&rft.date=2014-11-01&rft.volume=50&rft.issue=6&rft.spage=857&rft.epage=875&rft.pages=857-875&rft.issn=0306-4573&rft.eissn=1873-5371&rft.coden=IPMADK&rft_id=info:doi/10.1016/j.ipm.2014.07.002&rft_dat=%3Cproquest_cross%3E3430001811%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1561481665&rft_id=info:pmid/&rft_els_id=S0306457314000545&rfr_iscdi=true