Position-Restricted Substring Searching

A full-text index is a data structure built over a text string T[1,n]. The most basic functionality provided is (a) counting how many times a pattern string P[1,m] appears in T and (b) locating all those occ positions. There exist several indexes that solve (a) in O(m) time and (b) in O(occ) time. I...

Full description

Bibliographic details
Main authors:	Mäkinen, Veli ; Navarro, Gonzalo
Format:	Book chapter
Language:	eng
Subjects:	Applied sciences ; Compressible Text ; Computer science; control theory; systems ; Exact sciences and technology ; Information systems. Data bases ; Left Child ; Memory organisation. Data processing ; Rank Query ; Select Query ; Software ; Theoretical computing ; Wavelet Tree
Online access:	Full text

container_end_page	714
container_issue
container_start_page	703
container_title
container_volume
creator	Mäkinen, Veli ; Navarro, Gonzalo
description	A full-text index is a data structure built over a text string T[1,n]. The most basic functionality provided is (a) counting how many times a pattern string P[1,m] appears in T and (b) locating all those occ positions. There exist several indexes that solve (a) in O(m) time and (b) in O(occ) time. In this paper we propose two new queries, (c) counting how many times P[1,m] appears in T[l,r] and (d) locating all those occ_{l,r} positions. These can be solved using (a) and (b) but this requires O(occ) time. We present two solutions to (c) and (d) in this paper. The first is an index that requires O(n log n) bits of space and answers (c) in O(m + log n) time and (d) in O(log n) time per occurrence (that is, O(occ_{l,r} log n) time overall). A variant of the first solution answers (c) in O(m + log log n) time and (d) in constant time per occurrence, but requires O(n log^{1+ε} n) bits of space for any constant ε > 0. The second solution requires O(nm log σ) bits of space, solving (c) in O(m⌈log σ / log log n⌉) time and (d) in O(m⌈log σ / log log n⌉) time per occurrence, where σ is the alphabet size. This second structure takes less space when the text is compressible. Our solutions can be seen as a generalization of rank and select dictionaries, which allow computing how many times a given character c appears in a prefix T[1,i] and also locate the i-th occurrence of c in T. Our solution to (c) extends character rank queries to substring rank queries, and our solution to (d) extends character select to substring select queries. 
As a byproduct, we show how rank queries can be used to implement fractional cascading in little space, so as to obtain an alternative implementation of a well-known two-dimensional range search data structure by Chazelle. We also show how Grossi et al.’s wavelet trees are suitable for two-dimensional range searching, and their connection with Chazelle’s data structure.
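
Query (c) can be illustrated with a small sketch. It is not the authors' index: it assumes a naively built suffix array, a plain pointer-based (uncompressed) wavelet tree over the suffix array values, and 0-based inclusive positions instead of the 1-based T[l,r] of the description. The names WaveletTree, suffix_array, sa_range and count_restricted are hypothetical. The idea shown is that the occurrences of P form a contiguous suffix array interval, so counting those that start in [l, r-m+1] is a two-dimensional range count.

```python
from bisect import bisect_left, bisect_right


class WaveletTree:
    """Pointer-based wavelet tree over integers (illustrative, uncompressed)."""

    def __init__(self, seq, lo=None, hi=None):
        self.lo = min(seq) if lo is None else lo
        self.hi = max(seq) if hi is None else hi
        self.left = self.right = None
        self.prefix = None
        if self.lo == self.hi or not seq:
            return  # leaf, or empty node: nothing to split
        mid = (self.lo + self.hi) // 2
        # prefix[k] = how many of the first k elements go to the left child
        # (this plays the role of a rank query on the node's bitmap)
        self.prefix = [0]
        for v in seq:
            self.prefix.append(self.prefix[-1] + (v <= mid))
        self.left = WaveletTree([v for v in seq if v <= mid], self.lo, mid)
        self.right = WaveletTree([v for v in seq if v > mid], mid + 1, self.hi)

    def range_count(self, i, j, a, b):
        """Count values v with a <= v <= b among seq[i:j]."""
        if i >= j or b < self.lo or a > self.hi:
            return 0
        if a <= self.lo and self.hi <= b:
            return j - i
        li, lj = self.prefix[i], self.prefix[j]
        return (self.left.range_count(li, lj, a, b)
                + self.right.range_count(i - li, j - lj, a, b))


def suffix_array(text):
    """Naive suffix array construction; fine for a small example only."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def sa_range(text, sa, pattern):
    """Half-open suffix array interval [sp, ep) of suffixes starting with pattern."""
    m = len(pattern)
    # Truncated suffixes stay sorted, so binary search applies.
    prefixes = [text[i:i + m] for i in sa]
    return bisect_left(prefixes, pattern), bisect_right(prefixes, pattern)


def count_restricted(text, sa, wt, pattern, l, r):
    """Occurrences of pattern lying entirely inside text[l..r] (0-based, inclusive)."""
    m = len(pattern)
    if m == 0 or r - m + 1 < l:
        return 0
    sp, ep = sa_range(text, sa, pattern)
    # Occurrence start positions must fall in [l, r - m + 1].
    return wt.range_count(sp, ep, l, r - m + 1)
```

For example, with text = "abracadabra", sa = suffix_array(text) and wt = WaveletTree(sa), the call count_restricted(text, sa, wt, "abra", 1, 10) counts only the occurrence starting at position 7. The pattern search here is deliberately simple and does not match the time bounds stated in the description; only the range-counting step is logarithmic in n.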
doi_str_mv	10.1007/11682462_64
format	Book Chapter
fullrecord	<record><control><sourceid>pascalfrancis_sprin</sourceid><recordid>TN_cdi_pascalfrancis_primary_19689118</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>19689118</sourcerecordid><originalsourceid>FETCH-LOGICAL-p256t-5e2c67946c9e9ed29065d9c268081e693a640baaded763d450547e1fa2ae4aa83</originalsourceid><addsrcrecordid>eNpVUEtLxDAYjC-wrD35B_Yi4qGaL48vyVGW9QELiqvn8DVNtbq2pakH_71b1oPOZQZmGIZh7BT4JXBurgDQCoXCo9pjuTNWasWlMBphn2WAAIWUyh3887Q6ZBmXXBTOKHnM8pTe-RYSLAqbsfPHLjVj07XFU0zj0IQxVvP1Vznp9nW-jjSEt606YUc1bVLMf3nGXm6Wz4u7YvVwe7-4XhW90DgWOoqAxikMLrpYCcdRVy4ItNxCRCcJFS-JqlgZlJXSXCsToSZBURFZOWNnu96eUqBNPVAbmuT7ofmk4duDQ-sAptzFLpf6aWgcfNl1H8kD99NZ_s9Z8gdQ11UE</addsrcrecordid><sourcetype>Index Database</sourcetype><iscdi>true</iscdi><recordtype>book_chapter</recordtype></control><display><type>book_chapter</type><title>Position-Restricted Substring Searching</title><source>Springer Books</source><creator>Mäkinen, Veli ; Navarro, Gonzalo</creator><contributor>Kiwi, Marcos ; Correa, José R. ; Hevia, Alejandro</contributor><creatorcontrib>Mäkinen, Veli ; Navarro, Gonzalo ; Kiwi, Marcos ; Correa, José R. ; Hevia, Alejandro</creatorcontrib><description>A full-text index is a data structure built over a text string T[1,n]. The most basic functionality provided is (a) counting how many times a pattern string P[1,m] appears in T and (b) locating all those occ positions. There exist several indexes that solve (a) in O(m) time and (b) in O(occ) time. In this paper we propose two new queries, (c) counting how many times P[1,m] appears in T[l,r] and (d) locating all those occl,r positions. These can be solved using (a) and (b) but this requires O(occ) time. We present two solutions to (c) and (d) in this paper. The first is an index that requires O(nlog n) bits of space and answers (c) in O(m+log n) time and (d) in O(log n) time per occurrence (that is, O(occl,r log n) time overall). 
A variant of the first solution answers (c) in O(m+loglog n) time and (d) in constant time per occurrence, but requires O(n log^{1+ε} n) bits of space for any constant ε > 0. The second solution requires O(nm log σ) bits of space, solving (c) in O(m⌈log σ / loglog n⌉) time and (d) in O(m⌈log σ / loglog n⌉) time per occurrence, where σ is the alphabet size. This second structure takes less space when the text is compressible. Our solutions can be seen as a generalization of rank and select dictionaries, which allow computing how many times a given character c appears in a prefix T[1,i] and also locate the i-th occurrence of c in T. Our solution to (c) extends character rank queries to substring rank queries, and our solution to (d) extends character select to substring select queries. As a byproduct, we show how rank queries can be used to implement fractional cascading in little space, so as to obtain an alternative implementation of a well-known two-dimensional range search data structure by Chazelle. We also show how Grossi et al.’s wavelet trees are suitable for two-dimensional range searching, and their connection with Chazelle’s data structure.</description><identifier>ISSN: 0302-9743</identifier><identifier>ISBN: 9783540327554</identifier><identifier>ISBN: 354032755X</identifier><identifier>EISSN: 1611-3349</identifier><identifier>EISBN: 9783540327561</identifier><identifier>EISBN: 3540327568</identifier><identifier>DOI: 10.1007/11682462_64</identifier><language>eng</language><publisher>Berlin, Heidelberg: Springer Berlin Heidelberg</publisher><subject>Applied sciences ; Compressible Text ; Computer science; control theory; systems ; Exact sciences and technology ; Information systems. 
Data bases ; Left Child ; Memory organisation. Data processing ; Rank Query ; Select Query ; Software ; Theoretical computing ; Wavelet Tree</subject><ispartof>LATIN 2006: Theoretical Informatics, 2006, p.703-714</ispartof><rights>Springer-Verlag Berlin Heidelberg 2006</rights><rights>2007 INIST-CNRS</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><relation>Lecture Notes in Computer Science</relation></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/11682462_64$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/11682462_64$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>309,310,775,776,780,785,786,789,4036,4037,27902,38232,41418,42487</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=19689118$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><contributor>Kiwi, Marcos</contributor><contributor>Correa, José R.</contributor><contributor>Hevia, Alejandro</contributor><creatorcontrib>Mäkinen, Veli</creatorcontrib><creatorcontrib>Navarro, Gonzalo</creatorcontrib><title>Position-Restricted Substring Searching</title><title>LATIN 2006: Theoretical Informatics</title><description>A full-text index is a data structure built over a text string T[1,n]. The most basic functionality provided is (a) counting how many times a pattern string P[1,m] appears in T and (b) locating all those occ positions. There exist several indexes that solve (a) in O(m) time and (b) in O(occ) time. In this paper we propose two new queries, (c) counting how many times P[1,m] appears in T[l,r] and (d) locating all those occl,r positions. These can be solved using (a) and (b) but this requires O(occ) time. 
We present two solutions to (c) and (d) in this paper. The first is an index that requires O(nlog n) bits of space and answers (c) in O(m+log n) time and (d) in O(log n) time per occurrence (that is, O(occl,r log n) time overall). A variant of the first solution answers (c) in O(m+loglog n) time and (d) in constant time per occurrence, but requires O(n log^{1+ε} n) bits of space for any constant ε > 0. The second solution requires O(nm log σ) bits of space, solving (c) in O(m⌈log σ / loglog n⌉) time and (d) in O(m⌈log σ / loglog n⌉) time per occurrence, where σ is the alphabet size. This second structure takes less space when the text is compressible. Our solutions can be seen as a generalization of rank and select dictionaries, which allow computing how many times a given character c appears in a prefix T[1,i] and also locate the i-th occurrence of c in T. Our solution to (c) extends character rank queries to substring rank queries, and our solution to (d) extends character select to substring select queries. As a byproduct, we show how rank queries can be used to implement fractional cascading in little space, so as to obtain an alternative implementation of a well-known two-dimensional range search data structure by Chazelle. We also show how Grossi et al.’s wavelet trees are suitable for two-dimensional range searching, and their connection with Chazelle’s data structure.</description><subject>Applied sciences</subject><subject>Compressible Text</subject><subject>Computer science; control theory; systems</subject><subject>Exact sciences and technology</subject><subject>Information systems. Data bases</subject><subject>Left Child</subject><subject>Memory organisation. 
Data processing</subject><subject>Rank Query</subject><subject>Select Query</subject><subject>Software</subject><subject>Theoretical computing</subject><subject>Wavelet Tree</subject><issn>0302-9743</issn><issn>1611-3349</issn><isbn>9783540327554</isbn><isbn>354032755X</isbn><isbn>9783540327561</isbn><isbn>3540327568</isbn><fulltext>true</fulltext><rsrctype>book_chapter</rsrctype><creationdate>2006</creationdate><recordtype>book_chapter</recordtype><recordid>eNpVUEtLxDAYjC-wrD35B_Yi4qGaL48vyVGW9QELiqvn8DVNtbq2pakH_71b1oPOZQZmGIZh7BT4JXBurgDQCoXCo9pjuTNWasWlMBphn2WAAIWUyh3887Q6ZBmXXBTOKHnM8pTe-RYSLAqbsfPHLjVj07XFU0zj0IQxVvP1Vznp9nW-jjSEt606YUc1bVLMf3nGXm6Wz4u7YvVwe7-4XhW90DgWOoqAxikMLrpYCcdRVy4ItNxCRCcJFS-JqlgZlJXSXCsToSZBURFZOWNnu96eUqBNPVAbmuT7ofmk4duDQ-sAptzFLpf6aWgcfNl1H8kD99NZ_s9Z8gdQ11UE</recordid><startdate>2006</startdate><enddate>2006</enddate><creator>Mäkinen, Veli</creator><creator>Navarro, Gonzalo</creator><general>Springer Berlin Heidelberg</general><general>Springer</general><scope>IQODW</scope></search><sort><creationdate>2006</creationdate><title>Position-Restricted Substring Searching</title><author>Mäkinen, Veli ; Navarro, Gonzalo</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-p256t-5e2c67946c9e9ed29065d9c268081e693a640baaded763d450547e1fa2ae4aa83</frbrgroupid><rsrctype>book_chapters</rsrctype><prefilter>book_chapters</prefilter><language>eng</language><creationdate>2006</creationdate><topic>Applied sciences</topic><topic>Compressible Text</topic><topic>Computer science; control theory; systems</topic><topic>Exact sciences and technology</topic><topic>Information systems. Data bases</topic><topic>Left Child</topic><topic>Memory organisation. 
Data processing</topic><topic>Rank Query</topic><topic>Select Query</topic><topic>Software</topic><topic>Theoretical computing</topic><topic>Wavelet Tree</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Mäkinen, Veli</creatorcontrib><creatorcontrib>Navarro, Gonzalo</creatorcontrib><collection>Pascal-Francis</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Mäkinen, Veli</au><au>Navarro, Gonzalo</au><au>Kiwi, Marcos</au><au>Correa, José R.</au><au>Hevia, Alejandro</au><format>book</format><genre>bookitem</genre><ristype>CHAP</ristype><atitle>Position-Restricted Substring Searching</atitle><btitle>LATIN 2006: Theoretical Informatics</btitle><seriestitle>Lecture Notes in Computer Science</seriestitle><date>2006</date><risdate>2006</risdate><spage>703</spage><epage>714</epage><pages>703-714</pages><issn>0302-9743</issn><eissn>1611-3349</eissn><isbn>9783540327554</isbn><isbn>354032755X</isbn><eisbn>9783540327561</eisbn><eisbn>3540327568</eisbn><abstract>A full-text index is a data structure built over a text string T[1,n]. The most basic functionality provided is (a) counting how many times a pattern string P[1,m] appears in T and (b) locating all those occ positions. There exist several indexes that solve (a) in O(m) time and (b) in O(occ) time. In this paper we propose two new queries, (c) counting how many times P[1,m] appears in T[l,r] and (d) locating all those occl,r positions. These can be solved using (a) and (b) but this requires O(occ) time. We present two solutions to (c) and (d) in this paper. The first is an index that requires O(nlog n) bits of space and answers (c) in O(m+log n) time and (d) in O(log n) time per occurrence (that is, O(occl,r log n) time overall). 
A variant of the first solution answers (c) in O(m+loglog n) time and (d) in constant time per occurrence, but requires O(n log^{1+ε} n) bits of space for any constant ε > 0. The second solution requires O(nm log σ) bits of space, solving (c) in O(m⌈log σ / loglog n⌉) time and (d) in O(m⌈log σ / loglog n⌉) time per occurrence, where σ is the alphabet size. This second structure takes less space when the text is compressible. Our solutions can be seen as a generalization of rank and select dictionaries, which allow computing how many times a given character c appears in a prefix T[1,i] and also locate the i-th occurrence of c in T. Our solution to (c) extends character rank queries to substring rank queries, and our solution to (d) extends character select to substring select queries. As a byproduct, we show how rank queries can be used to implement fractional cascading in little space, so as to obtain an alternative implementation of a well-known two-dimensional range search data structure by Chazelle. We also show how Grossi et al.’s wavelet trees are suitable for two-dimensional range searching, and their connection with Chazelle’s data structure.</abstract><cop>Berlin, Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><doi>10.1007/11682462_64</doi><tpages>12</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 0302-9743
ispartof	LATIN 2006: Theoretical Informatics, 2006, p.703-714
issn	0302-9743 1611-3349
language	eng
recordid	cdi_pascalfrancis_primary_19689118
source	Springer Books
subjects	Applied sciences ; Compressible Text ; Computer science; control theory; systems ; Exact sciences and technology ; Information systems. Data bases ; Left Child ; Memory organisation. Data processing ; Rank Query ; Select Query ; Software ; Theoretical computing ; Wavelet Tree
title	Position-Restricted Substring Searching
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T02%3A38%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-pascalfrancis_sprin&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=bookitem&rft.atitle=Position-Restricted%20Substring%20Searching&rft.btitle=LATIN%202006:%20Theoretical%20Informatics&rft.au=M%C3%A4kinen,%20Veli&rft.date=2006&rft.spage=703&rft.epage=714&rft.pages=703-714&rft.issn=0302-9743&rft.eissn=1611-3349&rft.isbn=9783540327554&rft.isbn_list=354032755X&rft_id=info:doi/10.1007/11682462_64&rft_dat=%3Cpascalfrancis_sprin%3E19689118%3C/pascalfrancis_sprin%3E%3Curl%3E%3C/url%3E&rft.eisbn=9783540327561&rft.eisbn_list=3540327568&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true