Linear-size suffix tries

Bibliographic details
Published in: Theoretical Computer Science, 2016-07, Vol. 638, pp. 171-178
Main authors: Crochemore, Maxime; Epifanio, Chiara; Grossi, Roberto; Mignosi, Filippo
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page 178
container_issue
container_start_page 171
container_title Theoretical computer science
container_volume 638
creator Crochemore, Maxime
Epifanio, Chiara
Grossi, Roberto
Mignosi, Filippo
description Suffix trees are highly regarded data structures for text indexing and string algorithms [McCreight 76, Weiner 73]. For any given string w of length n = |w|, a suffix tree for w takes O(n) nodes and links. It is often presented as a compacted version of a suffix trie for w, where the latter is the trie (or digital search tree) built on the suffixes of w. Here the compaction process replaces each maximal chain of unary nodes with a single arc. For this, the suffix tree requires that the labels of its arcs are substrings encoded as pointers to w (or equivalent information). By contrast, the arcs of the suffix trie are labeled by single symbols, but there can be Θ(n²) nodes and links for suffix tries in the worst case because of their unary nodes. It is an interesting question whether the suffix trie can be stored using O(n) nodes. We present the linear-size suffix trie, which guarantees O(n) nodes. We use a new technique for reducing the number of unary nodes to O(n), which stems from some results on antidictionaries. For instance, by using the linear-size suffix trie, we are able to check whether a pattern p of length m = |p| occurs in w in O(m log |Σ|) time, and we can find the longest common substring of two strings w1 and w2 in O((|w1| + |w2|) log |Σ|) time for an alphabet Σ.
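To make the size gap described in the abstract concrete, here is a minimal Python sketch. It is not the authors' linear-size suffix trie. It builds a naive suffix trie (one arc per symbol), counts its nodes, counts how many nodes would survive suffix-tree compaction (the root, branching nodes and leaves), and tests whether a pattern occurs by walking down from the root. The function names are hypothetical, and the end marker `$` is an assumption added so that every suffix ends at a leaf. On w = a^k b^k $ the trie has k² + 4k + 2 nodes, while only 4k of them survive compaction.

```python
from typing import Dict

# A trie node is a dict mapping one symbol to the child node (one arc per symbol).
Trie = Dict[str, "Trie"]


def build_suffix_trie(w: str) -> Trie:
    """Insert every suffix of w symbol by symbol (naive, Theta(n^2) time)."""
    root: Trie = {}
    for i in range(len(w)):
        node = root
        for symbol in w[i:]:
            node = node.setdefault(symbol, {})
    return root


def count_nodes(trie: Trie) -> int:
    """Count every node of the suffix trie, including the root."""
    total, stack = 0, [trie]
    while stack:
        node = stack.pop()
        total += 1
        stack.extend(node.values())
    return total


def count_compacted_nodes(trie: Trie) -> int:
    """Count the nodes that survive compaction: the root plus every non-unary
    node (branching nodes and leaves). Each maximal chain of unary nodes would
    become a single arc labeled with a substring of w."""
    total, stack = 1, list(trie.values())
    while stack:
        node = stack.pop()
        if len(node) != 1:
            total += 1
        stack.extend(node.values())
    return total


def occurs(trie: Trie, p: str) -> bool:
    """p occurs in w iff p spells a path that starts at the root."""
    node = trie
    for symbol in p:
        if symbol not in node:
            return False
        node = node[symbol]
    return True


if __name__ == "__main__":
    k = 4
    w = "a" * k + "b" * k + "$"  # '$' is an assumed unique end marker
    trie = build_suffix_trie(w)
    print(count_nodes(trie))            # 34 = k^2 + 4k + 2
    print(count_compacted_nodes(trie))  # 16 = 4k
    print(occurs(trie, "aabb"), occurs(trie, "ba"))  # True False
```

Children are stored in Python dicts, so each step of `occurs` is an expected O(1) hash lookup and the walk takes O(m) expected time; the O(m log |Σ|) bound quoted in the abstract is consistent with a comparison-based child lookup, such as binary search over sorted children. The sketch still materializes all Θ(n²) trie nodes; reducing them to O(n) is the contribution of the paper and is not reproduced here.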
doi_str_mv 10.1016/j.tcs.2016.04.002
format Article
fullrecord publisher: Elsevier B.V.
publication date: 2016-07-25 (8 pages)
rights: 2016 Elsevier B.V.; distributed under a Creative Commons Attribution 4.0 International License
status: peer reviewed; free to read
Elsevier ID: S0304397516300305
ORCID: https://orcid.org/0000-0003-1087-1419
HAL record: https://inria.hal.science/hal-01388452
fulltext fulltext
identifier ISSN: 0304-3975
ispartof Theoretical computer science, 2016-07, Vol.638, p.171-178
issn 0304-3975
1879-2294
language eng
recordid cdi_hal_primary_oai_HAL_hal_01388452v1
source ScienceDirect Journals (5 years ago - present); EZB-FREE-00999 freely available EZB journals
subjects Algorithms
Computer Science
Data Structures and Algorithms
Digital
Equivalence
Factor and suffix automata
Indexing
Links
Pattern matching
Strings
Suffix tree
Suffix trees
Text indexing
Texts
title Linear-size suffix tries
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T07%3A06%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_hal_p&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Linear-size%20suffix%20tries&rft.jtitle=Theoretical%20computer%20science&rft.au=Crochemore,%20Maxime&rft.date=2016-07-25&rft.volume=638&rft.spage=171&rft.epage=178&rft.pages=171-178&rft.issn=0304-3975&rft.eissn=1879-2294&rft_id=info:doi/10.1016/j.tcs.2016.04.002&rft_dat=%3Cproquest_hal_p%3E1825504550%3C/proquest_hal_p%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1825504550&rft_id=info:pmid/&rft_els_id=S0304397516300305&rfr_iscdi=true