A Method to Improve Exact Matching Results in Compressed Text using Parallel Wavelet Tree


Detailed description

Bibliographic details
Published in: Scalable Computing. Practice and Experience 2021-12, Vol.22 (4), p.387-400
Main authors: Srivastav, Shashank; Singh, Pradeep Kumar; Yadav, Divakar
Format: Article
Language: English
Online access: Full text
container_end_page 400
container_issue 4
container_start_page 387
container_title Scalable Computing. Practice and Experience
container_volume 22
creator Srivastav, Shashank
Singh, Pradeep Kumar
Yadav, Divakar
description Use of search on the World Wide Web (WWW) keeps growing, and users around the world rely on it regularly. Because the size of the web text corpus is increasing at an exponential rate, an efficient indexing algorithm is needed that reduces both the space and the time required during search. This paper proposes a new technique, called WBTC_PWT, that combines Word-Based Tagging Coding (WBTC) compression with a Parallel Wavelet Tree (PWT). WBTC_PWT uses WBTC encoding to reduce the space complexity of the index and a parallel wavelet tree to reduce index construction time. It exploits compressed pattern matching to minimize search time complexity. All unique words in the text corpus are divided into levels according to the word frequency table, and a separate wavelet tree is built for each level in parallel. Compared with existing search algorithms for compressed text, the proposed WBTC_PWT search method is significantly faster and reduces the likelihood of false matches.
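The abstract describes the index only at a high level, so the following Python sketch illustrates the general idea of a frequency-levelled, per-level wavelet tree index under explicit assumptions; it is not the authors' implementation. The WBTC codeword format, the rule for assigning words to levels, and what each per-level tree stores are not given in this record. Here the level capacities (`level_sizes`), the integer `(level, local_id)` codes that stand in for WBTC codewords, and all class and function names (`WaveletTree`, `assign_levels`, `LeveledWaveletIndex`) are hypothetical. Each level's tree holds the local ids of that level's words in text order, and `rank`/`select` on it recover occurrence counts and word offsets.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


class WaveletTree:
    """Pointer-based wavelet tree over a sequence of non-negative integers."""

    def __init__(self, seq, lo=None, hi=None):
        if lo is None:
            lo, hi = (min(seq), max(seq)) if seq else (0, 0)
        self.lo, self.hi, self.size = lo, hi, len(seq)
        self.left = self.right = None
        if lo == hi or not seq:
            return  # leaf: every stored symbol equals lo (or the node is empty)
        mid = (lo + hi) // 2
        bits = [1 if x > mid else 0 for x in seq]
        self.ones = [0] + list(accumulate(bits))  # ones[i] = 1-bits in bits[:i]
        self.left = WaveletTree([x for x in seq if x <= mid], lo, mid)
        self.right = WaveletTree([x for x in seq if x > mid], mid + 1, hi)

    def rank(self, c, i):
        """Number of occurrences of symbol c in seq[:i]."""
        if c < self.lo or c > self.hi:
            return 0
        if self.left is None:
            return i
        mid = (self.lo + self.hi) // 2
        if c <= mid:
            return self.left.rank(c, i - self.ones[i])
        return self.right.rank(c, self.ones[i])

    def select(self, c, k):
        """Index of the k-th (1-based) occurrence of c, via binary search on rank."""
        if k < 1 or self.rank(c, self.size) < k:
            return -1
        lo, hi = 1, self.size
        while lo < hi:
            mid = (lo + hi) // 2
            if self.rank(c, mid) >= k:
                hi = mid
            else:
                lo = mid + 1
        return lo - 1


def assign_levels(words, level_sizes):
    """Hypothetical level split: rank words by frequency and fill levels in order."""
    freq = Counter(words)
    vocab = sorted(freq, key=lambda w: (-freq[w], w))  # ties broken alphabetically
    capacities = list(level_sizes) + [len(vocab)]  # the last level takes the rest
    code, start = {}, 0
    for level, size in enumerate(capacities):
        for local_id, w in enumerate(vocab[start:start + size]):
            code[w] = (level, local_id)
        start += size
    return code, len(capacities)


class LeveledWaveletIndex:
    """One wavelet tree per frequency level, built concurrently."""

    def __init__(self, text, level_sizes=(2, 4)):
        words = text.split()
        self.code, n_levels = assign_levels(words, level_sizes)
        seqs = [[] for _ in range(n_levels)]            # local ids, in text order
        self.positions = [[] for _ in range(n_levels)]  # global word offsets
        for pos, w in enumerate(words):
            level, local_id = self.code[w]
            seqs[level].append(local_id)
            self.positions[level].append(pos)
        # Threads only illustrate the per-level fan-out; CPython's GIL means
        # pure-Python construction will not actually run in parallel.
        with ThreadPoolExecutor() as pool:
            self.trees = list(pool.map(WaveletTree, seqs))

    def count(self, word):
        """Number of exact occurrences of word in the indexed text."""
        if word not in self.code:
            return 0
        level, local_id = self.code[word]
        tree = self.trees[level]
        return tree.rank(local_id, tree.size)

    def locate(self, word):
        """Word offsets of every exact occurrence of word."""
        if word not in self.code:
            return []
        level, local_id = self.code[word]
        tree, where = self.trees[level], self.positions[level]
        return [where[tree.select(local_id, k)] for k in range(1, self.count(word) + 1)]
```

For example, `LeveledWaveletIndex("to be or not to be that is the question to be").locate("to")` returns `[0, 4, 10]`. Because the lookup compares whole-word identifiers rather than byte patterns, this sketch cannot report a match that straddles two codewords; whether the paper reduces false matches in the same way is not stated in this record. For real speedups, per-level construction would need a `ProcessPoolExecutor` or a compiled wavelet tree library rather than threads.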
doi_str_mv 10.12694/scpe.v22i4.1870
format Article
fulltext fulltext
identifier ISSN: 1895-1767
ispartof Scalable Computing. Practice and Experience, 2021-12, Vol.22 (4), p.387-400
issn 1895-1767
1895-1767
language eng
recordid cdi_crossref_primary_10_12694_scpe_v22i4_1870
source EZB-FREE-00999 freely available EZB journals
title A Method to Improve Exact Matching Results in Compressed Text using Parallel Wavelet Tree
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T21%3A19%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Method%20to%20Improve%20Exact%20Matching%20Results%20in%20Compressed%20Text%20using%20Parallel%20Wavelet%20Tree&rft.jtitle=Scalable%20Computing.%20Practice%20and%20Experience&rft.au=Srivastav,%20Shashank&rft.date=2021-12-01&rft.volume=22&rft.issue=4&rft.spage=387&rft.epage=400&rft.pages=387-400&rft.issn=1895-1767&rft.eissn=1895-1767&rft_id=info:doi/10.12694/scpe.v22i4.1870&rft_dat=%3Ccrossref%3E10_12694_scpe_v22i4_1870%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true