A Method to Improve Exact Matching Results in Compressed Text using Parallel Wavelet Tree
The process of searching on the World Wide Web (WWW) is increasing regularly, and users around the world also use it regularly. In WWW the size of the text corpus is constantly increasing at an exponential rate, so we need an efficient indexing algorithm that reduces both space and time during the s...
Published in: | Scalable Computing. Practice and Experience 2021-12, Vol.22 (4), p.387-400 |
---|---|
Main authors: | Srivastav, Shashank; Singh, Pradeep Kumar; Yadav, Divakar |
Format: | Article |
Language: | eng |
Online access: | Full text |
container_end_page | 400 |
---|---|
container_issue | 4 |
container_start_page | 387 |
container_title | Scalable Computing. Practice and Experience |
container_volume | 22 |
creator | Srivastav, Shashank ; Singh, Pradeep Kumar ; Yadav, Divakar |
description | Searching on the World Wide Web (WWW) is steadily increasing, and users around the world rely on it regularly. The size of the WWW text corpus is growing at an exponential rate, so search needs an efficient indexing algorithm that reduces both space and time. This paper proposes a new technique, called WBTC_PWT, that implements Word-Based Tagging Coding (WBTC) compression using a parallel wavelet tree. WBTC encoding reduces the space complexity of the index, and the parallel wavelet tree reduces the time needed to construct the indexes. The technique uses compressed pattern matching to minimize search time complexity. All unique words in the text corpus are divided into levels according to the word frequency table, and a separate wavelet tree is built for each level in parallel. Compared with existing search algorithms that operate on compressed text, the proposed WBTC_PWT search method is significantly faster and reduces the chance of false matches. |
doi_str_mv | 10.12694/scpe.v22i4.1870 |
format | Article |
fullrecord | <record><control><sourceid>crossref</sourceid><recordid>TN_cdi_crossref_primary_10_12694_scpe_v22i4_1870</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_12694_scpe_v22i4_1870</sourcerecordid><originalsourceid>FETCH-LOGICAL-c243t-e3c73ef9fb7709966222af080e55c18d1d01dc98cd680a0f72a7385029f0b153</originalsourceid><addsrcrecordid>eNpNkEFLwzAYhoMoOObuHvMHOr8kbZMcR5k62FCkIJ5Kln5xlW4tSVbmv3edHnwv73t4eA8PIfcM5oznOn0Itsf5wHmTzpmScEUmTOksYTKX1__2LZmF8AXnCJbmGZuQjwXdYNx1NY0dXe173w1IlydjI92YaHfN4ZO-YTi2MdDmQIvujGAIWNMST5Eewwi8Gm_aFlv6bgZsMdLSI96RG2fagLO_npLycVkWz8n65WlVLNaJ5amICQorBTrttlKC1nnOOTcOFGCWWaZqVgOrrVa2zhUYcJIbKVQGXDvYskxMCfzeWt-F4NFVvW_2xn9XDKqLnGqUU13kVKMc8QN7uFkE</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>A Method to Improve Exact Matching Results in Compressed Text using Parallel Wavelet Tree</title><source>EZB-FREE-00999 freely available EZB journals</source><creator>Srivastav, Shashank ; Singh, Pradeep Kumar ; Yadav, Divakar</creator><creatorcontrib>Srivastav, Shashank ; Singh, Pradeep Kumar ; Yadav, Divakar</creatorcontrib><description>The process of searching on the World Wide Web (WWW) is increasing regularly, and users around the world also use it regularly. In WWW the size of the text corpus is constantly increasing at an exponential rate, so we need an efficient indexing algorithm that reduces both space and time during the search process. This paper proposes a new technique that utilizes Word-Based Tagging Coding compression which is implemented using Parallel Wavelet Tree, called WBTC_PWT. WBTC_PWT uses the word-based tagging coding encoding technique to reduce the space complexity of the index and uses a parallel wavelet tree which reduces the time it takes to construct indexes. This technique utilizes the features of compressed pattern matching to minimize search time complexity. 
In this technique, all the unique words present in the text corpus are divided into different levels according to the word frequency table and a different wavelet tree is made for each level in parallel. Compared to other existing search algorithms based on compressed text, the proposed WBTC_PWT search method is significantly faster and it reduces the chances of getting the false matching result.</description><identifier>ISSN: 1895-1767</identifier><identifier>EISSN: 1895-1767</identifier><identifier>DOI: 10.12694/scpe.v22i4.1870</identifier><language>eng</language><ispartof>Scalable Computing. Practice and Experience, 2021-12, Vol.22 (4), p.387-400</ispartof><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c243t-e3c73ef9fb7709966222af080e55c18d1d01dc98cd680a0f72a7385029f0b153</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids></links><search><creatorcontrib>Srivastav, Shashank</creatorcontrib><creatorcontrib>Singh, Pradeep Kumar</creatorcontrib><creatorcontrib>Yadav, Divakar</creatorcontrib><title>A Method to Improve Exact Matching Results in Compressed Text using Parallel Wavelet Tree</title><title>Scalable Computing. Practice and Experience</title><description>The process of searching on the World Wide Web (WWW) is increasing regularly, and users around the world also use it regularly. In WWW the size of the text corpus is constantly increasing at an exponential rate, so we need an efficient indexing algorithm that reduces both space and time during the search process. This paper proposes a new technique that utilizes Word-Based Tagging Coding compression which is implemented using Parallel Wavelet Tree, called WBTC_PWT. 
WBTC_PWT uses the word-based tagging coding encoding technique to reduce the space complexity of the index and uses a parallel wavelet tree which reduces the time it takes to construct indexes. This technique utilizes the features of compressed pattern matching to minimize search time complexity. In this technique, all the unique words present in the text corpus are divided into different levels according to the word frequency table and a different wavelet tree is made for each level in parallel. Compared to other existing search algorithms based on compressed text, the proposed WBTC_PWT search method is significantly faster and it reduces the chances of getting the false matching result.</description><issn>1895-1767</issn><issn>1895-1767</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNpNkEFLwzAYhoMoOObuHvMHOr8kbZMcR5k62FCkIJ5Kln5xlW4tSVbmv3edHnwv73t4eA8PIfcM5oznOn0Itsf5wHmTzpmScEUmTOksYTKX1__2LZmF8AXnCJbmGZuQjwXdYNx1NY0dXe173w1IlydjI92YaHfN4ZO-YTi2MdDmQIvujGAIWNMST5Eewwi8Gm_aFlv6bgZsMdLSI96RG2fagLO_npLycVkWz8n65WlVLNaJ5amICQorBTrttlKC1nnOOTcOFGCWWaZqVgOrrVa2zhUYcJIbKVQGXDvYskxMCfzeWt-F4NFVvW_2xn9XDKqLnGqUU13kVKMc8QN7uFkE</recordid><startdate>20211201</startdate><enddate>20211201</enddate><creator>Srivastav, Shashank</creator><creator>Singh, Pradeep Kumar</creator><creator>Yadav, Divakar</creator><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>20211201</creationdate><title>A Method to Improve Exact Matching Results in Compressed Text using Parallel Wavelet Tree</title><author>Srivastav, Shashank ; Singh, Pradeep Kumar ; Yadav, 
Divakar</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c243t-e3c73ef9fb7709966222af080e55c18d1d01dc98cd680a0f72a7385029f0b153</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Srivastav, Shashank</creatorcontrib><creatorcontrib>Singh, Pradeep Kumar</creatorcontrib><creatorcontrib>Yadav, Divakar</creatorcontrib><collection>CrossRef</collection><jtitle>Scalable Computing. Practice and Experience</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Srivastav, Shashank</au><au>Singh, Pradeep Kumar</au><au>Yadav, Divakar</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Method to Improve Exact Matching Results in Compressed Text using Parallel Wavelet Tree</atitle><jtitle>Scalable Computing. Practice and Experience</jtitle><date>2021-12-01</date><risdate>2021</risdate><volume>22</volume><issue>4</issue><spage>387</spage><epage>400</epage><pages>387-400</pages><issn>1895-1767</issn><eissn>1895-1767</eissn><abstract>The process of searching on the World Wide Web (WWW) is increasing regularly, and users around the world also use it regularly. In WWW the size of the text corpus is constantly increasing at an exponential rate, so we need an efficient indexing algorithm that reduces both space and time during the search process. This paper proposes a new technique that utilizes Word-Based Tagging Coding compression which is implemented using Parallel Wavelet Tree, called WBTC_PWT. WBTC_PWT uses the word-based tagging coding encoding technique to reduce the space complexity of the index and uses a parallel wavelet tree which reduces the time it takes to construct indexes. This technique utilizes the features of compressed pattern matching to minimize search time complexity. 
In this technique, all the unique words present in the text corpus are divided into different levels according to the word frequency table and a different wavelet tree is made for each level in parallel. Compared to other existing search algorithms based on compressed text, the proposed WBTC_PWT search method is significantly faster and it reduces the chances of getting the false matching result.</abstract><doi>10.12694/scpe.v22i4.1870</doi><tpages>14</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1895-1767 |
ispartof | Scalable Computing. Practice and Experience, 2021-12, Vol.22 (4), p.387-400 |
issn | 1895-1767 1895-1767 |
language | eng |
recordid | cdi_crossref_primary_10_12694_scpe_v22i4_1870 |
source | EZB-FREE-00999 freely available EZB journals |
title | A Method to Improve Exact Matching Results in Compressed Text using Parallel Wavelet Tree |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T21%3A19%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Method%20to%20Improve%20Exact%20Matching%20Results%20in%20Compressed%20Text%20using%20Parallel%20Wavelet%20Tree&rft.jtitle=Scalable%20Computing.%20Practice%20and%20Experience&rft.au=Srivastav,%20Shashank&rft.date=2021-12-01&rft.volume=22&rft.issue=4&rft.spage=387&rft.epage=400&rft.pages=387-400&rft.issn=1895-1767&rft.eissn=1895-1767&rft_id=info:doi/10.12694/scpe.v22i4.1870&rft_dat=%3Ccrossref%3E10_12694_scpe_v22i4_1870%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
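
The description above says that unique words are split into levels by frequency and that one wavelet tree is built per level in parallel. The following is a minimal Python sketch of that general idea, not the authors' WBTC_PWT implementation. The names `WaveletTree`, `build_index`, `count`, and the parameter `level_size` are hypothetical, and the level rule (fixed-size buckets of words ranked by descending frequency) is an assumption, since the record does not state how WBTC assigns levels or codes. The thread pool only illustrates that the per-level trees are independent; in CPython it is unlikely to give a real speed-up for this pure-Python work.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


class WaveletTree:
    """Pointer-based wavelet tree over a sequence of integer symbols."""

    def __init__(self, seq, lo=None, hi=None):
        if lo is None:
            lo, hi = (min(seq), max(seq)) if seq else (0, 0)
        self.lo, self.hi, self.n = lo, hi, len(seq)
        self.left = self.right = None
        # prefix[i] = how many of the first i symbols go to the left child
        self.prefix = [0]
        if lo == hi or not seq:
            return  # leaf (single symbol) or empty node
        mid = (lo + hi) // 2
        for s in seq:
            self.prefix.append(self.prefix[-1] + (s <= mid))
        self.left = WaveletTree([s for s in seq if s <= mid], lo, mid)
        self.right = WaveletTree([s for s in seq if s > mid], mid + 1, hi)

    def rank(self, symbol, i):
        """Count occurrences of symbol among the first i positions."""
        if symbol < self.lo or symbol > self.hi:
            return 0
        if self.left is None:
            return i  # leaf: all i symbols match; empty node: i is 0
        mid = (self.lo + self.hi) // 2
        left_count = self.prefix[i]
        if symbol <= mid:
            return self.left.rank(symbol, left_count)
        return self.right.rank(symbol, i - left_count)


def build_index(words, level_size=2):
    """Assign each word a (level, id) code and build one tree per level.

    Assumption: levels are fixed-size buckets of the frequency ranking.
    """
    freq = Counter(words)
    # Most frequent first; ties broken alphabetically for determinism.
    ranked = sorted(freq, key=lambda w: (-freq[w], w))
    codes = {w: divmod(r, level_size) for r, w in enumerate(ranked)}
    n_levels = (len(ranked) + level_size - 1) // level_size
    level_seqs = [[] for _ in range(n_levels)]
    for w in words:
        level, word_id = codes[w]
        level_seqs[level].append(word_id)
    # The per-level trees do not depend on each other, so build them concurrently.
    with ThreadPoolExecutor() as pool:
        trees = list(pool.map(WaveletTree, level_seqs))
    return codes, trees


def count(word, codes, trees):
    """Exact-match count: words without a code cannot match at all."""
    if word not in codes:
        return 0
    level, word_id = codes[word]
    return trees[level].rank(word_id, trees[level].n)


words = "the cat and the dog and the bird".split()
codes, trees = build_index(words, level_size=2)
print(count("the", codes, trees), count("fish", codes, trees))
```

Because a query word is first mapped to a whole-word code, a lookup in this sketch either hits that exact code or returns zero, which is one way to read the record's claim about fewer false matches; how the paper itself achieves this is not stated in the record.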