MOVE: A Large Scale Keyword-Based Content Filtering and Dissemination System

The Web 2.0 era is characterized by the emergence of a very large amount of live content. A real-time, fine-grained content filtering approach can keep users precisely up to date with the information they are interested in. The key to the approach is a scalable matching algorithm. One might tr...

Bibliographic details
Main authors: Weixiong Rao, Lei Chen, Pan Hui, Tarkoma, S.
Format: Conference proceeding
Language: eng
container_end_page 454
container_issue
container_start_page 445
container_title
container_volume
creator Weixiong Rao
Lei Chen
Pan Hui
Tarkoma, S.
description The Web 2.0 era is characterized by the emergence of a very large amount of live content. A real-time, fine-grained content filtering approach can keep users precisely up to date with the information they are interested in. The key to the approach is a scalable matching algorithm. One might treat content matching as a special kind of content search and resort to the classic algorithm [5]. However, due to blind flooding, [5] cannot simply be adapted for scalable content matching. To increase matching throughput, we propose an adaptive approach to allocate (i.e., replicate and partition) filters. The allocation is based on our observation of real datasets: most users prefer short queries of around 2-3 terms each, whereas web content typically contains tens or even thousands of terms per article. Thus, by reducing the number of processed documents, we can reduce the latency of matching large articles against filters and have a chance to achieve higher throughput. We implement our approach on an open source project, Apache Cassandra. Experiments with real datasets show that our approach can achieve around folds of better throughput than two state-of-the-art counterpart solutions.
doi_str_mv 10.1109/ICDCS.2012.32
format Conference Proceeding
fulltext fulltext_linktorsrc
identifier ISSN: 1063-6927
ispartof 2012 IEEE 32nd International Conference on Distributed Computing Systems, 2012, p.445-454
issn 1063-6927
2575-8411
language eng
recordid cdi_ieee_primary_6258017
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Clustering algorithms
Equations
Indexes
Optimization
Registers
Resource management
Throughput
title MOVE: A Large Scale Keyword-Based Content Filtering and Dissemination System
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T03%3A08%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=MOVE:%20A%20Large%20Scale%20Keyword-Based%20Content%20Filtering%20and%20Dissemination%20System&rft.btitle=2012%20IEEE%2032nd%20International%20Conference%20on%20Distributed%20Computing%20Systems&rft.au=Weixiong%20Rao&rft.date=2012-06&rft.spage=445&rft.epage=454&rft.pages=445-454&rft.issn=1063-6927&rft.eissn=2575-8411&rft.isbn=1457702959&rft.isbn_list=9781457702952&rft.coden=IEEPAD&rft_id=info:doi/10.1109/ICDCS.2012.32&rft_dat=%3Cieee_6IE%3E6258017%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9780769546858&rft.eisbn_list=0769546854&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6258017&rfr_iscdi=true