MOVE: A Large Scale Keyword-Based Content Filtering and Dissemination System

The Web 2.0 era is characterized by the emergence of a very large amount of live content. A real-time, fine-grained content filtering approach can keep users precisely up to date with the information they are interested in. The key to such an approach is a scalable matching algorithm. One might tr...

Bibliographic details
Main authors:	Weixiong Rao; Lei Chen; Pan Hui; Tarkoma, S.
Format:	Conference proceeding
Language:	eng
Subjects:	Clustering algorithms; Equations; Indexes; Optimization; Registers; Resource management; Throughput

container_end_page	454
container_issue
container_start_page	445
container_title
container_volume
creator	Weixiong Rao; Lei Chen; Pan Hui; Tarkoma, S.
description	The Web 2.0 era is characterized by the emergence of a very large amount of live content. A real-time, fine-grained content filtering approach can keep users precisely up to date with the information they are interested in. The key to such an approach is a scalable matching algorithm. One might treat content matching as a special kind of content search and resort to the classic algorithm [5]. However, due to blind flooding, [5] cannot simply be adapted for scalable content matching. To increase matching throughput, we propose an adaptive approach to allocating (i.e., replicating and partitioning) filters. The allocation is based on our observations of real datasets: most users prefer short queries of around 2-3 terms each, whereas web content typically contains tens or even thousands of terms per article. Thus, by reducing the number of processed documents, we can reduce the latency of matching large articles against filters and potentially achieve higher throughput. We implement our approach on top of the open-source project Apache Cassandra. Experiments with real datasets show that our approach can achieve around folds of better throughput than two state-of-the-art counterpart solutions.
doi_str_mv	10.1109/ICDCS.2012.32
format	Conference Proceeding
fullrecord	<record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_6258017</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6258017</ieee_id><sourcerecordid>6258017</sourcerecordid><originalsourceid>FETCH-LOGICAL-i175t-da2f0d5c02b7dc4a491189938a6762d9800ea6689d947b24b5037ccdc1a7cb5c3</originalsourceid><addsrcrecordid>eNotzM1OAjEUQOH6l4jI0pWbvsBgb6ftbd3hAEocwwJ1SzrthdTAYKaTEN5eEz2bb3cYuwMxBhDuYVFNq9VYCpDjUp6xkUMr0DitjNX2nA2kRl1YBXDBbkBpRCGddpdsAMKUhXESr9ko5y_xG1oAaQesflt-zh75hNe-2xJfBb8j_kqn46GLxZPPFHl1aHtqez5Pu5661G65byOfppxpn1rfp0PLV6fc0_6WXW38LtPo3yH7mM_eq5eiXj4vqkldJEDdF9HLjYg6CNlgDMorB2CdK603aGR0VgjyxlgXncJGqkaLEkOIATyGRodyyO7_vomI1t9d2vvutDZSWwFY_gBpNFB2</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>MOVE: A Large Scale Keyword-Based Content Filtering and Dissemination System</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Weixiong Rao ; Lei Chen ; Pan Hui ; Tarkoma, S.</creator><creatorcontrib>Weixiong Rao ; Lei Chen ; Pan Hui ; Tarkoma, S.</creatorcontrib><description>The Web 2.0 era is characterized by the emergence of a very large amount of live content. A real-time, fine-grained content filtering approach can keep users precisely up to date with the information they are interested in. The key to such an approach is a scalable matching algorithm. One might treat content matching as a special kind of content search and resort to the classic algorithm [5]. However, due to blind flooding, [5] cannot simply be adapted for scalable content matching. To increase matching throughput, we propose an adaptive approach to allocating (i.e., replicating and partitioning) filters. The allocation is based on our observations of real datasets: most users prefer short queries of around 2-3 terms each, whereas web content typically contains tens or even thousands of terms per article. Thus, by reducing the number of processed documents, we can reduce the latency of matching large articles against filters and potentially achieve higher throughput. We implement our approach on top of the open-source project Apache Cassandra. Experiments with real datasets show that our approach can achieve around folds of better throughput than two state-of-the-art counterpart solutions.</description><identifier>ISSN: 1063-6927</identifier><identifier>ISBN: 1457702959</identifier><identifier>ISBN: 9781457702952</identifier><identifier>EISSN: 2575-8411</identifier><identifier>EISBN: 9780769546858</identifier><identifier>EISBN: 0769546854</identifier><identifier>DOI: 10.1109/ICDCS.2012.32</identifier><identifier>CODEN: IEEPAD</identifier><language>eng</language><publisher>IEEE</publisher><subject>Clustering algorithms ; Equations ; Indexes ; Optimization ; Registers ; Resource management ; Throughput</subject><ispartof>2012 IEEE 32nd International Conference on Distributed Computing Systems, 2012, p.445-454</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/6258017$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,2052,27902,54895</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/6258017$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Weixiong Rao</creatorcontrib><creatorcontrib>Lei Chen</creatorcontrib><creatorcontrib>Pan Hui</creatorcontrib><creatorcontrib>Tarkoma, S.</creatorcontrib><title>MOVE: A Large Scale Keyword-Based Content Filtering and Dissemination System</title><title>2012 IEEE 32nd International Conference on Distributed Computing Systems</title><addtitle>ICDSC</addtitle><description>The Web 2.0 era is characterized by the emergence of a very large amount of live content. A real-time, fine-grained content filtering approach can keep users precisely up to date with the information they are interested in. The key to such an approach is a scalable matching algorithm. One might treat content matching as a special kind of content search and resort to the classic algorithm [5]. However, due to blind flooding, [5] cannot simply be adapted for scalable content matching. To increase matching throughput, we propose an adaptive approach to allocating (i.e., replicating and partitioning) filters. The allocation is based on our observations of real datasets: most users prefer short queries of around 2-3 terms each, whereas web content typically contains tens or even thousands of terms per article. Thus, by reducing the number of processed documents, we can reduce the latency of matching large articles against filters and potentially achieve higher throughput. We implement our approach on top of the open-source project Apache Cassandra. Experiments with real datasets show that our approach can achieve around folds of better throughput than two state-of-the-art counterpart solutions.</description><subject>Clustering algorithms</subject><subject>Equations</subject><subject>Indexes</subject><subject>Optimization</subject><subject>Registers</subject><subject>Resource management</subject><subject>Throughput</subject><issn>1063-6927</issn><issn>2575-8411</issn><isbn>1457702959</isbn><isbn>9781457702952</isbn><isbn>9780769546858</isbn><isbn>0769546854</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2012</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNotzM1OAjEUQOH6l4jI0pWbvsBgb6ftbd3hAEocwwJ1SzrthdTAYKaTEN5eEz2bb3cYuwMxBhDuYVFNq9VYCpDjUp6xkUMr0DitjNX2nA2kRl1YBXDBbkBpRCGddpdsAMKUhXESr9ko5y_xG1oAaQesflt-zh75hNe-2xJfBb8j_kqn46GLxZPPFHl1aHtqez5Pu5661G65byOfppxpn1rfp0PLV6fc0_6WXW38LtPo3yH7mM_eq5eiXj4vqkldJEDdF9HLjYg6CNlgDMorB2CdK603aGR0VgjyxlgXncJGqkaLEkOIATyGRodyyO7_vomI1t9d2vvutDZSWwFY_gBpNFB2</recordid><startdate>201206</startdate><enddate>201206</enddate><creator>Weixiong Rao</creator><creator>Lei Chen</creator><creator>Pan Hui</creator><creator>Tarkoma, S.</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>201206</creationdate><title>MOVE: A Large Scale Keyword-Based Content Filtering and Dissemination System</title><author>Weixiong Rao ; Lei Chen ; Pan Hui ; Tarkoma, S.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i175t-da2f0d5c02b7dc4a491189938a6762d9800ea6689d947b24b5037ccdc1a7cb5c3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2012</creationdate><topic>Clustering algorithms</topic><topic>Equations</topic><topic>Indexes</topic><topic>Optimization</topic><topic>Registers</topic><topic>Resource management</topic><topic>Throughput</topic><toplevel>online_resources</toplevel><creatorcontrib>Weixiong Rao</creatorcontrib><creatorcontrib>Lei Chen</creatorcontrib><creatorcontrib>Pan Hui</creatorcontrib><creatorcontrib>Tarkoma, S.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Weixiong Rao</au><au>Lei Chen</au><au>Pan Hui</au><au>Tarkoma, S.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>MOVE: A Large Scale Keyword-Based Content Filtering and Dissemination System</atitle><btitle>2012 IEEE 32nd International Conference on Distributed Computing Systems</btitle><stitle>ICDSC</stitle><date>2012-06</date><risdate>2012</risdate><spage>445</spage><epage>454</epage><pages>445-454</pages><issn>1063-6927</issn><eissn>2575-8411</eissn><isbn>1457702959</isbn><isbn>9781457702952</isbn><eisbn>9780769546858</eisbn><eisbn>0769546854</eisbn><coden>IEEPAD</coden><abstract>The Web 2.0 era is characterized by the emergence of a very large amount of live content. A real-time, fine-grained content filtering approach can keep users precisely up to date with the information they are interested in. The key to such an approach is a scalable matching algorithm. One might treat content matching as a special kind of content search and resort to the classic algorithm [5]. However, due to blind flooding, [5] cannot simply be adapted for scalable content matching. To increase matching throughput, we propose an adaptive approach to allocating (i.e., replicating and partitioning) filters. The allocation is based on our observations of real datasets: most users prefer short queries of around 2-3 terms each, whereas web content typically contains tens or even thousands of terms per article. Thus, by reducing the number of processed documents, we can reduce the latency of matching large articles against filters and potentially achieve higher throughput. We implement our approach on top of the open-source project Apache Cassandra. Experiments with real datasets show that our approach can achieve around folds of better throughput than two state-of-the-art counterpart solutions.</abstract><pub>IEEE</pub><doi>10.1109/ICDCS.2012.32</doi><tpages>10</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1063-6927
ispartof	2012 IEEE 32nd International Conference on Distributed Computing Systems, 2012, p.445-454
issn	1063-6927 2575-8411
language	eng
recordid	cdi_ieee_primary_6258017
source	IEEE Electronic Library (IEL) Conference Proceedings
subjects	Clustering algorithms; Equations; Indexes; Optimization; Registers; Resource management; Throughput
title	MOVE: A Large Scale Keyword-Based Content Filtering and Dissemination System
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T03%3A08%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=MOVE:%20A%20Large%20Scale%20Keyword-Based%20Content%20Filtering%20and%20Dissemination%20System&rft.btitle=2012%20IEEE%2032nd%20International%20Conference%20on%20Distributed%20Computing%20Systems&rft.au=Weixiong%20Rao&rft.date=2012-06&rft.spage=445&rft.epage=454&rft.pages=445-454&rft.issn=1063-6927&rft.eissn=2575-8411&rft.isbn=1457702959&rft.isbn_list=9781457702952&rft.coden=IEEPAD&rft_id=info:doi/10.1109/ICDCS.2012.32&rft_dat=%3Cieee_6IE%3E6258017%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9780769546858&rft.eisbn_list=0769546854&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6258017&rfr_iscdi=true