Finding persistent items in data streams

Bibliographic details
Published in:	Proceedings of the VLDB Endowment, 2016-12, Vol. 10 (4), pp. 289-300
Authors:	Dai, Haipeng; Shahzad, Muhammad; Liu, Alex X.; Zhong, Yuankun
Format:	Article
Language:	English
Online access:	Full text

container_end_page	300
container_issue	4
container_start_page	289
container_title	Proceedings of the VLDB Endowment
container_volume	10
creator	Dai, Haipeng; Shahzad, Muhammad; Liu, Alex X.; Zhong, Yuankun
description	Frequent item mining, which finds items that occur frequently in a data stream over a period of time, is one of the most heavily studied problems in data stream mining. A generalization of frequent item mining is persistent item mining: a persistent item, unlike a frequent item, does not necessarily occur more often than other items over a short period of time, but instead persists and occurs more frequently over a long period of time. To the best of our knowledge, there is no prior work on mining persistent items in a data stream. In this paper, we address the fundamental problem of finding persistent items in a given data stream during a given period of time at any given observation point. We propose a novel scheme, PIE, that identifies persistent items while keeping the false negative rate (FNR) within any desired bound and using a very small amount of memory. The key idea of PIE is to use Raptor codes to encode the ID of each item that appears at the observation point during a measurement period, and to store only a few bits of the encoded ID in the memory of that observation point for that period. Because a persistent item occurs in many measurement periods, enough encoded bits of its ID can be retrieved from the observation point to decode them correctly and recover the item's ID. We implemented PIE, evaluated it extensively on three real network traffic traces, and compared its performance with two adapted prior schemes. Our results show that PIE not only achieves the desired FNR in every scenario, but its FNR is, on average, 19.5 times smaller than that of the best adapted prior scheme.
doi_str_mv	10.14778/3025111.3025112
format	Article
fulltext	fulltext
identifier	ISSN: 2150-8097
ispartof	Proceedings of the VLDB Endowment, 2016-12, Vol.10 (4), p.289-300
issn	2150-8097 (ISSN and EISSN)
language	eng
recordid	cdi_crossref_primary_10_14778_3025111_3025112
source	ACM Digital Library Complete
title	Finding persistent items in data streams
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-19T09%3A06%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Finding%20persistent%20items%20in%20data%20streams&rft.jtitle=Proceedings%20of%20the%20VLDB%20Endowment&rft.au=Dai,%20Haipeng&rft.date=2016-12-01&rft.volume=10&rft.issue=4&rft.spage=289&rft.epage=300&rft.pages=289-300&rft.issn=2150-8097&rft.eissn=2150-8097&rft_id=info:doi/10.14778/3025111.3025112&rft_dat=%3Ccrossref%3E10_14778_3025111_3025112%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
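The description above states PIE's key idea. Each measurement period keeps only a few encoded bits per item, so only an item seen in many periods accumulates enough bits to be decoded. The Python sketch below illustrates that idea under simplifying assumptions and is not the authors' implementation. It replaces Raptor codes with the simplest possible "code", raw ID bits at hashed positions. The cell table, the 32-bit ID and fingerprint widths, and the helper names `h`, `record_period`, and `decode` are all hypothetical.

```python
import hashlib
from collections import defaultdict

ID_BITS = 32       # hypothetical item-ID length in bits (e.g., an IPv4 address)
FP_MOD = 2 ** 32   # hypothetical fingerprint space; large so fingerprint clashes are rare


def h(*parts: object) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() of str is salted per process)."""
    data = "|".join(map(str, parts)).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def record_period(items, period, num_cells):
    """Summarize one measurement period in a small table of cells.

    Each distinct item picks one cell and stores its fingerprint plus ONE bit
    of its ID, at a position chosen by hashing (item, period). Cells hit by two
    or more distinct items are marked None and ignored when decoding.
    This stands in for PIE's Raptor-encoded bits; it is NOT Raptor coding.
    """
    cells = {}
    for item in set(items):  # persistence counts periods, so repeats are dropped
        idx = h("cell", item, period) % num_cells
        pos = h("pos", item, period) % ID_BITS
        entry = (h("fp", item) % FP_MOD, pos, (item >> pos) & 1)
        cells[idx] = None if idx in cells else entry
    return cells


def decode(periods):
    """Pool surviving cells by fingerprint; return IDs whose bits are all known."""
    known = defaultdict(dict)  # fingerprint -> {bit position: bit value}
    for cells in periods:
        for entry in cells.values():
            if entry is not None:
                fp, pos, bit = entry
                known[fp][pos] = bit
    return sorted(
        sum(bit << pos for pos, bit in bits.items())
        for bits in known.values()
        if len(bits) == ID_BITS
    )


# Usage: one item appears in all 400 periods; 5 transient items appear in one period each.
persistent = 0xC0A80001
periods = [
    record_period([persistent] + [10_000 + 10 * t + k for k in range(5)], t, num_cells=64)
    for t in range(400)
]
print([hex(i) for i in decode(periods)])
```

With high probability, only the persistent ID is recovered, because each transient item contributes a single bit. Because the toy stores uncoded bits, recovering an L-bit ID requires every bit position to be drawn at least once. By the coupon-collector argument, that takes on the order of L·ln L periods. A fountain code such as Raptor instead lets a decoder recover L source bits from slightly more than L encoded bits with high probability. This illustrates why encoding the ID before storing it reduces the number of periods an item must appear in before it can be identified.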