Aggregate Profile Clustering for Streaming Analytics

Many analytic applications require analyzing user interaction data. In particular, such data can be aggregated over a window to build user activity profiles. Clustering such aggregate profiles is useful for grouping together users with similar behaviors, so that common models could be built for them...

Bibliographic details
Published in:	Computer journal 2015-09, Vol.58 (9), p.2092-2108
Main authors:	Abbasoglu, Mehmet Ali; Gedik, Bugra; Ferhatosmanoglu, Hakan
Format:	Article
Language:	eng
Subjects:	Aggregates; Algorithms; Clustering; Clusters; Computer memory; Computer simulation; Construction; Heuristics; Partitioning; Streams; Summaries; Telecommunications industry
Online access:	Full text

container_end_page	2108
container_issue	9
container_start_page	2092
container_title	Computer journal
container_volume	58
creator	Abbasoglu, Mehmet Ali; Gedik, Bugra; Ferhatosmanoglu, Hakan
description	Many analytic applications require analyzing user interaction data. In particular, such data can be aggregated over a window to build user activity profiles. Clustering such aggregate profiles is useful for grouping together users with similar behaviors, so that common models could be built for them. In this paper, we present an approach for clustering profiles that are incrementally maintained over a stream of updates. Owing to the potentially large number of users and high rate of interactions, maintaining profile clusters can have high processing and memory resource requirements. To tackle this problem, we apply distributed stream processing. However, in the presence of distributed state, it is a major challenge to partition the profiles over nodes such that memory and computation balance is maintained, while keeping the clustering accuracy high. Furthermore, in order to adapt to potentially changing user interaction patterns, the partitioning of profiles to nodes should be continuously revised, yet one should minimize the migration of profiles so as not to disturb the online processing of updates. We develop a re-partitioning technique that achieves all these goals. To achieve this, we keep micro-cluster summaries at each node and periodically collect these summaries at a central node to perform re-partitioning. We use a greedy algorithm with novel affinity heuristics to revise the partitioning and update the routing tables without introducing a lengthy pause. We showcase the effectiveness of our approach using an application that clusters customers of a telecommunications company based on their aggregate calling profiles.
doi_str_mv	10.1093/comjnl/bxv023
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1753489290</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1753489290</sourcerecordid><originalsourceid>FETCH-LOGICAL-c223t-456d1fe4d9d75cfc51dced5a27593ab359ec592d05dd550caa6b70d773b94a2c3</originalsourceid><addsrcrecordid>eNpd0E1LxDAQgOEgCq6rR-8FL17qTr7NcVn8ggUF9RzSJC1d0mZNWnH_vV3qydMw8DAML0LXGO4wKLqysdv1YVX9fAOhJ2iBmYCSgJCnaAGAoWSCwDm6yHkHAASUWCC2bprkGzP44i3Fug2-2IQxDz61fVPUMRXvQ_KmO27r3oTD0Np8ic5qE7K_-ptL9Pn48LF5LrevTy-b9ba0hNChZFw4XHvmlJPc1pZjZ73jhkiuqKkoV95yRRxw5zgHa4yoJDgpaaWYIZYu0e18d5_i1-jzoLs2Wx-C6X0cs8aSU3aviIKJ3vyjuzim6eGjAiWxoERNqpyVTTHn5Gu9T21n0kFj0MeGem6o54b0F1THZkM</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1709716329</pqid></control><display><type>article</type><title>Aggregate Profile Clustering for Streaming Analytics</title><source>Oxford University Press Journals All Titles (1996-Current)</source><creator>Abbasoglu, Mehmet Ali ; Gedik, Bugra ; Ferhatosmanoglu, Hakan</creator><creatorcontrib>Abbasoglu, Mehmet Ali ; Gedik, Bugra ; Ferhatosmanoglu, Hakan</creatorcontrib><description>Many analytic applications require analyzing user interaction data. In particular, such data can be aggregated over a window to build user activity profiles. Clustering such aggregate profiles is useful for grouping together users with similar behaviors, so that common models could be built for them. In this paper, we present an approach for clustering profiles that are incrementally maintained over a stream of updates. Owing to the potentially large number of users and high rate of interactions, maintaining profile clusters can have high processing and memory resource requirements. To tackle this problem, we apply distributed stream processing. 
However, in the presence of distributed state, it is a major challenge to partition the profiles over nodes such that memory and computation balance is maintained, while keeping the clustering accuracy high. Furthermore, in order to adapt to potentially changing user interaction patterns, the partitioning of profiles to nodes should be continuously revised, yet one should minimize the migration of profiles so as not to disturb the online processing of updates. We develop a re-partitioning technique that achieves all these goals. To achieve this, we keep micro-cluster summaries at each node and periodically collect these summaries at a central node to perform re-partitioning. We use a greedy algorithm with novel affinity heuristics to revise the partitioning and update the routing tables without introducing a lengthy pause. We showcase the effectiveness of our approach using an application that clusters customers of a telecommunications company based on their aggregate calling profiles.</description><identifier>ISSN: 0010-4620</identifier><identifier>EISSN: 1460-2067</identifier><identifier>DOI: 10.1093/comjnl/bxv023</identifier><identifier>CODEN: CMPJAG</identifier><language>eng</language><publisher>Oxford: Oxford Publishing Limited (England)</publisher><subject>Aggregates ; Algorithms ; Clustering ; Clusters ; Computer memory ; Computer simulation ; Construction ; Heuristics ; Partitioning ; Streams ; Summaries ; Telecommunications industry</subject><ispartof>Computer journal, 2015-09, Vol.58 (9), p.2092-2108</ispartof><rights>Copyright Oxford Publishing Limited(England) Sep 
2015</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c223t-456d1fe4d9d75cfc51dced5a27593ab359ec592d05dd550caa6b70d773b94a2c3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids></links><search><creatorcontrib>Abbasoglu, Mehmet Ali</creatorcontrib><creatorcontrib>Gedik, Bugra</creatorcontrib><creatorcontrib>Ferhatosmanoglu, Hakan</creatorcontrib><title>Aggregate Profile Clustering for Streaming Analytics</title><title>Computer journal</title><description>Many analytic applications require analyzing user interaction data. In particular, such data can be aggregated over a window to build user activity profiles. Clustering such aggregate profiles is useful for grouping together users with similar behaviors, so that common models could be built for them. In this paper, we present an approach for clustering profiles that are incrementally maintained over a stream of updates. Owing to the potentially large number of users and high rate of interactions, maintaining profile clusters can have high processing and memory resource requirements. To tackle this problem, we apply distributed stream processing. However, in the presence of distributed state, it is a major challenge to partition the profiles over nodes such that memory and computation balance is maintained, while keeping the clustering accuracy high. Furthermore, in order to adapt to potentially changing user interaction patterns, the partitioning of profiles to nodes should be continuously revised, yet one should minimize the migration of profiles so as not to disturb the online processing of updates. We develop a re-partitioning technique that achieves all these goals. 
To achieve this, we keep micro-cluster summaries at each node and periodically collect these summaries at a central node to perform re-partitioning. We use a greedy algorithm with novel affinity heuristics to revise the partitioning and update the routing tables without introducing a lengthy pause. We showcase the effectiveness of our approach using an application that clusters customers of a telecommunications company based on their aggregate calling profiles.</description><subject>Aggregates</subject><subject>Algorithms</subject><subject>Clustering</subject><subject>Clusters</subject><subject>Computer memory</subject><subject>Computer simulation</subject><subject>Construction</subject><subject>Heuristics</subject><subject>Partitioning</subject><subject>Streams</subject><subject>Summaries</subject><subject>Telecommunications industry</subject><issn>0010-4620</issn><issn>1460-2067</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><recordid>eNpd0E1LxDAQgOEgCq6rR-8FL17qTr7NcVn8ggUF9RzSJC1d0mZNWnH_vV3qydMw8DAML0LXGO4wKLqysdv1YVX9fAOhJ2iBmYCSgJCnaAGAoWSCwDm6yHkHAASUWCC2bprkGzP44i3Fug2-2IQxDz61fVPUMRXvQ_KmO27r3oTD0Np8ic5qE7K_-ptL9Pn48LF5LrevTy-b9ba0hNChZFw4XHvmlJPc1pZjZ73jhkiuqKkoV95yRRxw5zgHa4yoJDgpaaWYIZYu0e18d5_i1-jzoLs2Wx-C6X0cs8aSU3aviIKJ3vyjuzim6eGjAiWxoERNqpyVTTHn5Gu9T21n0kFj0MeGem6o54b0F1THZkM</recordid><startdate>20150901</startdate><enddate>20150901</enddate><creator>Abbasoglu, Mehmet Ali</creator><creator>Gedik, Bugra</creator><creator>Ferhatosmanoglu, Hakan</creator><general>Oxford Publishing Limited (England)</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20150901</creationdate><title>Aggregate Profile Clustering for Streaming Analytics</title><author>Abbasoglu, Mehmet Ali ; Gedik, Bugra ; Ferhatosmanoglu, 
Hakan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c223t-456d1fe4d9d75cfc51dced5a27593ab359ec592d05dd550caa6b70d773b94a2c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Aggregates</topic><topic>Algorithms</topic><topic>Clustering</topic><topic>Clusters</topic><topic>Computer memory</topic><topic>Computer simulation</topic><topic>Construction</topic><topic>Heuristics</topic><topic>Partitioning</topic><topic>Streams</topic><topic>Summaries</topic><topic>Telecommunications industry</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Abbasoglu, Mehmet Ali</creatorcontrib><creatorcontrib>Gedik, Bugra</creatorcontrib><creatorcontrib>Ferhatosmanoglu, Hakan</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Computer journal</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Abbasoglu, Mehmet Ali</au><au>Gedik, Bugra</au><au>Ferhatosmanoglu, Hakan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Aggregate Profile Clustering for Streaming Analytics</atitle><jtitle>Computer 
journal</jtitle><date>2015-09-01</date><risdate>2015</risdate><volume>58</volume><issue>9</issue><spage>2092</spage><epage>2108</epage><pages>2092-2108</pages><issn>0010-4620</issn><eissn>1460-2067</eissn><coden>CMPJAG</coden><abstract>Many analytic applications require analyzing user interaction data. In particular, such data can be aggregated over a window to build user activity profiles. Clustering such aggregate profiles is useful for grouping together users with similar behaviors, so that common models could be built for them. In this paper, we present an approach for clustering profiles that are incrementally maintained over a stream of updates. Owing to the potentially large number of users and high rate of interactions, maintaining profile clusters can have high processing and memory resource requirements. To tackle this problem, we apply distributed stream processing. However, in the presence of distributed state, it is a major challenge to partition the profiles over nodes such that memory and computation balance is maintained, while keeping the clustering accuracy high. Furthermore, in order to adapt to potentially changing user interaction patterns, the partitioning of profiles to nodes should be continuously revised, yet one should minimize the migration of profiles so as not to disturb the online processing of updates. We develop a re-partitioning technique that achieves all these goals. To achieve this, we keep micro-cluster summaries at each node and periodically collect these summaries at a central node to perform re-partitioning. We use a greedy algorithm with novel affinity heuristics to revise the partitioning and update the routing tables without introducing a lengthy pause. 
We showcase the effectiveness of our approach using an application that clusters customers of a telecommunications company based on their aggregate calling profiles.</abstract><cop>Oxford</cop><pub>Oxford Publishing Limited (England)</pub><doi>10.1093/comjnl/bxv023</doi><tpages>17</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 0010-4620
ispartof	Computer journal, 2015-09, Vol.58 (9), p.2092-2108
issn	0010-4620 1460-2067
language	eng
recordid	cdi_proquest_miscellaneous_1753489290
source	Oxford University Press Journals All Titles (1996-Current)
subjects	Aggregates; Algorithms; Clustering; Clusters; Computer memory; Computer simulation; Construction; Heuristics; Partitioning; Streams; Summaries; Telecommunications industry
title	Aggregate Profile Clustering for Streaming Analytics
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T14%3A31%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Aggregate%20Profile%20Clustering%20for%20Streaming%20Analytics&rft.jtitle=Computer%20journal&rft.au=Abbasoglu,%20Mehmet%20Ali&rft.date=2015-09-01&rft.volume=58&rft.issue=9&rft.spage=2092&rft.epage=2108&rft.pages=2092-2108&rft.issn=0010-4620&rft.eissn=1460-2067&rft.coden=CMPJAG&rft_id=info:doi/10.1093/comjnl/bxv023&rft_dat=%3Cproquest_cross%3E1753489290%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1709716329&rft_id=info:pmid/&rfr_iscdi=true