Gecko: Efficient Sliding Window Aggregation With Granular-Based Bulk Eviction Over Big Data Streams

Sliding window aggregation, which extracts summaries from data streams, is a core operation in streaming analysis. Though existing sliding window algorithms that perform single eviction and insertion operations can achieve a worst-case time complexity of O(1) for in-order streams, real-world d...

Bibliographic details
Published in: IEEE transactions on knowledge and data engineering, 2025-02, Vol. 37 (2), p. 698-709
Main authors: Li, Jianjun; Deng, Yuhui; Huang, Jiande; Yi, Zhou; Yang, Qifen; Min, Geyong
Format: Article
Language: English
container_end_page 709
container_issue 2
container_start_page 698
container_title IEEE transactions on knowledge and data engineering
container_volume 37
creator Li, Jianjun
Deng, Yuhui
Huang, Jiande
Yi, Zhou
Yang, Qifen
Min, Geyong
description Sliding window aggregation, which extracts summaries from data streams, is a core operation in streaming analysis. Though existing sliding window algorithms that perform single eviction and insertion operations can achieve a worst-case time complexity of O(1) for in-order streams, real-world data streams often contain out-of-order data and arrive in bursts, which poses performance challenges for these algorithms. To address this issue, we propose Gecko, a novel sliding window aggregation algorithm that supports bulk eviction. Gecko uses a granular-based eviction strategy for various bulk sizes, enabling efficient bulk eviction while keeping performance close to that of in-order stream algorithms for single evictions. For large data bulks, Gecko performs coarse-grained eviction at the chunk level, followed by fine-grained eviction using leftward binary tree aggregation (LTA) as a complementary method. Moreover, Gecko partitions data into chunks so that out-of-order data in one chunk does not affect other chunks, thereby enabling efficient handling of out-of-order data streams. We conduct extensive experiments to evaluate the performance of Gecko. Experimental results demonstrate that Gecko outperforms other solutions, which is consistent with theoretical expectations. In real-world data scenarios, Gecko improves the average throughput of the state-of-the-art algorithm b_FiBA by 1.7 times, with a maximum improvement of up to 3.5 times. Gecko also demonstrates the best latency performance among all compared schemes.
doi_str_mv 10.1109/TKDE.2024.3511334
format Article
fullrecord <record><control><sourceid>crossref_RIE</sourceid><recordid>TN_cdi_crossref_primary_10_1109_TKDE_2024_3511334</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10777062</ieee_id><sourcerecordid>10_1109_TKDE_2024_3511334</sourcerecordid><originalsourceid>FETCH-LOGICAL-c632-993237caca321e45e11135c7e43724c580143ee8ca0957071fbb8748b5a321493</originalsourceid><addsrcrecordid>eNpNkMtOwzAQRS0EEqXwAUgs_AMpHj9qh10fISAqddFKLCPXnQTTNEFOWsTfk9AuWM3V6J6R5hByD2wEwOLH9ds8GXHG5UgoACHkBRmAUibiEMNll5mESAqpr8lN03wyxow2MCAuRbern2iS5955rFq6Kv3WVwV999W2_qaToghY2NbXVbdqP2gabHUobYimtsEtnR7KHU2O3v01lkcMdOoLOretpas2oN03t-Qqt2WDd-c5JOvnZD17iRbL9HU2WURuLHgUx4IL7ayzggNKhdD9oZxGKTSXThkGUiAaZ1msNNOQbzZGS7NRPSBjMSRwOutC3TQB8-wr-L0NPxmwrJeU9ZKyXlJ2ltQxDyfGI-K_vtaajbn4BcU_Yes</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Gecko: Efficient Sliding Window Aggregation With Granular-Based Bulk Eviction Over Big Data Streams</title><source>IEEE Electronic Library (IEL)</source><creator>Li, Jianjun ; Deng, Yuhui ; Huang, Jiande ; Yi, Zhou ; Yang, Qifen ; Min, Geyong</creator><creatorcontrib>Li, Jianjun ; Deng, Yuhui ; Huang, Jiande ; Yi, Zhou ; Yang, Qifen ; Min, Geyong</creatorcontrib><description><![CDATA[Sliding window aggregation, which extracts summaries from data streams, is a core operation in streaming analysis. 
Though existing sliding window algorithms that perform single eviction and insertion operations can achieve a worst-case time complexity of <inline-formula><tex-math notation="LaTeX">O(1)</tex-math> <mml:math><mml:mrow><mml:mi>O</mml:mi><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math><inline-graphic xlink:href="deng-ieq1-3511334.gif"/> </inline-formula> for in-order streams, real-world data streams often involve out-of-order data and exhibit burst data characteristics, which pose performance challenges to these sliding window algorithms. To address this challenging issue, we propose Gecko - a novel sliding window aggregation algorithm that supports bulk eviction. Gecko leverages a granular-based eviction strategy for various bulk sizes, enabling efficient bulk eviction while maintaining the performance close to that of in-order stream algorithms for single evictions. For large data bulks, Gecko performs coarse-grained eviction at the chunk level, followed by fine-grained eviction using leftward binary tree aggregation (LTA) as a complementary method. Moreover, Gecko partitions data based on chunks to prevent the impacts of out-of-order data on other chunks, thereby enabling efficient handling of out-of-order data streams. We conduct extensive experiments to evaluate the performance of Gecko. Experimental results demonstrate that Gecko exhibits superior performance over other solutions, which is consistent with theoretical expectations. In real-world data scenarios, Gecko improves the average throughput of the state-of-the-art algorithm b_FiBA by 1.7 times, with a maximum improvement of up to 3.5 times. 
Gecko also demonstrates the best latency performance among all compared schemes.]]></description><identifier>ISSN: 1041-4347</identifier><identifier>EISSN: 1558-2191</identifier><identifier>DOI: 10.1109/TKDE.2024.3511334</identifier><identifier>CODEN: ITKEEH</identifier><language>eng</language><publisher>IEEE</publisher><subject>Aggregates ; Binary trees ; Bulk eviction ; Computer science ; incremental computation ; Indexes ; Out of order ; out-of-order data streams ; Partitioning algorithms ; sliding window aggregation ; stream algorithm ; Streams ; Throughput ; Time complexity ; Windows</subject><ispartof>IEEE transactions on knowledge and data engineering, 2025-02, Vol.37 (2), p.698-709</ispartof><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c632-993237caca321e45e11135c7e43724c580143ee8ca0957071fbb8748b5a321493</cites><orcidid>0000-0002-1639-8020 ; 0009-0008-1203-9558 ; 0000-0003-3266-1885 ; 0000-0002-1522-8943 ; 0000-0003-1395-7314</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10777062$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27903,27904,54736</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10777062$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Li, Jianjun</creatorcontrib><creatorcontrib>Deng, Yuhui</creatorcontrib><creatorcontrib>Huang, Jiande</creatorcontrib><creatorcontrib>Yi, Zhou</creatorcontrib><creatorcontrib>Yang, Qifen</creatorcontrib><creatorcontrib>Min, Geyong</creatorcontrib><title>Gecko: Efficient Sliding Window Aggregation With Granular-Based Bulk Eviction Over Big Data Streams</title><title>IEEE transactions on knowledge and data engineering</title><addtitle>TKDE</addtitle><description><![CDATA[Sliding window aggregation, 
which extracts summaries from data streams, is a core operation in streaming analysis. Though existing sliding window algorithms that perform single eviction and insertion operations can achieve a worst-case time complexity of <inline-formula><tex-math notation="LaTeX">O(1)</tex-math> <mml:math><mml:mrow><mml:mi>O</mml:mi><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math><inline-graphic xlink:href="deng-ieq1-3511334.gif"/> </inline-formula> for in-order streams, real-world data streams often involve out-of-order data and exhibit burst data characteristics, which pose performance challenges to these sliding window algorithms. To address this challenging issue, we propose Gecko - a novel sliding window aggregation algorithm that supports bulk eviction. Gecko leverages a granular-based eviction strategy for various bulk sizes, enabling efficient bulk eviction while maintaining the performance close to that of in-order stream algorithms for single evictions. For large data bulks, Gecko performs coarse-grained eviction at the chunk level, followed by fine-grained eviction using leftward binary tree aggregation (LTA) as a complementary method. Moreover, Gecko partitions data based on chunks to prevent the impacts of out-of-order data on other chunks, thereby enabling efficient handling of out-of-order data streams. We conduct extensive experiments to evaluate the performance of Gecko. Experimental results demonstrate that Gecko exhibits superior performance over other solutions, which is consistent with theoretical expectations. In real-world data scenarios, Gecko improves the average throughput of the state-of-the-art algorithm b_FiBA by 1.7 times, with a maximum improvement of up to 3.5 times. 
Gecko also demonstrates the best latency performance among all compared schemes.]]></description><subject>Aggregates</subject><subject>Binary trees</subject><subject>Bulk eviction</subject><subject>Computer science</subject><subject>incremental computation</subject><subject>Indexes</subject><subject>Out of order</subject><subject>out-of-order data streams</subject><subject>Partitioning algorithms</subject><subject>sliding window aggregation</subject><subject>stream algorithm</subject><subject>Streams</subject><subject>Throughput</subject><subject>Time complexity</subject><subject>Windows</subject><issn>1041-4347</issn><issn>1558-2191</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2025</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpNkMtOwzAQRS0EEqXwAUgs_AMpHj9qh10fISAqddFKLCPXnQTTNEFOWsTfk9AuWM3V6J6R5hByD2wEwOLH9ds8GXHG5UgoACHkBRmAUibiEMNll5mESAqpr8lN03wyxow2MCAuRbern2iS5955rFq6Kv3WVwV999W2_qaToghY2NbXVbdqP2gabHUobYimtsEtnR7KHU2O3v01lkcMdOoLOretpas2oN03t-Qqt2WDd-c5JOvnZD17iRbL9HU2WURuLHgUx4IL7ayzggNKhdD9oZxGKTSXThkGUiAaZ1msNNOQbzZGS7NRPSBjMSRwOutC3TQB8-wr-L0NPxmwrJeU9ZKyXlJ2ltQxDyfGI-K_vtaajbn4BcU_Yes</recordid><startdate>202502</startdate><enddate>202502</enddate><creator>Li, Jianjun</creator><creator>Deng, Yuhui</creator><creator>Huang, Jiande</creator><creator>Yi, Zhou</creator><creator>Yang, Qifen</creator><creator>Min, Geyong</creator><general>IEEE</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0002-1639-8020</orcidid><orcidid>https://orcid.org/0009-0008-1203-9558</orcidid><orcidid>https://orcid.org/0000-0003-3266-1885</orcidid><orcidid>https://orcid.org/0000-0002-1522-8943</orcidid><orcidid>https://orcid.org/0000-0003-1395-7314</orcidid></search><sort><creationdate>202502</creationdate><title>Gecko: Efficient Sliding Window Aggregation With Granular-Based Bulk Eviction Over Big Data Streams</title><author>Li, 
Jianjun ; Deng, Yuhui ; Huang, Jiande ; Yi, Zhou ; Yang, Qifen ; Min, Geyong</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c632-993237caca321e45e11135c7e43724c580143ee8ca0957071fbb8748b5a321493</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2025</creationdate><topic>Aggregates</topic><topic>Binary trees</topic><topic>Bulk eviction</topic><topic>Computer science</topic><topic>incremental computation</topic><topic>Indexes</topic><topic>Out of order</topic><topic>out-of-order data streams</topic><topic>Partitioning algorithms</topic><topic>sliding window aggregation</topic><topic>stream algorithm</topic><topic>Streams</topic><topic>Throughput</topic><topic>Time complexity</topic><topic>Windows</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Li, Jianjun</creatorcontrib><creatorcontrib>Deng, Yuhui</creatorcontrib><creatorcontrib>Huang, Jiande</creatorcontrib><creatorcontrib>Yi, Zhou</creatorcontrib><creatorcontrib>Yang, Qifen</creatorcontrib><creatorcontrib>Min, Geyong</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><jtitle>IEEE transactions on knowledge and data engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Li, Jianjun</au><au>Deng, Yuhui</au><au>Huang, Jiande</au><au>Yi, Zhou</au><au>Yang, Qifen</au><au>Min, Geyong</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Gecko: Efficient Sliding Window Aggregation With Granular-Based Bulk Eviction Over Big Data Streams</atitle><jtitle>IEEE transactions on knowledge and data 
engineering</jtitle><stitle>TKDE</stitle><date>2025-02</date><risdate>2025</risdate><volume>37</volume><issue>2</issue><spage>698</spage><epage>709</epage><pages>698-709</pages><issn>1041-4347</issn><eissn>1558-2191</eissn><coden>ITKEEH</coden><abstract><![CDATA[Sliding window aggregation, which extracts summaries from data streams, is a core operation in streaming analysis. Though existing sliding window algorithms that perform single eviction and insertion operations can achieve a worst-case time complexity of <inline-formula><tex-math notation="LaTeX">O(1)</tex-math> <mml:math><mml:mrow><mml:mi>O</mml:mi><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math><inline-graphic xlink:href="deng-ieq1-3511334.gif"/> </inline-formula> for in-order streams, real-world data streams often involve out-of-order data and exhibit burst data characteristics, which pose performance challenges to these sliding window algorithms. To address this challenging issue, we propose Gecko - a novel sliding window aggregation algorithm that supports bulk eviction. Gecko leverages a granular-based eviction strategy for various bulk sizes, enabling efficient bulk eviction while maintaining the performance close to that of in-order stream algorithms for single evictions. For large data bulks, Gecko performs coarse-grained eviction at the chunk level, followed by fine-grained eviction using leftward binary tree aggregation (LTA) as a complementary method. Moreover, Gecko partitions data based on chunks to prevent the impacts of out-of-order data on other chunks, thereby enabling efficient handling of out-of-order data streams. We conduct extensive experiments to evaluate the performance of Gecko. Experimental results demonstrate that Gecko exhibits superior performance over other solutions, which is consistent with theoretical expectations. 
In real-world data scenarios, Gecko improves the average throughput of the state-of-the-art algorithm b_FiBA by 1.7 times, with a maximum improvement of up to 3.5 times. Gecko also demonstrates the best latency performance among all compared schemes.]]></abstract><pub>IEEE</pub><doi>10.1109/TKDE.2024.3511334</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0002-1639-8020</orcidid><orcidid>https://orcid.org/0009-0008-1203-9558</orcidid><orcidid>https://orcid.org/0000-0003-3266-1885</orcidid><orcidid>https://orcid.org/0000-0002-1522-8943</orcidid><orcidid>https://orcid.org/0000-0003-1395-7314</orcidid></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1041-4347
ispartof IEEE transactions on knowledge and data engineering, 2025-02, Vol.37 (2), p.698-709
issn 1041-4347
1558-2191
language eng
recordid cdi_crossref_primary_10_1109_TKDE_2024_3511334
source IEEE Electronic Library (IEL)
subjects Aggregates
Binary trees
Bulk eviction
Computer science
incremental computation
Indexes
Out of order
out-of-order data streams
Partitioning algorithms
sliding window aggregation
stream algorithm
Streams
Throughput
Time complexity
Windows
title Gecko: Efficient Sliding Window Aggregation With Granular-Based Bulk Eviction Over Big Data Streams
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T07%3A27%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Gecko:%20Efficient%20Sliding%20Window%20Aggregation%20With%20Granular-Based%20Bulk%20Eviction%20Over%20Big%20Data%20Streams&rft.jtitle=IEEE%20transactions%20on%20knowledge%20and%20data%20engineering&rft.au=Li,%20Jianjun&rft.date=2025-02&rft.volume=37&rft.issue=2&rft.spage=698&rft.epage=709&rft.pages=698-709&rft.issn=1041-4347&rft.eissn=1558-2191&rft.coden=ITKEEH&rft_id=info:doi/10.1109/TKDE.2024.3511334&rft_dat=%3Ccrossref_RIE%3E10_1109_TKDE_2024_3511334%3C/crossref_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10777062&rfr_iscdi=true
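
The description field in this record outlines chunk-based bulk eviction: whole chunks are dropped first (coarse-grained), the boundary chunk is then trimmed (fine-grained), and an out-of-order item only touches the chunk it lands in. The idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the class name `ChunkedWindow`, its methods, the fixed `chunk_size`, and the plain recompute used in place of leftward binary tree aggregation (LTA) are all hypothetical.

```python
from bisect import bisect_right
from functools import reduce
import operator


class ChunkedWindow:
    """Toy sliding window: items grouped into chunks, one cached aggregate per chunk.

    Hypothetical illustration only; names and layout are not taken from the paper.
    """

    def __init__(self, chunk_size=4, combine=operator.add, identity=0):
        self.chunk_size = chunk_size
        self.combine = combine      # associative combine function
        self.identity = identity    # identity element of `combine`
        # Each chunk: {"items": [(ts, value), ...] sorted by ts, "agg": partial aggregate}
        self.chunks = []

    def _refresh(self, chunk):
        # Plain recompute of one chunk's aggregate (stand-in for LTA, an assumption).
        chunk["agg"] = reduce(self.combine, (v for _, v in chunk["items"]), self.identity)

    def insert(self, ts, value):
        in_order = not self.chunks or ts >= self.chunks[-1]["items"][-1][0]
        if in_order:
            # Fast path: append to the tail chunk, opening a new one when it is full.
            if not self.chunks or len(self.chunks[-1]["items"]) >= self.chunk_size:
                self.chunks.append({"items": [], "agg": self.identity})
            tail = self.chunks[-1]
            tail["items"].append((ts, value))
            tail["agg"] = self.combine(tail["agg"], value)
        else:
            # Out-of-order: locate the one chunk covering ts; only it is refreshed.
            # (Simplification: the chunk may grow past chunk_size; no splitting.)
            firsts = [c["items"][0][0] for c in self.chunks]
            chunk = self.chunks[max(bisect_right(firsts, ts) - 1, 0)]
            pos = bisect_right([t for t, _ in chunk["items"]], ts)
            chunk["items"].insert(pos, (ts, value))
            self._refresh(chunk)

    def bulk_evict(self, t):
        """Evict every item with timestamp <= t."""
        # Coarse-grained: drop whole chunks whose newest item is old enough.
        drop = 0
        while drop < len(self.chunks) and self.chunks[drop]["items"][-1][0] <= t:
            drop += 1
        del self.chunks[:drop]
        # Fine-grained: trim the boundary chunk and refresh its aggregate.
        if self.chunks:
            head = self.chunks[0]
            cut = bisect_right([ts for ts, _ in head["items"]], t)
            if cut:
                del head["items"][:cut]
                self._refresh(head)

    def query(self):
        # Combine the cached per-chunk aggregates in window order.
        return reduce(self.combine, (c["agg"] for c in self.chunks), self.identity)


window = ChunkedWindow(chunk_size=3)
for ts in range(1, 11):
    window.insert(ts, ts)       # in-order items with value == timestamp
print(window.query())           # 55
window.insert(5, 100)           # late arrival, lands in the second chunk only
print(window.query())           # 155
window.bulk_evict(7)            # two whole chunks dropped, one chunk trimmed
print(window.query())           # 27
```

In this toy version a bulk eviction costs roughly one step per dropped chunk plus one recompute of the boundary chunk, rather than one step per evicted item. The complexity bounds and throughput figures quoted in the record apply to Gecko itself, not to this sketch.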