SUMMARIZED: Efficient Framework for Analyzing Multidimensional Process Traces under Edit-distance Constraint

Domains such as scientific workflows and business processes exhibit data models with complex relationships between objects. This relationship is typically represented as sequences, where each data item is annotated with multi-dimensional attributes. There is a need to analyze this data for operation...

Full description

Bibliographic details
Main authors: Nguyen, Phuong, Ishakian, Vatche, Muthusamy, Vinod, Slominski, Aleksander
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Nguyen, Phuong
Ishakian, Vatche
Muthusamy, Vinod
Slominski, Aleksander
description Domains such as scientific workflows and business processes exhibit data models with complex relationships between objects. These relationships are typically represented as sequences, where each data item is annotated with multi-dimensional attributes. There is a need to analyze this data for operational insights. For example, in business processes, users are interested in clustering process traces into smaller subsets to discover less complex process models. This requires expensive computation of similarity metrics between sequence-based data. Related work on dimension reduction and embedding methods does not take into account the multi-dimensional attributes of the data, and does not address the interpretability of data in the embedding space (i.e., it favors vector-based representations). In this work, we introduce Summarized, a framework for efficient analysis of sequence-based multi-dimensional data using intuitive and user-controlled summarizations. We introduce summarization schemes that provide tunable trade-offs between the quality and efficiency of analysis tasks and derive an error model for summary-based similarity under an edit-distance constraint. Evaluations using real-world datasets show the effectiveness of our framework.
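The description refers to similarity between sequence-based traces under an edit-distance constraint and to user-controlled summarizations. The following is a minimal Python sketch of that general idea, not the scheme defined in the paper: it uses plain Levenshtein distance, and the `summarize` function (project each event onto a chosen subset of attributes, then collapse consecutive repeats) is a hypothetical summarization chosen purely for illustration. The attribute names `activity` and `role` and the two sample traces are likewise assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of comparable items."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def full_key(trace):
    """Represent each event by all of its attributes (no summarization)."""
    return [tuple(sorted(event.items())) for event in trace]


def summarize(trace, keep):
    """Hypothetical summarization: keep only the attributes in `keep`
    and collapse consecutive events that become identical."""
    summary = []
    for event in trace:
        key = tuple(event[name] for name in keep)
        if not summary or summary[-1] != key:
            summary.append(key)
    return summary


# Two illustrative process traces with two-dimensional events (assumed data).
t1 = [
    {"activity": "submit", "role": "clerk"},
    {"activity": "review", "role": "clerk"},
    {"activity": "review", "role": "manager"},
    {"activity": "approve", "role": "manager"},
]
t2 = [
    {"activity": "submit", "role": "clerk"},
    {"activity": "review", "role": "manager"},
    {"activity": "reject", "role": "manager"},
]

full_distance = edit_distance(full_key(t1), full_key(t2))
summary_distance = edit_distance(summarize(t1, ["activity"]),
                                 summarize(t2, ["activity"]))
print(full_distance, summary_distance)  # prints "2 1"
```

In this toy case the summaries are shorter than the full traces, so the distance is cheaper to compute, but it also differs from the full distance (1 instead of 2). That gap is the kind of quality versus efficiency trade-off for which the description says the paper derives an error model.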
doi_str_mv 10.48550/arxiv.1905.00983
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_1905_00983</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1905_00983</sourcerecordid><originalsourceid>FETCH-LOGICAL-a673-dfd73d77474c36c611b2ff7e5b765596de7170d8f558b3755288ea7fee0ff5e93</originalsourceid><addsrcrecordid>eNotz7FOwzAUhWEvDKjwAEz4BRKcOo4dtiikUKkVCMLCEjnxNboicZDtAuXpKYXpl85wpI-Qi4yluRKCXWn_hR9pVjKRMlYqfkrGp-fttnpcvzQ317SxFgcEF-nK6wk-Z_9G7exp5fS4_0b3Sre7MaLBCVzA-bDSBz8PEAJtvT6U7pwBTxuDMTEYonYD0Hp2IXqNLp6RE6vHAOf_XZB21bT1XbK5v13X1SbRheSJsUZyI2Uu84EXQ5Fl_dJaCaKXhRBlYUBmkhllhVA9l0IslQItLQCzVkDJF-Ty7_bI7d49Ttrvu192d2TzHxFvU_k</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>SUMMARIZED: Efficient Framework for Analyzing Multidimensional Process Traces under Edit-distance Constraint</title><source>arXiv.org</source><creator>Nguyen, Phuong ; Ishakian, Vatche ; Muthusamy, Vinod ; Slominski, Aleksander</creator><creatorcontrib>Nguyen, Phuong ; Ishakian, Vatche ; Muthusamy, Vinod ; Slominski, Aleksander</creatorcontrib><description>Domains such as scientific workflows and business processes exhibit data models with complex relationships between objects. This relationship is typically represented as sequences, where each data item is annotated with multi-dimensional attributes. There is a need to analyze this data for operational insights. For example, in business processes, users are interested in clustering process traces into smaller subsets to discover less complex process models. This requires expensive computation of similarity metrics between sequence-based data. Related work on dimension reduction and embedding methods do not take into account the multi-dimensional attributes of data, and do not address the interpretability of data in the embedding space (i.e., by favoring vector-based representation). 
In this work, we introduce Summarized, a framework for efficient analysis on sequence-based multi-dimensional data using intuitive and user-controlled summarizations. We introduce summarization schemes that provide tunable trade-offs between the quality and efficiency of analysis tasks and derive an error model for summary-based similarity under an edit-distance constraint. Evaluations using real-world datasets show the effectiveness of our framework.</description><identifier>DOI: 10.48550/arxiv.1905.00983</identifier><language>eng</language><subject>Computer Science - Databases</subject><creationdate>2019-05</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/1905.00983$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.1905.00983$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Nguyen, Phuong</creatorcontrib><creatorcontrib>Ishakian, Vatche</creatorcontrib><creatorcontrib>Muthusamy, Vinod</creatorcontrib><creatorcontrib>Slominski, Aleksander</creatorcontrib><title>SUMMARIZED: Efficient Framework for Analyzing Multidimensional Process Traces under Edit-distance Constraint</title><description>Domains such as scientific workflows and business processes exhibit data models with complex relationships between objects. This relationship is typically represented as sequences, where each data item is annotated with multi-dimensional attributes. There is a need to analyze this data for operational insights. 
For example, in business processes, users are interested in clustering process traces into smaller subsets to discover less complex process models. This requires expensive computation of similarity metrics between sequence-based data. Related work on dimension reduction and embedding methods do not take into account the multi-dimensional attributes of data, and do not address the interpretability of data in the embedding space (i.e., by favoring vector-based representation). In this work, we introduce Summarized, a framework for efficient analysis on sequence-based multi-dimensional data using intuitive and user-controlled summarizations. We introduce summarization schemes that provide tunable trade-offs between the quality and efficiency of analysis tasks and derive an error model for summary-based similarity under an edit-distance constraint. Evaluations using real-world datasets show the effectives of our framework.</description><subject>Computer Science - Databases</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotz7FOwzAUhWEvDKjwAEz4BRKcOo4dtiikUKkVCMLCEjnxNboicZDtAuXpKYXpl85wpI-Qi4yluRKCXWn_hR9pVjKRMlYqfkrGp-fttnpcvzQ317SxFgcEF-nK6wk-Z_9G7exp5fS4_0b3Sre7MaLBCVzA-bDSBz8PEAJtvT6U7pwBTxuDMTEYonYD0Hp2IXqNLp6RE6vHAOf_XZB21bT1XbK5v13X1SbRheSJsUZyI2Uu84EXQ5Fl_dJaCaKXhRBlYUBmkhllhVA9l0IslQItLQCzVkDJF-Ty7_bI7d49Ttrvu192d2TzHxFvU_k</recordid><startdate>20190502</startdate><enddate>20190502</enddate><creator>Nguyen, Phuong</creator><creator>Ishakian, Vatche</creator><creator>Muthusamy, Vinod</creator><creator>Slominski, Aleksander</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20190502</creationdate><title>SUMMARIZED: Efficient Framework for Analyzing Multidimensional Process Traces under Edit-distance Constraint</title><author>Nguyen, Phuong ; Ishakian, Vatche ; Muthusamy, Vinod ; Slominski, 
Aleksander</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a673-dfd73d77474c36c611b2ff7e5b765596de7170d8f558b3755288ea7fee0ff5e93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Computer Science - Databases</topic><toplevel>online_resources</toplevel><creatorcontrib>Nguyen, Phuong</creatorcontrib><creatorcontrib>Ishakian, Vatche</creatorcontrib><creatorcontrib>Muthusamy, Vinod</creatorcontrib><creatorcontrib>Slominski, Aleksander</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Nguyen, Phuong</au><au>Ishakian, Vatche</au><au>Muthusamy, Vinod</au><au>Slominski, Aleksander</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>SUMMARIZED: Efficient Framework for Analyzing Multidimensional Process Traces under Edit-distance Constraint</atitle><date>2019-05-02</date><risdate>2019</risdate><abstract>Domains such as scientific workflows and business processes exhibit data models with complex relationships between objects. This relationship is typically represented as sequences, where each data item is annotated with multi-dimensional attributes. There is a need to analyze this data for operational insights. For example, in business processes, users are interested in clustering process traces into smaller subsets to discover less complex process models. This requires expensive computation of similarity metrics between sequence-based data. Related work on dimension reduction and embedding methods do not take into account the multi-dimensional attributes of data, and do not address the interpretability of data in the embedding space (i.e., by favoring vector-based representation). 
In this work, we introduce Summarized, a framework for efficient analysis on sequence-based multi-dimensional data using intuitive and user-controlled summarizations. We introduce summarization schemes that provide tunable trade-offs between the quality and efficiency of analysis tasks and derive an error model for summary-based similarity under an edit-distance constraint. Evaluations using real-world datasets show the effectiveness of our framework.</abstract><doi>10.48550/arxiv.1905.00983</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.1905.00983
ispartof
issn
language eng
recordid cdi_arxiv_primary_1905_00983
source arXiv.org
subjects Computer Science - Databases
title SUMMARIZED: Efficient Framework for Analyzing Multidimensional Process Traces under Edit-distance Constraint
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T16%3A36%3A01IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SUMMARIZED:%20Efficient%20Framework%20for%20Analyzing%20Multidimensional%20Process%20Traces%20under%20Edit-distance%20Constraint&rft.au=Nguyen,%20Phuong&rft.date=2019-05-02&rft_id=info:doi/10.48550/arxiv.1905.00983&rft_dat=%3Carxiv_GOX%3E1905_00983%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true