SUMMARIZED: Efficient Framework for Analyzing Multidimensional Process Traces under Edit-distance Constraint

Domains such as scientific workflows and business processes exhibit data models with complex relationships between objects. These relationships are typically represented as sequences, where each data item is annotated with multi-dimensional attributes. There is a need to analyze this data for operation...

Bibliographic details
Main authors:	Nguyen, Phuong; Ishakian, Vatche; Muthusamy, Vinod; Slominski, Aleksander
Format:	Article
Language:	English
Subjects:	Computer Science - Databases
Online access:	Order full text

creator	Nguyen, Phuong; Ishakian, Vatche; Muthusamy, Vinod; Slominski, Aleksander
description	Domains such as scientific workflows and business processes exhibit data models with complex relationships between objects. These relationships are typically represented as sequences, where each data item is annotated with multi-dimensional attributes. There is a need to analyze this data for operational insights. For example, in business processes, users are interested in clustering process traces into smaller subsets to discover less complex process models. This requires expensive computation of similarity metrics between sequence-based data. Related work on dimension reduction and embedding methods does not take into account the multi-dimensional attributes of the data, and does not address the interpretability of data in the embedding space (i.e., it favors vector-based representations). In this work, we introduce Summarized, a framework for efficient analysis of sequence-based multi-dimensional data using intuitive and user-controlled summarizations. We introduce summarization schemes that provide tunable trade-offs between the quality and efficiency of analysis tasks, and derive an error model for summary-based similarity under an edit-distance constraint. Evaluations using real-world datasets show the effectiveness of our framework.
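The abstract notes that similarity between sequence-based, multi-dimensional traces is expensive to compute and that summaries trade quality for efficiency. The Python below is a minimal sketch under stated assumptions, not the SUMMARIZED implementation or its error model. It treats each event as a dictionary of attributes, computes a standard Levenshtein edit distance whose substitution cost is the fraction of mismatched attributes (an assumed cost), and defines a hypothetical `summarize` helper that keeps user-chosen attributes and merges consecutive repeats, so the quadratic dynamic program runs on shorter sequences.

```python
from typing import Dict, List, Sequence, Tuple

# An event maps attribute names (e.g. "activity", "resource") to values.
Event = Dict[str, str]
Trace = List[Event]


def substitution_cost(a: Event, b: Event) -> float:
    """Fraction of attributes on which two events disagree (0.0 means identical).

    Assumption: this attribute-mismatch cost is illustrative, not the paper's metric.
    """
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    return sum(a.get(k) != b.get(k) for k in keys) / len(keys)


def edit_distance(s: Sequence[Event], t: Sequence[Event]) -> float:
    """Levenshtein distance: unit insert/delete, attribute-aware substitution.

    Uses two rolling rows, so memory is O(len(t)) and time is O(len(s) * len(t)).
    """
    prev = [float(j) for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [float(i)] + [0.0] * len(t)
        for j in range(1, len(t) + 1):
            cur[j] = min(
                prev[j] + 1.0,  # delete s[i-1]
                cur[j - 1] + 1.0,  # insert t[j-1]
                prev[j - 1] + substitution_cost(s[i - 1], t[j - 1]),  # substitute
            )
        prev = cur
    return prev[-1]


def summarize(trace: Trace, keep: Tuple[str, ...]) -> Trace:
    """Hypothetical summary: project onto `keep`, then merge consecutive repeats."""
    out: Trace = []
    for event in trace:
        projected = {k: event[k] for k in keep if k in event}
        if not out or out[-1] != projected:
            out.append(projected)
    return out


t1 = [
    {"activity": "A", "resource": "r1"},
    {"activity": "A", "resource": "r2"},
    {"activity": "B", "resource": "r1"},
]
t2 = [{"activity": "A", "resource": "r1"}, {"activity": "B", "resource": "r2"}]

print(edit_distance(t1, t2))  # full traces: 1.5
s1, s2 = summarize(t1, ("activity",)), summarize(t2, ("activity",))
print(s1, edit_distance(s1, s2))  # [{'activity': 'A'}, {'activity': 'B'}] 0.0
```

In this toy example, summarizing on `activity` alone shortens the first trace and hides the resource differences, so the summary distance (0.0) understates the full distance (1.5). Quantifying that gap is the kind of question the paper's error model addresses; the sketch makes no claim about how the authors bound it.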
doi_str_mv	10.48550/arxiv.1905.00983
format	Article
creationdate	2019-05-02
rights	http://arxiv.org/licenses/nonexclusive-distrib/1.0
oa	free_for_read
linktorsrc	https://arxiv.org/abs/1905.00983 (View record in Cornell University)
backlink	https://doi.org/10.48550/arXiv.1905.00983 (View paper in arXiv)
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.1905.00983
language	eng
recordid	cdi_arxiv_primary_1905_00983
source	arXiv.org
subjects	Computer Science - Databases
title	SUMMARIZED: Efficient Framework for Analyzing Multidimensional Process Traces under Edit-distance Constraint
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T16%3A36%3A01IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SUMMARIZED:%20Efficient%20Framework%20for%20Analyzing%20Multidimensional%20Process%20Traces%20under%20Edit-distance%20Constraint&rft.au=Nguyen,%20Phuong&rft.date=2019-05-02&rft_id=info:doi/10.48550/arxiv.1905.00983&rft_dat=%3Carxiv_GOX%3E1905_00983%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true