SUMMARIZED: Efficient Framework for Analyzing Multidimensional Process Traces under Edit-distance Constraint
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Domains such as scientific workflows and business processes exhibit data
models with complex relationships between objects. These relationships are
typically represented as sequences, where each data item is annotated with
multi-dimensional attributes. There is a need to analyze this data for
operational insights. For example, in business processes, users are interested
in clustering process traces into smaller subsets to discover less complex
process models. This requires expensive computation of similarity metrics
between sequence-based data. Related work on dimension reduction and embedding
methods does not take into account the multi-dimensional attributes of data and
does not address the interpretability of data in the embedding space (i.e., by
favoring vector-based representations). In this work, we introduce Summarized, a
framework for efficient analysis of sequence-based multi-dimensional data using
intuitive and user-controlled summarizations. We introduce summarization
schemes that provide tunable trade-offs between the quality and efficiency of
analysis tasks and derive an error model for summary-based similarity under an
edit-distance constraint. Evaluations using real-world datasets show the
effectiveness of our framework. |
---|---|
DOI: | 10.48550/arxiv.1905.00983 |