Detecting Process Duration Drift Using Gamma Mixture Models in a Left-Truncated and Right-Censored Environment

Within the realm of business context, process duration signifies time spent by customers between successive activities. This temporal perspective offers important insight to customer behavior, highlighting potential bottlenecks, and influencing business management decisions. The distribution of thes...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:ACM transactions on knowledge discovery from data 2024-09, Vol.18 (8), p.1-24, Article 195
Hauptverfasser: Yang, Lingkai, McClean, Sally, Donnelly, Mark, Khan, Kashaf, Burke, Kevin
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Within the realm of business context, process duration signifies time spent by customers between successive activities. This temporal perspective offers important insight to customer behavior, highlighting potential bottlenecks, and influencing business management decisions. The distribution of these process duration often changes over time due to factors such as seasonality, emerging legislation, changes to supply chains, and customer demand. Referred to as concept drift, these variations pose challenges for robust process modeling, understanding, and refinement. Subsequently, gamma mixture models are widely employed to model durations. These source data can, however, become left-truncated and right-censored within any specific observation window thereby necessitating a (well-known) modification to the likelihood function. The approach reported in this article leveraged this adapted likelihood across a series of observation windows, applying the likelihood ratio test to identify duration changes/concept drift. Due to its flexibility in modelling any duration distribution, the gamma mixture model was used with Nelder–Mead optimized likelihood for the left-truncated and right-censored data. The number of gamma components was determined by the Bayesian information criterion. The proposed framework underwent validation through simulated exponential samples, leading to recommendations for its practical application. Subsequently, we applied the methodology to three real-life event logs exhibiting diverse characteristics. Experimental results showcase the effectiveness of our approach in terms of data fitting, as compared to Kaplan–Meier curves, and in detecting instances of drift. This comprehensive validation underscores the practical utility and reliability of our framework for dynamic business scenarios.
ISSN:1556-4681
1556-472X
DOI:10.1145/3669942