Optimization framework for DFG-based automated process discovery approaches

The problem of automatically discovering business process models from event logs has been intensely investigated in the past two decades, leading to a wide range of approaches that strike various trade-offs between accuracy, model complexity, and execution time. A few studies have suggested that the...

Bibliographic details
Published in: Software and systems modeling 2021-08, Vol.20 (4), p.1245-1270
Main authors: Augusto, Adriano, Dumas, Marlon, La Rosa, Marcello, Leemans, Sander J. J., vanden Broucke, Seppe K. L. M.
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 1270
container_issue 4
container_start_page 1245
container_title Software and systems modeling
container_volume 20
creator Augusto, Adriano
Dumas, Marlon
La Rosa, Marcello
Leemans, Sander J. J.
vanden Broucke, Seppe K. L. M.
description The problem of automatically discovering business process models from event logs has been intensely investigated in the past two decades, leading to a wide range of approaches that strike various trade-offs between accuracy, model complexity, and execution time. A few studies have suggested that the accuracy of automated process discovery approaches can be enhanced by means of metaheuristic optimization techniques. However, these studies have remained at the level of proposals without validation on real-life datasets or they have only considered one metaheuristic in isolation. This article presents a metaheuristic optimization framework for automated process discovery. The key idea of the framework is to construct a directly-follows graph (DFG) from the event log, to perturb this DFG so as to generate new candidate solutions, and to apply a DFG-based automated process discovery approach in order to derive a process model from each DFG. The framework can be instantiated by linking it to an automated process discovery approach, an optimization metaheuristic, and the quality measure to be optimized (e.g., fitness, precision, F-score). The article considers several instantiations of the framework corresponding to four optimization metaheuristics, three automated process discovery approaches (Inductive Miner—directly-follows, Fodina, and Split Miner), and one accuracy measure (Markovian F-score). These framework instances are compared using a set of 20 real-life event logs. The evaluation shows that metaheuristic optimization consistently yields visible improvements in F-score for all the three automated process discovery approaches, at the cost of execution times in the order of minutes, versus seconds for the baseline approaches.
doi_str_mv 10.1007/s10270-020-00846-x
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2567803409</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2567803409</sourcerecordid><originalsourceid>FETCH-LOGICAL-c363t-101b8a68c82ee90d4088934be5c9acb3fa75f1270be71443e1e3affab8101b673</originalsourceid><addsrcrecordid>eNp9UD1PwzAQtRBIVKV_gCkSc-Acu7YzokILolIXmC3HPUOA1MFOoOXX4xIEG8Ppnk7vQ_cIOaVwTgHkRaRQSMihSAOKi3x7QEZU0DKnTPLDXyzEMZnEWFcAvChLLsSI3K3arm7qT9PVfpO5YBr88OElcz5kV_NFXpmI68z0nW9Ml1AbvMUYs3UdrX_HsMtMm27GPmE8IUfOvEac_OwxeZhf389u8uVqcTu7XOaWCdblFGiljFBWFYglrDkoVTJe4dSWxlbMGTl1NH1UoaScM6TIjHOmUnulkGxMzgbfFPzWY-z0s-_DJkXqYiqkAsahTKxiYNngYwzodBvqxoSdpqD3vemhN51609-96W0SsUEUE3nziOHP-h_VF8CxcU4</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2567803409</pqid></control><display><type>article</type><title>Optimization framework for DFG-based automated process discovery approaches</title><source>SpringerNature Journals</source><creator>Augusto, Adriano ; Dumas, Marlon ; La Rosa, Marcello ; Leemans, Sander J. J. ; vanden Broucke, Seppe K. L. M.</creator><creatorcontrib>Augusto, Adriano ; Dumas, Marlon ; La Rosa, Marcello ; Leemans, Sander J. J. ; vanden Broucke, Seppe K. L. M.</creatorcontrib><description>The problem of automatically discovering business process models from event logs has been intensely investigated in the past two decades, leading to a wide range of approaches that strike various trade-offs between accuracy, model complexity, and execution time. A few studies have suggested that the accuracy of automated process discovery approaches can be enhanced by means of metaheuristic optimization techniques. However, these studies have remained at the level of proposals without validation on real-life datasets or they have only considered one metaheuristic in isolation. 
This article presents a metaheuristic optimization framework for automated process discovery. The key idea of the framework is to construct a directly-follows graph (DFG) from the event log, to perturb this DFG so as to generate new candidate solutions, and to apply a DFG-based automated process discovery approach in order to derive a process model from each DFG. The framework can be instantiated by linking it to an automated process discovery approach, an optimization metaheuristic, and the quality measure to be optimized (e.g., fitness, precision, F-score). The article considers several instantiations of the framework corresponding to four optimization metaheuristics, three automated process discovery approaches (Inductive Miner—directly-follows, Fodina, and Split Miner), and one accuracy measure (Markovian F-score). These framework instances are compared using a set of 20 real-life event logs. The evaluation shows that metaheuristic optimization consistently yields visible improvements in F-score for all the three automated process discovery approaches, at the cost of execution times in the order of minutes, versus seconds for the baseline approaches.</description><identifier>ISSN: 1619-1366</identifier><identifier>EISSN: 1619-1374</identifier><identifier>DOI: 10.1007/s10270-020-00846-x</identifier><language>eng</language><publisher>Berlin/Heidelberg: Springer Berlin Heidelberg</publisher><subject>Accuracy ; Automation ; Compilers ; Computer Science ; Heuristic methods ; Information Systems Applications (incl.Internet) ; Interpreters ; IT in Business ; Model accuracy ; Optimization ; Optimization techniques ; Programming Languages ; Programming Techniques ; Regular Paper ; Software Engineering ; Software Engineering/Programming and Operating Systems</subject><ispartof>Software and systems modeling, 2021-08, Vol.20 (4), p.1245-1270</ispartof><rights>The Author(s) 2021</rights><rights>The Author(s) 2021. 
This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c363t-101b8a68c82ee90d4088934be5c9acb3fa75f1270be71443e1e3affab8101b673</citedby><cites>FETCH-LOGICAL-c363t-101b8a68c82ee90d4088934be5c9acb3fa75f1270be71443e1e3affab8101b673</cites><orcidid>0000-0001-7970-5246</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10270-020-00846-x$$EPDF$$P50$$Gspringer$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10270-020-00846-x$$EHTML$$P50$$Gspringer$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,27924,27925,41488,42557,51319</link.rule.ids></links><search><creatorcontrib>Augusto, Adriano</creatorcontrib><creatorcontrib>Dumas, Marlon</creatorcontrib><creatorcontrib>La Rosa, Marcello</creatorcontrib><creatorcontrib>Leemans, Sander J. J.</creatorcontrib><creatorcontrib>vanden Broucke, Seppe K. L. M.</creatorcontrib><title>Optimization framework for DFG-based automated process discovery approaches</title><title>Software and systems modeling</title><addtitle>Softw Syst Model</addtitle><description>The problem of automatically discovering business process models from event logs has been intensely investigated in the past two decades, leading to a wide range of approaches that strike various trade-offs between accuracy, model complexity, and execution time. A few studies have suggested that the accuracy of automated process discovery approaches can be enhanced by means of metaheuristic optimization techniques. 
However, these studies have remained at the level of proposals without validation on real-life datasets or they have only considered one metaheuristic in isolation. This article presents a metaheuristic optimization framework for automated process discovery. The key idea of the framework is to construct a directly-follows graph (DFG) from the event log, to perturb this DFG so as to generate new candidate solutions, and to apply a DFG-based automated process discovery approach in order to derive a process model from each DFG. The framework can be instantiated by linking it to an automated process discovery approach, an optimization metaheuristic, and the quality measure to be optimized (e.g., fitness, precision, F-score). The article considers several instantiations of the framework corresponding to four optimization metaheuristics, three automated process discovery approaches (Inductive Miner—directly-follows, Fodina, and Split Miner), and one accuracy measure (Markovian F-score). These framework instances are compared using a set of 20 real-life event logs. 
The evaluation shows that metaheuristic optimization consistently yields visible improvements in F-score for all the three automated process discovery approaches, at the cost of execution times in the order of minutes, versus seconds for the baseline approaches.</description><subject>Accuracy</subject><subject>Automation</subject><subject>Compilers</subject><subject>Computer Science</subject><subject>Heuristic methods</subject><subject>Information Systems Applications (incl.Internet)</subject><subject>Interpreters</subject><subject>IT in Business</subject><subject>Model accuracy</subject><subject>Optimization</subject><subject>Optimization techniques</subject><subject>Programming Languages</subject><subject>Programming Techniques</subject><subject>Regular Paper</subject><subject>Software Engineering</subject><subject>Software Engineering/Programming and Operating Systems</subject><issn>1619-1366</issn><issn>1619-1374</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>C6C</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp9UD1PwzAQtRBIVKV_gCkSc-Acu7YzokILolIXmC3HPUOA1MFOoOXX4xIEG8Ppnk7vQ_cIOaVwTgHkRaRQSMihSAOKi3x7QEZU0DKnTPLDXyzEMZnEWFcAvChLLsSI3K3arm7qT9PVfpO5YBr88OElcz5kV_NFXpmI68z0nW9Ml1AbvMUYs3UdrX_HsMtMm27GPmE8IUfOvEac_OwxeZhf389u8uVqcTu7XOaWCdblFGiljFBWFYglrDkoVTJe4dSWxlbMGTl1NH1UoaScM6TIjHOmUnulkGxMzgbfFPzWY-z0s-_DJkXqYiqkAsahTKxiYNngYwzodBvqxoSdpqD3vemhN51609-96W0SsUEUE3nziOHP-h_VF8CxcU4</recordid><startdate>20210801</startdate><enddate>20210801</enddate><creator>Augusto, Adriano</creator><creator>Dumas, Marlon</creator><creator>La Rosa, Marcello</creator><creator>Leemans, Sander J. J.</creator><creator>vanden Broucke, Seppe K. L. 
M.</creator><general>Springer Berlin Heidelberg</general><general>Springer Nature B.V</general><scope>C6C</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7XB</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><orcidid>https://orcid.org/0000-0001-7970-5246</orcidid></search><sort><creationdate>20210801</creationdate><title>Optimization framework for DFG-based automated process discovery approaches</title><author>Augusto, Adriano ; Dumas, Marlon ; La Rosa, Marcello ; Leemans, Sander J. J. ; vanden Broucke, Seppe K. L. 
M.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c363t-101b8a68c82ee90d4088934be5c9acb3fa75f1270be71443e1e3affab8101b673</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Accuracy</topic><topic>Automation</topic><topic>Compilers</topic><topic>Computer Science</topic><topic>Heuristic methods</topic><topic>Information Systems Applications (incl.Internet)</topic><topic>Interpreters</topic><topic>IT in Business</topic><topic>Model accuracy</topic><topic>Optimization</topic><topic>Optimization techniques</topic><topic>Programming Languages</topic><topic>Programming Techniques</topic><topic>Regular Paper</topic><topic>Software Engineering</topic><topic>Software Engineering/Programming and Operating Systems</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Augusto, Adriano</creatorcontrib><creatorcontrib>Dumas, Marlon</creatorcontrib><creatorcontrib>La Rosa, Marcello</creatorcontrib><creatorcontrib>Leemans, Sander J. J.</creatorcontrib><creatorcontrib>vanden Broucke, Seppe K. L. 
M.</creatorcontrib><collection>Springer Nature OA Free Journals</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><jtitle>Software and 
systems modeling</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Augusto, Adriano</au><au>Dumas, Marlon</au><au>La Rosa, Marcello</au><au>Leemans, Sander J. J.</au><au>vanden Broucke, Seppe K. L. M.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Optimization framework for DFG-based automated process discovery approaches</atitle><jtitle>Software and systems modeling</jtitle><stitle>Softw Syst Model</stitle><date>2021-08-01</date><risdate>2021</risdate><volume>20</volume><issue>4</issue><spage>1245</spage><epage>1270</epage><pages>1245-1270</pages><issn>1619-1366</issn><eissn>1619-1374</eissn><abstract>The problem of automatically discovering business process models from event logs has been intensely investigated in the past two decades, leading to a wide range of approaches that strike various trade-offs between accuracy, model complexity, and execution time. A few studies have suggested that the accuracy of automated process discovery approaches can be enhanced by means of metaheuristic optimization techniques. However, these studies have remained at the level of proposals without validation on real-life datasets or they have only considered one metaheuristic in isolation. This article presents a metaheuristic optimization framework for automated process discovery. The key idea of the framework is to construct a directly-follows graph (DFG) from the event log, to perturb this DFG so as to generate new candidate solutions, and to apply a DFG-based automated process discovery approach in order to derive a process model from each DFG. The framework can be instantiated by linking it to an automated process discovery approach, an optimization metaheuristic, and the quality measure to be optimized (e.g., fitness, precision, F-score). 
The article considers several instantiations of the framework corresponding to four optimization metaheuristics, three automated process discovery approaches (Inductive Miner—directly-follows, Fodina, and Split Miner), and one accuracy measure (Markovian F-score). These framework instances are compared using a set of 20 real-life event logs. The evaluation shows that metaheuristic optimization consistently yields visible improvements in F-score for all the three automated process discovery approaches, at the cost of execution times in the order of minutes, versus seconds for the baseline approaches.</abstract><cop>Berlin/Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><doi>10.1007/s10270-020-00846-x</doi><tpages>26</tpages><orcidid>https://orcid.org/0000-0001-7970-5246</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1619-1366
ispartof Software and systems modeling, 2021-08, Vol.20 (4), p.1245-1270
issn 1619-1366
1619-1374
language eng
recordid cdi_proquest_journals_2567803409
source SpringerNature Journals
subjects Accuracy
Automation
Compilers
Computer Science
Heuristic methods
Information Systems Applications (incl.Internet)
Interpreters
IT in Business
Model accuracy
Optimization
Optimization techniques
Programming Languages
Programming Techniques
Regular Paper
Software Engineering
Software Engineering/Programming and Operating Systems
title Optimization framework for DFG-based automated process discovery approaches
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-20T09%3A45%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Optimization%20framework%20for%20DFG-based%20automated%20process%20discovery%20approaches&rft.jtitle=Software%20and%20systems%20modeling&rft.au=Augusto,%20Adriano&rft.date=2021-08-01&rft.volume=20&rft.issue=4&rft.spage=1245&rft.epage=1270&rft.pages=1245-1270&rft.issn=1619-1366&rft.eissn=1619-1374&rft_id=info:doi/10.1007/s10270-020-00846-x&rft_dat=%3Cproquest_cross%3E2567803409%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2567803409&rft_id=info:pmid/&rfr_iscdi=true
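
The abstract in this record describes a loop: build a directly-follows graph (DFG) from the event log, perturb the DFG to obtain candidate solutions, derive a model from each candidate, and keep the candidates that score better. The following is a minimal sketch of that loop under stated assumptions, not the authors' implementation. Simple hill climbing stands in for the four metaheuristics, which the abstract does not name, and `toy_discover` and `toy_score` are hypothetical placeholders for a real DFG-based discovery approach (such as Split Miner) and a real accuracy measure (such as the Markovian F-score).

```python
import random
from collections import Counter


def build_dfg(log):
    """Count how often activity a is directly followed by activity b."""
    dfg = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dict(dfg)


def perturb(dfg, full_dfg, rng):
    """Return a neighbouring DFG: drop one edge, or restore one from the full DFG."""
    candidate = dict(dfg)
    removable = sorted(candidate)
    addable = sorted(set(full_dfg) - set(candidate))
    if addable and (not removable or rng.random() < 0.5):
        edge = rng.choice(addable)
        candidate[edge] = full_dfg[edge]
    elif removable:
        candidate.pop(rng.choice(removable))
    return candidate


def toy_discover(dfg):
    """Hypothetical stand-in for a discovery approach: the model is the edge set."""
    return set(dfg)


def toy_score(model, log):
    """Hypothetical stand-in for an accuracy measure (NOT the Markovian F-score).

    Rewards traces whose every step is an edge of the model and
    penalises each edge slightly to discourage oversized models.
    """
    replayable = sum(
        all(step in model for step in zip(trace, trace[1:])) for trace in log
    )
    return replayable / len(log) - 0.05 * len(model)


def hill_climb(log, discover, score, iterations=100, seed=0):
    """Perturb the DFG repeatedly and keep a candidate only if it scores higher."""
    rng = random.Random(seed)
    full = build_dfg(log)
    best = dict(full)
    best_score = score(discover(best), log)
    for _ in range(iterations):
        candidate = perturb(best, full, rng)
        candidate_score = score(discover(candidate), log)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score
```

Calling `hill_climb(log, toy_discover, toy_score)` on a list of traces, each a list of activity labels, returns the best DFG found and its score. Swapping in a different `discover` or `score` function corresponds to a different instantiation of the framework in the sense the abstract describes.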