
An Invertible State Space for Process Trees

Authors: Gero Kolhof, Sebastiaan J. van Zelst

Abstract: Process models are, like event data, first-class citizens in most process mining approaches. Several process modeling formalisms have been proposed and used, e.g., Petri nets, BPMN, and process trees. Despite their frequent use, little research addresses the formal properties of process trees and the corresponding potential to improve the efficiency of solving common computational problems. Therefore, in this paper, we propose an invertible state space definition for process trees and demonstrate that the corresponding state space graph is isomorphic to the state space graph of the tree's inverse. Our result supports the development of novel, time-efficient, decomposition strategies for applications of process trees. Our experiments confirm that our state space definition allows for the adoption of bidirectional state space search, which significantly improves the overall performance of state space searches.

Submitted 31 July, 2024; originally announced July 2024.

Comments: 8 pages, 7 figures

arXiv:2301.07624 [pdf, other]

Performance-Preserving Event Log Sampling for Predictive Monitoring

Authors: Mohammadreza Fani Sani, Mozhgan Vazifehdoostirani, Gyunam Park, Marco Pegoraro, Sebastiaan J. van Zelst, Wil M. P. van der Aalst

Abstract: Predictive process monitoring is a subfield of process mining that aims to estimate case or event features for running process instances. Such predictions are of significant interest to the process stakeholders. However, most of the state-of-the-art methods for predictive monitoring require the training of complex machine learning models, which is often inefficient. Moreover, most of these methods require a hyper-parameter optimization that requires several repetitions of the training process which is not feasible in many real-life applications. In this paper, we propose an instance selection procedure that allows sampling training process instances for prediction models. We show that our instance selection procedure allows for a significant increase of training speed for next activity and remaining time prediction methods while maintaining reliable levels of prediction accuracy.

Submitted 18 January, 2023; originally announced January 2023.

Comments: 25 pages, 1 figure, 13 tables, 47 references. arXiv admin note: substantial text overlap with arXiv:2204.01470

arXiv:2211.04146 [pdf, other]

doi 10.1007/978-3-031-20984-0_2

Control-Flow-Based Querying of Process Executions from Partially Ordered Event Data

Authors: Daniel Schuster, Michael Martini, Sebastiaan J. van Zelst, Wil M. P. van der Aalst

Abstract: Event logs, as viewed in process mining, contain event data describing the execution of operational processes. Most process mining techniques take an event log as input and generate insights about the underlying process by analyzing the data provided. Consequently, handling large volumes of event data is essential to apply process mining successfully. Traditionally, individual process executions are considered sequentially ordered process activities. However, process executions are increasingly viewed as partially ordered activities to more accurately reflect process behavior observed in reality, such as simultaneous execution of activities. Process executions comprising partially ordered activities may contain more complex activity patterns than sequence-based process executions. This paper presents a novel query language to call up process executions from event logs containing partially ordered activities. The query language allows users to specify complex ordering relations over activities, i.e., control flow constraints. Evaluating a query for a given log returns process executions satisfying the specified constraints. We demonstrate the implementation of the query language in a process mining tool and evaluate its performance on real-life event logs.

Submitted 4 January, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

arXiv:2209.04290 [pdf, other]

doi 10.1007/978-3-031-17834-4_18

Conformance Checking for Trace Fragments Using Infix and Postfix Alignments

Authors: Daniel Schuster, Niklas Föcking, Sebastiaan J. van Zelst, Wil M. P. van der Aalst

Abstract: Conformance checking deals with collating modeled process behavior with observed process behavior recorded in event data. Alignments are a state-of-the-art technique to detect, localize, and quantify deviations in process executions, i.e., traces, compared to reference process models. Alignments, however, assume complete process executions covering the entire process from start to finish or prefixes of process executions. This paper defines infix/postfix alignments, proposes approaches to their computation, and evaluates them using real-life event data.

Submitted 15 August, 2022; originally announced September 2022.

arXiv:2204.01470 [pdf, other]

doi 10.1007/978-3-030-98581-3_12

Event Log Sampling for Predictive Monitoring

Authors: Mohammadreza Fani Sani, Mozhgan Vazifehdoostirani, Gyunam Park, Marco Pegoraro, Sebastiaan J. van Zelst, Wil M. P. van der Aalst

Abstract: Predictive process monitoring is a subfield of process mining that aims to estimate case or event features for running process instances. Such predictions are of significant interest to the process stakeholders. However, state-of-the-art methods for predictive monitoring require the training of complex machine learning models, which is often inefficient. This paper proposes an instance selection procedure that allows sampling training process instances for prediction models. We show that our sampling method allows for a significant increase of training speed for next activity prediction methods while maintaining reliable levels of prediction accuracy.

Submitted 4 April, 2022; originally announced April 2022.

Comments: 7 pages, 1 figure, 4 tables, 34 references

Journal ref: ICPM Workshops (2021) 154-166

arXiv:2110.02060 [pdf, other]

doi 10.1007/978-3-030-98581-3_3

Visualizing Trace Variants From Partially Ordered Event Data

Authors: Daniel Schuster, Lukas Schade, Sebastiaan J. van Zelst, Wil M. P. van der Aalst

Abstract: Executing operational processes generates event data, which contain information on the executed process activities. Process mining techniques allow to systematically analyze event data to gain insights that are then used to optimize processes. Visual analytics for event data are essential for the application of process mining. Visualizing unique process executions -- also called trace variants, i.e., unique sequences of executed process activities -- is a common technique implemented in many scientific and industrial process mining applications. Most existing visualizations assume a total order on the executed process activities, i.e., these techniques assume that process activities are atomic and were executed at a specific point in time. In reality, however, the executions of activities are not atomic. Multiple timestamps are recorded for an executed process activity, e.g., a start-timestamp and a complete-timestamp. Therefore, the execution of process activities may overlap and, thus, cannot be represented as a total order if more than one timestamp is to be considered. In this paper, we present a visualization approach for trace variants that incorporates start- and complete-timestamps of activities.

Submitted 5 October, 2021; originally announced October 2021.

arXiv:2108.00215 [pdf, other]

Freezing Sub-Models During Incremental Process Discovery: Extended Version

Authors: Daniel Schuster, Sebastiaan J. van Zelst, Wil M. P. van der Aalst

Abstract: Process discovery aims to learn a process model from observed process behavior. From a user's perspective, most discovery algorithms work like a black box. Besides parameter tuning, there is no interaction between the user and the algorithm. Interactive process discovery allows the user to exploit domain knowledge and to guide the discovery process. Previously, an incremental discovery approach has been introduced where a model, considered to be under construction, gets incrementally extended by user-selected process behavior. This paper introduces a novel approach that additionally allows the user to freeze model parts within the model under construction. Frozen sub-models are not altered by the incremental approach when new behavior is added to the model. The user can thus steer the discovery algorithm. Our experiments show that freezing sub-models can lead to higher quality models.

Submitted 31 July, 2021; originally announced August 2021.

Comments: This paper is an extended version of the paper "Freezing Sub-Models During Incremental Process Discovery" presented at the 40th International Conference on Conceptual Modeling 2021

arXiv:2107.13066 [pdf, other]

Removing Operational Friction Using Process Mining: Challenges Provided by the Internet of Production (IoP)

Authors: Wil van der Aalst, Tobias Brockhoff, Anahita Farhang Ghahfarokhi, Mahsa Pourbafrani, Merih Seran Uysal, Sebastiaan van Zelst

Abstract: Operational processes in production, logistics, material handling, maintenance, etc., are supported by cyber-physical systems combining hardware and software components. As a result, the digital and the physical world are closely aligned, and it is possible to track operational processes in detail (e.g., using sensors). The abundance of event data generated by today's operational processes provides opportunities and challenges for process mining techniques supporting process discovery, performance analysis, and conformance checking. Using existing process mining tools, it is already possible to automatically discover process models and uncover performance and compliance problems. In the DFG-funded Cluster of Excellence "Internet of Production" (IoP), process mining is used to create "digital shadows" to improve a wide variety of operational processes. However, operational processes are dynamic, distributed, and complex. Driven by the challenges identified in the IoP cluster, we work on novel techniques for comparative process mining (comparing process variants for different products at different locations at different times), object-centric process mining (to handle processes involving different types of objects that interact), and forward-looking process mining (to explore "What if?" questions). By addressing these challenges, we aim to develop valuable "digital shadows" that can be used to remove operational friction.

Submitted 27 July, 2021; originally announced July 2021.

Comments: 30 pages, 21 figures

ACM Class: D.4.1; D.2.2; D.2.12; H.2

arXiv:2105.13155 [pdf, other]

doi 10.1007/978-3-030-85469-0_25

A Framework for Explainable Concept Drift Detection in Process Mining

Authors: Jan Niklas Adams, Sebastiaan J. van Zelst, Lara Quack, Kathrin Hausmann, Wil M. P. van der Aalst, Thomas Rose

Abstract: Rapidly changing business environments expose companies to high levels of uncertainty. This uncertainty manifests itself in significant changes that tend to occur over the lifetime of a process and possibly affect its performance. It is important to understand the root causes of such changes since this allows us to react to change or anticipate future changes. Research in process mining has so far only focused on detecting, locating and characterizing significant changes in a process and not on finding root causes of such changes. In this paper, we aim to close this gap. We propose a framework that adds an explainability level onto concept drift detection in process mining and provides insights into the cause-effect relationships behind significant changes. We define different perspectives of a process, detect concept drifts in these perspectives and plug the perspectives into a causality check that determines whether these concept drifts can be causal to each other. We showcase the effectiveness of our framework by evaluating it on both synthetic and real event data. Our experiments show that our approach unravels cause-effect relationships and provides novel insights into executed processes.

Submitted 27 May, 2021; originally announced May 2021.

arXiv:2105.07666 [pdf, other]

doi 10.1007/978-3-030-76983-3_23

Cortado---An Interactive Tool for Data-Driven Process Discovery and Modeling

Authors: Daniel Schuster, Sebastiaan J. van Zelst, Wil M. P. van der Aalst

Abstract: Process mining aims to diagnose and improve operational processes. Process mining techniques allow analyzing the event data generated and recorded during the execution of (business) processes to gain valuable insights. Process discovery is a key discipline in process mining that comprises the discovery of process models on the basis of the recorded event data. Most process discovery algorithms work in a fully automated fashion. Apart from adjusting their configuration parameters, conventional process discovery algorithms offer limited to no user interaction, i.e., we either edit the discovered process model by hand or change the algorithm's input by, for instance, filtering the event data. However, recent work indicates that the integration of domain knowledge in (semi-)automated process discovery algorithms often enhances the quality of the process models discovered. Therefore, this paper introduces Cortado, a novel process discovery tool that leverages domain knowledge while incrementally discovering a process model from given event data. Starting from an initial process model, Cortado enables the user to incrementally add new process behavior to the process model under construction in a visual and intuitive manner. As such, Cortado unifies the world of manual process modeling with that of automated process discovery.

Submitted 17 May, 2021; originally announced May 2021.

arXiv:2103.13315 [pdf, other]

Model Independent Error Bound Estimation for Conformance Checking Approximation

Authors: Mohammadreza Fani Sani, Martin Kabierski, Sebastiaan J. van Zelst, Wil M. P. van der Aalst

Abstract: Conformance checking techniques allow us to quantify the correspondence of a process's execution, captured in event data, w.r.t. a reference process model. In this context, alignments have proven to be useful for calculating conformance statistics. However, for extensive event data and complex process models, the computation time of alignments is considerably high, hampering their practical use. Simultaneously, it suffices to approximate either alignments or their corresponding conformance value(s) for many applications. Recent work has shown that using subsets of the process model behavior leads to accurate conformance approximations. The accuracy of such an approximation heavily depends on the selected subset of model behavior. Thus, in this paper, we show that we can derive a priori error bounds for conformance checking approximation based on arbitrary activity sequences, independently of the given process model. Such error bounds subsequently let us select the most relevant subset of process model behavior for the alignment approximation. Experiments confirm that conformance approximation accuracy improves when using the proposed error bound approximation to guide the selection of relevant subsets of process model behavior.

Submitted 23 March, 2021; originally announced March 2021.

arXiv:2009.14094 [pdf, other]

doi 10.1007/978-3-030-72693-5_19

Alignment Approximation for Process Trees

Authors: Daniel Schuster, Sebastiaan van Zelst, Wil M. P. van der Aalst

Abstract: Comparing observed behavior (event data generated during process executions) with modeled behavior (process models), is an essential step in process mining analyses. Alignments are the de-facto standard technique for calculating conformance checking statistics. However, the calculation of alignments is computationally complex since a shortest path problem must be solved on a state space which grows non-linearly with the size of the model and the observed behavior, leading to the well-known state space explosion problem. In this paper, we present a novel framework to approximate alignments on process trees by exploiting their hierarchical structure. Process trees are an important process model formalism used by state-of-the-art process mining techniques such as the inductive mining approaches. Our approach exploits structural properties of a given process tree and splits the alignment computation problem into smaller sub-problems. Finally, sub-results are composed to obtain an alignment. Our experiments show that our approach provides a good balance between accuracy and computation time.

Submitted 5 October, 2020; v1 submitted 29 September, 2020; originally announced September 2020.

arXiv:2004.08213 [pdf, other]

doi 10.3390/a13110279

Translating Workflow Nets to Process Trees: An Algorithmic Approach

Authors: Sebastiaan J. van Zelst

Abstract: Since their recent introduction, process trees have been frequently used as a process modeling formalism in many process mining algorithms. A process tree is a tree-based model of a process, in which internal vertices represent behavioral control-flow relations and leaves represent process activities. A process tree is easily translated into a sound Workflow net (WF-net); however, the reverse is not the case. Yet, an algorithm that translates a WF-net into a process tree is of great interest, e.g., the explicit knowledge of the control-flow hierarchy in a WF-net allows one to more easily reason on its behavior. Hence, in this paper, we present such an algorithm, i.e., it detects whether a WF-net corresponds to a process tree, and, if so, constructs it. We prove that, if a process tree is discovered, the language of the process tree equals the language of the original WF-net. Conducted experiments show that the algorithm's corresponding implementation has a quadratic time-complexity in the size of the WF-net. Furthermore, the experiments show strong evidence of process tree rediscoverability.

Submitted 23 March, 2020; originally announced April 2020.

arXiv:2002.05945 [pdf, other]

doi 10.1007/978-3-030-58666-9_9

Online Process Monitoring Using Incremental State-Space Expansion: An Exact Algorithm

Authors: Daniel Schuster, Sebastiaan J. van Zelst

Abstract: The execution of (business) processes generates valuable traces of event data in the information systems employed within companies. Recently, approaches for monitoring the correctness of the execution of running processes have been developed in the area of process mining, i.e., online conformance checking. The advantages of monitoring a process' conformity during its execution are clear, i.e., deviations are detected as soon as they occur and countermeasures can immediately be initiated to reduce the possible negative effects caused by process deviations. Existing work in online conformance checking only allows for obtaining approximations of non-conformity, e.g., overestimating the actual severity of the deviation. In this paper, we present an exact, parameter-free, online conformance checking algorithm that computes conformance checking results on the fly. Our algorithm exploits the fact that the conformance checking problem can be reduced to a shortest path problem, by incrementally expanding the search space and reusing previously computed intermediate results. Our experiments show that our algorithm outperforms comparable state-of-the-art approximation algorithms.

Submitted 15 July, 2020; v1 submitted 14 February, 2020; originally announced February 2020.

arXiv:1912.05022 [pdf, other]

Conformance Checking Approximation using Subset Selection and Edit Distance

Authors: Mohammadreza Fani Sani, Sebastiaan J. van Zelst, Wil M. P. van der Aalst

Abstract: Conformance checking techniques let us find out to what degree a process model and real execution data correspond to each other. In recent years, alignments have proven extremely useful in calculating conformance statistics. Most techniques to compute alignments provide an exact solution. However, in many applications, it is enough to have an approximation of the conformance value. Specifically, for large event data, the computing time for alignments is considerably long using current techniques which makes them inapplicable in reality. Also, it is no longer feasible to use standard hardware for complex processes. Hence, we need techniques that enable us to obtain fast, and at the same time, accurate approximation of the conformance values. This paper proposes new approximation techniques to compute approximated conformance checking values close to exact solution values in a faster time. Those methods also provide upper and lower bounds for the approximated alignment value. Our experiments on real event data show that it is possible to improve the performance of conformance checking by using the proposed methods compared to using the state-of-the-art alignment approximation technique. Results show that in most of the cases, we provide tight bounds, accurate approximated alignment values, and similar deviation statistics.

Submitted 1 December, 2019; originally announced December 2019.

arXiv:1905.06169 [pdf, other]

Process Mining for Python (PM4Py): Bridging the Gap Between Process- and Data Science

Authors: Alessandro Berti, Sebastiaan J. van Zelst, Wil van der Aalst

Abstract: Process mining, i.e., a sub-field of data science focusing on the analysis of event data generated during the execution of (business) processes, has seen a tremendous change over the past two decades. Starting off in the early 2000's, with limited to no tool support, nowadays, several software tools, i.e., both open-source, e.g., ProM and Apromore, and commercial, e.g., Disco, Celonis, ProcessGold, etc., exist. The commercial process mining tools provide limited support for implementing custom algorithms. Moreover, both commercial and open-source process mining tools are often only accessible through a graphical user interface, which hampers their usage in large-scale experimental settings. Initiatives such as RapidProM provide process mining support in the scientific workflow-based data science suite RapidMiner. However, these offer limited to no support for algorithmic customization. In the light of the aforementioned, in this paper, we present a novel process mining library, i.e. Process Mining for Python (PM4Py) that aims to bridge this gap, providing integration with state-of-the-art data science libraries, e.g., pandas, numpy, scipy and scikit-learn. We provide a global overview of the architecture and functionality of PM4Py, accompanied by some representative examples of its usage.

Submitted 15 May, 2019; originally announced May 2019.

arXiv:1811.00062 [pdf, ps, other]

An Interdisciplinary Comparison of Sequence Modeling Methods for Next-Element Prediction

Authors: Niek Tax, Irene Teinemaa, Sebastiaan J. van Zelst

Abstract: Data of sequential nature arise in many application domains in forms of, e.g. textual data, DNA sequences, and software execution traces. Different research disciplines have developed methods to learn sequence models from such datasets: (i) in the machine learning field methods such as (hidden) Markov models and recurrent neural networks have been developed and successfully applied to a wide-range of tasks, (ii) in process mining process discovery techniques aim to generate human-interpretable descriptive models, and (iii) in the grammar inference field the focus is on finding descriptive models in the form of formal grammars. Despite their different focuses, these fields share a common goal - learning a model that accurately describes the behavior in the underlying data. Those sequence models are generative, i.e., they can predict what elements are likely to occur after a given unfinished sequence. So far, these fields have developed mainly in isolation from each other and no comparison exists. This paper presents an interdisciplinary experimental evaluation that compares sequence modeling techniques on the task of next-element prediction on four real-life sequence datasets. The results indicate that machine learning techniques that generally have no aim at interpretability in terms of accuracy outperform techniques from the process mining and grammar inference fields that aim to yield interpretable models.

Submitted 31 October, 2018; originally announced November 2018.

arXiv:1704.08101 [pdf, ps, other]

doi 10.1007/s10115-017-1060-2

Event Stream-Based Process Discovery using Abstract Representations

Authors: Sebastiaan J. van Zelst, Boudewijn F. van Dongen, Wil M. P. van der Aalst

Abstract: The aim of process discovery, originating from the area of process mining, is to discover a process model based on business process execution data. A majority of process discovery techniques relies on an event log as an input. An event log is a static source of historical data capturing the execution of a business process. In this paper we focus on process discovery relying on online streams of business process execution events. Learning process models from event streams poses both challenges and opportunities, i.e. we need to handle unlimited amounts of data using finite memory and, preferably, constant time. We propose a generic architecture that allows for adopting several classes of existing process discovery techniques in context of event streams. Moreover, we provide several instantiations of the architecture, accompanied by implementations in the process mining tool-kit ProM (http://promtools.org). Using these instantiations, we evaluate several dimensions of stream-based process discovery. The evaluation shows that the proposed architecture allows us to lift process discovery to the streaming domain.

Submitted 25 April, 2017; originally announced April 2017.

Comments: Accepted for publication in "Knowledge and Information Systems" (Springer: http://link.springer.com/journal/10115)

arXiv:1703.06733 [pdf, ps, other]

doi 10.1007/s00607-017-0582-5

Discovering Relaxed Sound Workflow Nets using Integer Linear Programming

Authors: S. J. van Zelst, B. F. van Dongen, W. M. P. van der Aalst, H. M. W. Verbeek

Abstract: Process mining is concerned with the analysis, understanding and improvement of business processes. Process discovery, i.e., discovering a process model based on an event log, is considered the most challenging process mining task. State-of-the-art process discovery algorithms only discover local control-flow patterns and are unable to discover complex, non-local patterns. Region theory based techniques, i.e., an established class of process discovery techniques, do allow for discovering such patterns. However, applying region theory directly results in complex, over-fitting models, which is less desirable. Moreover, region theory does not cope with guarantees provided by state-of-the-art process discovery algorithms, both w.r.t. structural and behavioural properties of the discovered process models. In this paper we present an ILP-based process discovery approach, based on region theory, that guarantees to discover relaxed sound workflow nets. Moreover, we devise a filtering algorithm, based on the internal working of the ILP-formulation, that is able to cope with the presence of infrequent behaviour. We have extensively evaluated the technique using different event logs with different levels of exceptional behaviour. Our experiments show that the presented approach allows us to leverage the inherent shortcomings of existing region-based approaches. The techniques presented are implemented and readily available in the HybridILPMiner package in the open-source process mining tool-kits ProM and RapidProM.

Submitted 17 March, 2017; originally announced March 2017.

Comments: technical report related to manuscript submitted to computing journal

arXiv:1703.03740 [pdf, other]

RapidProM: Mine Your Processes and Not Just Your Data

Authors: Wil M. P. van der Aalst, Alfredo Bolt, Sebastiaan J. van Zelst

Abstract: The number of events recorded for operational processes is growing every year. This applies to all domains: from health care and e-government to production and maintenance. Event data are a valuable source of information for organizations that need to meet requirements related to compliance, efficiency, and customer service. Process mining helps to turn these data into real value: by discovering the real processes, by automatically identifying bottlenecks, by analyzing deviations and sources of non-compliance, by revealing the actual behavior of people, etc. Process mining is very different from conventional data mining and machine learning techniques. ProM is a powerful open-source process mining tool supporting hundreds of analysis techniques. However, ProM does not support analysis based on scientific workflows. RapidProM, an extension of RapidMiner based on ProM, combines the best of both worlds. Complex process mining workflows can be modeled and executed easily and subsequently reused for other data sets. Moreover, using RapidProM, one can benefit from combinations of process mining with other types of analysis available through the RapidMiner marketplace.

Submitted 10 March, 2017; originally announced March 2017.

Comments: Will be published in the 2nd version of "RapidMiner: Data Mining Use Cases and Business Analytics Applications"; Markus Hofmann, Ralf Klinkenberg; published by Chapman & Hall/CRC Data Mining and Knowledge Discovery Series

Showing 1–20 of 20 results for author: van Zelst, S