Leveraging Machine Learning and Enhanced Parallelism Detection for BPMN Model Generation from Text

Authors: Phuong Nam Lê, Charlotte Schneider-Depré, Alexandre Goossens, Alexander Stevens, Aurélie Leribaux, Johannes De Smedt

Abstract: Efficient planning, resource management, and consistent operations often rely on converting textual process documents into formal Business Process Model and Notation (BPMN) models. However, this conversion process remains time-intensive and costly. Existing approaches, whether rule-based or machine-learning-based, still struggle with writing styles and often fail to identify parallel structures in process descriptions. This paper introduces an automated pipeline for extracting BPMN models from text, leveraging the use of machine learning and large language models. A key contribution of this work is the introduction of a newly annotated dataset, which significantly enhances the training process. Specifically, we augment the PET dataset with 15 newly annotated documents containing 32 parallel gateways for model training, a critical feature often overlooked in existing datasets. This addition enables models to better capture parallel structures, a common but complex aspect of process descriptions. The proposed approach demonstrates adequate performance in terms of reconstruction accuracy, offering a promising foundation for organizations to accelerate BPMN model creation.

Submitted 11 July, 2025; originally announced July 2025.

arXiv:2506.23776

Model-driven Stochastic Trace Clustering

Authors: Jari Peeperkorn, Johannes De Smedt, Jochen De Weerdt

Abstract: Process discovery algorithms automatically extract process models from event logs, but high variability often results in complex and hard-to-understand models. To mitigate this issue, trace clustering techniques group process executions into clusters, each represented by a simpler and more understandable process model. Model-driven trace clustering improves on this by assigning traces to clusters based on their conformity to cluster-specific process models. However, most existing clustering techniques rely on either no process model discovery, or non-stochastic models, neglecting the frequency or probability of activities and transitions, thereby limiting their capability to capture real-world execution dynamics. We propose a novel model-driven trace clustering method that optimizes stochastic process models within each cluster. Our approach uses entropic relevance, a stochastic conformance metric based on directly-follows probabilities, to guide trace assignment. This allows clustering decisions to consider both structural alignment with a cluster's process model and the likelihood that a trace originates from a given stochastic process model. The method is computationally efficient, scales linearly with input size, and improves model interpretability by producing clusters with clearer control-flow patterns. Extensive experiments on public real-life datasets show that our method outperforms existing alternatives in representing process behavior and reveals how clustering performance rankings can shift when stochasticity is considered.

Submitted 30 June, 2025; originally announced June 2025.

arXiv:2506.14772

SimBank: from Simulation to Solution in Prescriptive Process Monitoring

Authors: Jakob De Moor, Hans Weytjens, Johannes De Smedt, Jochen De Weerdt

Abstract: Prescriptive Process Monitoring (PresPM) is an emerging area within Process Mining, focused on optimizing processes through real-time interventions for effective decision-making. PresPM holds significant promise for organizations seeking enhanced operational performance. However, the current literature faces two key limitations: a lack of extensive comparisons between techniques and insufficient evaluation approaches. To address these gaps, we introduce SimBank: a simulator designed for accurate benchmarking of PresPM methods. Modeled after a bank's loan application process, SimBank enables extensive comparisons of both online and offline PresPM methods. It incorporates a variety of intervention optimization problems with differing levels of complexity and supports experiments on key causal machine learning challenges, such as assessing a method's robustness to confounding in data. SimBank additionally offers a comprehensive evaluation capability: for each test case, it can generate the true outcome under each intervention action, which is not possible using recorded datasets. The simulator incorporates parallel activities and loops, drawing from common logs to generate cases that closely resemble real-life process instances. Our proof of concept demonstrates SimBank's benchmarking capabilities through experiments with various PresPM methods across different interventions, highlighting its value as a publicly available simulator for advancing research and practice in PresPM.

Submitted 2 July, 2025; v1 submitted 28 March, 2025; originally announced June 2025.

arXiv:2411.14263

Generating Realistic Adversarial Examples for Business Processes using Variational Autoencoders

Authors: Alexander Stevens, Jari Peeperkorn, Johannes De Smedt, Jochen De Weerdt

Abstract: In predictive process monitoring, predictive models are vulnerable to adversarial attacks, where input perturbations can lead to incorrect predictions. Unlike in computer vision, where these perturbations are designed to be imperceptible to the human eye, the generation of adversarial examples in predictive process monitoring poses unique challenges. Minor changes to the activity sequences can create improbable or even impossible scenarios due to underlying constraints such as regulatory rules or process constraints. To address this, we focus on generating realistic adversarial examples tailored to the business process context, in contrast to the imperceptible, pixel-level changes commonly seen in computer vision adversarial attacks. This paper introduces two novel latent space attacks, which generate adversaries by adding noise to the latent space representation of the input data, rather than directly modifying the input attributes. These latent space methods are domain-agnostic and do not rely on process-specific knowledge, as we restrict the generation of adversarial examples to the learned class-specific data distributions by directly perturbing the latent space representation of the business process executions. We evaluate these two latent space methods with six other adversarial attacking methods on eleven real-life event logs and four predictive models. The first three attacking methods directly permute the activities of the historically observed business process executions. The fourth method constrains the adversarial examples to lie within the same data distribution as the original instances, by projecting the adversarial examples to the original data distribution.

Submitted 21 November, 2024; originally announced November 2024.

arXiv:2410.00596

Dynamic and Scalable Data Preparation for Object-Centric Process Mining

Authors: Lien Bosmans, Jari Peeperkorn, Alexandre Goossens, Giovanni Lugaresi, Johannes De Smedt, Jochen De Weerdt

Abstract: Object-centric process mining is emerging as a promising paradigm across diverse industries, drawing substantial academic attention. To support its data requirements, existing object-centric data formats primarily facilitate the exchange of static event logs between data owners, researchers, and analysts, rather than serving as a robust foundational data model for continuous data ingestion and transformation pipelines for subsequent storage and analysis. This focus results in suboptimal design choices in terms of flexibility, scalability, and maintainability. For example, it is difficult for current object-centric event log formats to deal with novel object types or new attributes in the case of streaming data. This paper proposes a database format designed for an intermediate data storage hub, which segregates process mining applications from their data sources using a hub-and-spoke architecture. It delineates essential requirements for robust object-centric event log storage from a data engineering perspective and introduces a novel relational schema tailored to these requirements. To validate the efficacy of the proposed database format, an end-to-end solution is implemented using a lightweight, open-source data stack. Our implementation includes data extractors for various object-centric event log formats, automated data quality assessments, and intuitive process data visualization capabilities.

Submitted 1 October, 2024; originally announced October 2024.

arXiv:2405.12709

Object-Centric Event Logs: Specifications, Comparative Analysis and Refinement

Authors: Alexandre Goossens, Johannes De Smedt, Jan Vanthienen

Abstract: Process mining aims to comprehend and enhance business processes by analyzing event logs. Recently, object-centric process mining has gained traction by considering multiple objects interacting with each other in a process. This object-centric approach offers advantages over traditional methods by avoiding dimension reduction issues. However, in contrast to traditional process mining, where a standard event log format was quickly agreed upon with XES providing a common platform for further research and industry, various object-centric logging formats have been proposed, each addressing specific challenges such as object relations or dynamic attribute changes. As a result, interoperability of object-centric algorithms remains a challenge, hindering reproducibility and generalizability in research. Additionally, the object-centric process storage paradigm aligns well with a wide range of object-oriented databases storing process data. This paper introduces a specifications framework from three perspectives, namely process mining (what should be analyzed), object-centric process modeling (how it should be modeled), and database storage (how it should be stored), in order to compare and evaluate object-centric log formats. By identifying commonalities and discrepancies among these formats, the study delves into unresolved issues and proposes potential solutions. Ultimately, this research contributes to advancing object-centric process mining by facilitating a deeper understanding of event log formats and promoting consistency and compatibility across methodologies.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2403.09232

Generating Feasible and Plausible Counterfactual Explanations for Outcome Prediction of Business Processes

Authors: Alexander Stevens, Chun Ouyang, Johannes De Smedt, Catarina Moreira

Abstract: In recent years, various machine and deep learning architectures have been successfully introduced to the field of predictive process analytics. Nevertheless, the inherent opacity of these algorithms poses a significant challenge for human decision-makers, hindering their ability to understand the reasoning behind the predictions. This growing concern has sparked the introduction of counterfactual explanations, designed as human-understandable what-if scenarios, to provide clearer insights into the decision-making process behind undesirable predictions. The generation of counterfactual explanations, however, encounters specific challenges when dealing with the sequential nature of the (business) process cases typically used in predictive process analytics. Our paper tackles this challenge by introducing a data-driven approach, REVISEDplus, to generate more feasible and plausible counterfactual explanations. First, we restrict the counterfactual algorithm to generate counterfactuals that lie within a high-density region of the process data, ensuring that the proposed counterfactuals are realistic and feasible within the observed process data distribution. Additionally, we ensure plausibility by learning sequential patterns between the activities in the process cases, utilising Declare language templates. Finally, we evaluate the properties that define the validity of counterfactuals.

Submitted 14 March, 2024; originally announced March 2024.

Comments: Journal Submission

arXiv:2401.14847

Extracting Process-Aware Decision Models from Object-Centric Process Data

Authors: Alexandre Goossens, Johannes De Smedt, Jan Vanthienen

Abstract: Organizations execute decisions within business processes on a daily basis whilst having to take into account multiple stakeholders who might require multiple points of view of the same process. Moreover, the complexity of the information systems running these business processes is generally high as they are linked to databases storing all the relevant data and aspects of the processes. Given the presence of multiple objects within an information system which support the processes in their enactment, decisions are naturally influenced by both these perspectives, logged in object-centric process logs. However, the discovery of such decisions from object-centric process logs is not straightforward, as it requires correctly linking the involved objects whilst considering the sequential constraints that business processes impose, as well as correctly discovering what a decision actually does. This paper proposes the first object-centric decision-mining algorithm, called Integrated Object-centric Decision Discovery Algorithm (IODDA). IODDA is able to discover how a decision is structured as well as how a decision is made. Moreover, IODDA is able to discover which activities and object types are involved in the decision-making process. Next, IODDA is demonstrated with the first artificial knowledge-intensive process logs whose log generators are provided to the research community.

Submitted 26 January, 2024; originally announced January 2024.

arXiv:2311.05986

Signature-Based Community Detection for Time Series

Authors: Marco Gregnanin, Johannes De Smedt, Giorgio Gnecco, Maurizio Parton

Abstract: Community detection for time series without prior knowledge poses an open challenge within complex networks theory. Traditional approaches begin by assessing time series correlations and maximizing modularity under diverse null models. These methods suffer from assuming temporal stationarity and are influenced by the granularity of observation intervals. In this study, we propose an approach based on the signature matrix, a concept from path theory for studying stochastic processes. By employing a signature-derived similarity measure, our method overcomes drawbacks of traditional correlation-based techniques. Through a series of numerical experiments, we demonstrate that our method consistently yields higher modularity compared to baseline models, when tested on the Standard and Poor's 500 dataset. Moreover, our approach showcases enhanced stability in modularity when the length of the underlying time series is manipulated. This research contributes to the field of community detection by introducing a signature-based similarity measure, offering an alternative to conventional correlation matrices.

Submitted 10 November, 2023; originally announced November 2023.

arXiv:2309.14092

From OCEL to DOCEL -- Datasets and Automated Transformation

Authors: Alexandre Goossens, Adrian Rebmann, Johannes De Smedt, Jan Vanthienen, Han van der Aa

Abstract: Object-centric event data represent processes from the point of view of all the involved object types. This perspective has gained interest in recent years as it supports the analysis of processes that previously could not be adequately captured, due to the lack of a clear case notion as well as an increasing amount of output data that needs to be stored. Although publicly available event logs are crucial artifacts for researchers to develop and evaluate novel process mining techniques, the currently available object-centric event logs have limitations in this regard. Specifically, they mainly focus on control-flow and rarely contain objects with attributes that change over time, even though this is not realistic, as the attribute values of objects can be altered during their lifecycle. This paper addresses this gap by providing two means of establishing object-centric datasets with dynamically evolving attributes. First, we provide event log generators, which allow researchers to generate customized, artificial logs with dynamic attributes in the recently proposed DOCEL format. Second, we propose and evaluate an algorithm to convert OCEL logs into DOCEL logs, which involves the detection of event attributes that capture evolving object information and the creation of dynamic attributes from these. Through these contributions, this paper supports the advancement of object-centric process analysis by providing researchers with new means to obtain relevant data to use during the development of new techniques.

Submitted 25 September, 2023; originally announced September 2023.

arXiv:2212.02858

Enhancing Data-Awareness of Object-Centric Event Logs

Authors: Alexandre Goossens, Johannes De Smedt, Jan Vanthienen, Wil van der Aalst

Abstract: When multiple objects are involved in a process, there is an opportunity for processes to be discovered from different angles with new information that previously might not have been analyzed from a single object point of view. This does require that all the information of event/object attributes and their values is stored within logs, including attributes that have a list of values or attributes with values that change over time. It also requires that attributes can unambiguously be linked to an object, an event, or both. As such, object-centric event logs are an interesting development in process mining as they support the presence of multiple types of objects. First, this paper shows that the current object-centric event log formats do not support the aforementioned aspects to their full potential, since existing formats do not support dynamic object attributes (attributes with changing values). Next, this paper introduces a novel enriched object-centric event log format tackling the aforementioned issues alongside an algorithm that automatically translates XES logs to this Data-aware OCEL (DOCEL) format.

Submitted 6 December, 2022; originally announced December 2022.

Comments: Submitted and Accepted at the 4th International Conference on Process Mining (ICPM) 2022: Event Data and Behavioral Analytics (EdbA) Workshop

arXiv:2208.07749

doi 10.1016/j.eswa.2022.118182

Predicting student performance using sequence classification with time-based windows

Authors: Galina Deeva, Johannes De Smedt, Cecilia Saint-Pierre, Richard Weber, Jochen De Weerdt

Abstract: A growing number of universities worldwide use various forms of online and blended learning as part of their academic curricula. Furthermore, the recent changes caused by the COVID-19 pandemic have led to a drastic increase in importance and ubiquity of online education. Among the major advantages of e-learning is not only improving students' learning experience and widening their educational prospects, but also an opportunity to gain insights into students' learning processes with learning analytics. This study contributes to the topic of improving and understanding e-learning processes in the following ways. First, we demonstrate that accurate predictive models can be built based on sequential patterns derived from students' behavioral data, which are able to identify underperforming students early in the course. Second, we investigate the specificity-generalizability trade-off in building such predictive models by investigating whether predictive models should be built for every course individually based on course-specific sequential patterns, or across several courses based on more general behavioral patterns. Finally, we present a methodology for capturing temporal aspects in behavioral data and analyze its influence on the predictive performance of the models. Our improved sequence classification technique is capable of predicting student performance with high levels of accuracy, reaching 90 percent for course-specific models.

Submitted 1 September, 2022; v1 submitted 16 August, 2022; originally announced August 2022.

Journal ref: Expert Systems with Applications, 118182 (2022)

arXiv:2203.16073

Explainability in Process Outcome Prediction: Guidelines to Obtain Interpretable and Faithful Models

Authors: Alexander Stevens, Johannes De Smedt

Abstract: Although a recent shift has been made in the field of predictive process monitoring to use models from the explainable artificial intelligence field, the evaluation still occurs mainly through performance-based metrics, thus not accounting for the actionability and implications of the explanations. In this paper, we define explainability through the interpretability of the explanations and the faithfulness of the explainability model in the field of process outcome prediction. The introduced properties are analysed along the event, case, and control flow perspectives, which are typical for a process-based analysis. This allows comparing inherently created explanations with post-hoc explanations. We benchmark seven classifiers on thirteen real-life event logs, and these cover a range of transparent and non-transparent machine learning and deep learning models, further complemented with explainability techniques. Next, this paper contributes a set of guidelines named X-MOP which allows selecting the appropriate model based on the event log specifications, by providing insight into how the varying preprocessing, model complexity and explainability techniques typical in process outcome prediction influence the explainability of the model.

Submitted 30 July, 2023; v1 submitted 30 March, 2022; originally announced March 2022.

arXiv:2105.01092

Process Model Forecasting Using Time Series Analysis of Event Sequence Data

Authors: Johannes De Smedt, Anton Yeshchenko, Artem Polyvyanyy, Jochen De Weerdt, Jan Mendling

Abstract: Process analytics is an umbrella of data-driven techniques which includes making predictions for individual process instances or overall process models. At the instance level, various novel techniques have been recently devised, tackling next activity, remaining time, and outcome prediction. At the model level, there is a notable void. It is the ambition of this paper to fill this gap. To this end, we develop a technique to forecast the entire process model from historical event data. A forecasted model is a will-be process model representing a probable future state of the overall process. Such a forecast helps to investigate the consequences of drift and emerging bottlenecks. Our technique builds on a representation of event data as multiple time series, each capturing the evolution of a behavioural aspect of the process model, such that corresponding forecasting techniques can be applied. Our implementation demonstrates the accuracy of our technique on real-world event log data.

Submitted 28 July, 2021; v1 submitted 3 May, 2021; originally announced May 2021.

Comments: Accepted at the International Conference on Conceptual Modeling 2021

arXiv:2011.11551

doi 10.1016/j.is.2020.101685

Conformance Checking of Mixed-paradigm Process Models

Authors: Boudewijn van Dongen, Johannes De Smedt, Claudio Di Ciccio, Jan Mendling

Abstract: Mixed-paradigm process models integrate strengths of procedural and declarative representations like Petri nets and Declare. They are specifically interesting for process mining because they allow capturing complex behaviour in a compact way. A key research challenge for the proliferation of mixed-paradigm models for process mining is the lack of corresponding conformance checking techniques. In this paper, we address this problem by devising the first approach that works with intertwined state spaces of mixed-paradigm models. More specifically, our approach uses an alignment-based replay to explore the state space and compute trace fitness in a procedural way. In every state, the declarative constraints are separately updated, such that violations disable the corresponding activities. Our technique provides for an efficient replay towards an optimal alignment by respecting all orthogonal Declare constraints. We have implemented our technique in ProM and demonstrate its performance in an evaluation with real-world event logs.

Submitted 23 November, 2020; originally announced November 2020.

Comments: Accepted for publication in Information Systems

arXiv:2011.02819

Predictive Process Model Monitoring using Recurrent Neural Networks

Authors: Johannes De Smedt, Jochen De Weerdt

Abstract: The field of predictive process monitoring focuses on case-level models to predict a single specific outcome such as a particular objective, (remaining) time, or next activity/remaining sequence. Recently, a longer-horizon, model-wide approach has been proposed in the form of process model forecasting, which predicts the future state of a whole process model through the forecasting of all activity-to-activity relations at once using time series forecasting. This paper introduces the concept of predictive process model monitoring, which sits in the middle of both predictive process monitoring and process model forecasting. Concretely, by modelling a process model as a set of constraints being present between activities over time, we can capture more detailed information between activities compared to process model forecasting, while being compatible with typical predictive process monitoring objectives which are often expressed in the same language as these constraints. To achieve this, Processes-As-Movies (PAM) is introduced, i.e., a novel technique capable of jointly mining and predicting declarative process constraints between activities in various windows of a process' execution. PAM predicts what declarative rules hold for a trace (objective-based), which also supports the prediction of all constraints together as a process model (model-based). Various recurrent neural network topologies inspired by video analysis tailored to temporal high-dimensional input are used to model the process model evolution with windows as time steps, including encoder-decoder long short-term memory networks, and convolutional long short-term memory networks. Results obtained over real-life event logs show that these topologies are effective in terms of predictive accuracy and precision.

Submitted 10 January, 2023; v1 submitted 5 November, 2020; originally announced November 2020.

Showing 1–16 of 16 results for author: De Smedt, J