Search | arXiv e-print repository

Artificial Conversations, Real Results: Fostering Language Detection with Synthetic Data

Authors: Fatemeh Mohammadi, Tommaso Romano, Samira Maghool, Paolo Ceravolo

Abstract: Collecting high-quality training data is essential for fine-tuning Large Language Models (LLMs). However, acquiring such data is often costly and time-consuming, especially for non-English languages such as Italian. Recently, researchers have begun to explore the use of LLMs to generate synthetic datasets as a viable alternative. This study proposes a pipeline for generating synthetic data and a comprehensive approach for investigating the factors that influence the validity of synthetic data generated by LLMs. It examines how model performance is affected by factors such as prompt strategy, text length and target position in a specific task, namely inclusive language detection in Italian job advertisements. Our results show that, in most cases and across different metrics, the fine-tuned models trained on synthetic data consistently outperformed other models on both real and synthetic test datasets. The study discusses the practical implications and limitations of using synthetic data for language detection tasks with LLMs.

Submitted 31 March, 2025; originally announced March 2025.

arXiv:2503.18994

HH4AI: A methodological Framework for AI Human Rights impact assessment under the EUAI ACT

Authors: Paolo Ceravolo, Ernesto Damiani, Maria Elisa D'Amico, Bianca de Teffe Erb, Simone Favaro, Nannerel Fiano, Paolo Gambatesa, Simone La Porta, Samira Maghool, Lara Mauri, Niccolo Panigada, Lorenzo Maria Ratto Vaquer, Marta A. Tamborini

Abstract: This paper introduces the HH4AI Methodology, a structured approach to assessing the impact of AI systems on human rights, focusing on compliance with the EU AI Act and addressing technical, ethical, and regulatory challenges. The paper highlights AI's transformative nature, driven by autonomy, data, and goal-oriented design, and how the EU AI Act promotes transparency, accountability, and safety. A key challenge is defining and assessing "high-risk" AI systems across industries, complicated by the lack of universally accepted standards and AI's rapid evolution. To address these challenges, the paper explores the relevance of ISO/IEC and IEEE standards, focusing on risk management, data quality, bias mitigation, and governance. It proposes a Fundamental Rights Impact Assessment (FRIA) methodology, a gate-based framework designed to isolate and assess risks through phases including an AI system overview, a human rights checklist, an impact assessment, and a final output phase. A filtering mechanism tailors the assessment to the system's characteristics, targeting areas like accountability, AI literacy, data governance, and transparency. The paper illustrates the FRIA methodology through a fictional case study of an automated healthcare triage service. The structured approach enables systematic filtering, comprehensive risk assessment, and mitigation planning, effectively prioritizing critical risks and providing clear remediation strategies. This promotes better alignment with human rights principles and enhances regulatory compliance.

Submitted 23 March, 2025; originally announced March 2025.

Comments: 19 pages, 7 figures, 1 table

arXiv:2502.11611

Identifying Gender Stereotypes and Biases in Automated Translation from English to Italian using Similarity Networks

Authors: Fatemeh Mohammadi, Marta Annamaria Tamborini, Paolo Ceravolo, Costanza Nardocci, Samira Maghool

Abstract: This paper is a collaborative effort between Linguistics, Law, and Computer Science to evaluate stereotypes and biases in automated translation systems. We advocate gender-neutral translation as a means to promote gender inclusion and improve the objectivity of machine translation. Our approach focuses on identifying gender bias in English-to-Italian translations. First, we define gender bias following human rights law and linguistics literature. Then we proceed by identifying gender-specific terms such as she/lei and he/lui as key elements. We then evaluate the cosine similarity between these target terms and others in the dataset to reveal the model's perception of semantic relations. Using numerical features, we effectively evaluate the intensity and direction of the bias. Our findings provide tangible insights for developing and training gender-neutral translation algorithms.

Submitted 17 February, 2025; originally announced February 2025.
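
The cosine-similarity step mentioned in the abstract above can be sketched as follows. This is only an illustration, not the authors' pipeline: the three-dimensional vectors are invented toy values, and the `gender_lean` score (similarity to "lei" minus similarity to "lui") is a hypothetical stand-in for the numerical features the abstract mentions without defining.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, invented for illustration only.
embeddings = {
    "lei": [1.0, 0.0, 0.0],         # she
    "lui": [0.0, 1.0, 0.0],         # he
    "infermiera": [0.9, 0.1, 0.3],  # nurse (feminine form)
    "ingegnere": [0.1, 0.9, 0.3],   # engineer
}


def gender_lean(word):
    """Assumed score: positive leans towards 'lei', negative towards 'lui'."""
    vec = embeddings[word]
    return (cosine_similarity(vec, embeddings["lei"])
            - cosine_similarity(vec, embeddings["lui"]))


if __name__ == "__main__":
    for word in ("infermiera", "ingegnere"):
        print(word, round(gender_lean(word), 3))
```

In this sketch the sign of the score gives a direction and its magnitude an intensity; the paper itself may define both differently.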

arXiv:2502.06918

Leveraging GPT-4o Efficiency for Detecting Rework Anomaly in Business Processes

Authors: Mohammad Derakhshan, Paolo Ceravolo, Fatemeh Mohammadi

Abstract: This paper investigates the effectiveness of GPT-4o-2024-08-06, one of the Large Language Models (LLMs) from OpenAI, in detecting business process anomalies, with a focus on rework anomalies. In our study, we developed a GPT-4o-based tool capable of transforming event logs into a structured format and identifying reworked activities within business event logs. The analysis was performed on a synthetic dataset designed to contain rework anomalies but free of loops. To evaluate the anomaly detection capabilities of GPT-4o-2024-08-06, we used three prompting techniques: zero-shot, one-shot, and few-shot. These techniques were tested on different anomaly distributions, namely normal, uniform, and exponential, to identify the most effective approach for each case. The results demonstrate the strong performance of GPT-4o-2024-08-06. On our dataset, the model achieved 96.14% accuracy with one-shot prompting for the normal distribution, 97.94% accuracy with few-shot prompting for the uniform distribution, and 74.21% accuracy with few-shot prompting for the exponential distribution. These results highlight the model's potential as a reliable tool for detecting rework anomalies in event logs and how anomaly distribution and prompting strategy influence the model's performance.

Submitted 10 February, 2025; originally announced February 2025.

Comments: 14 pages, 5 images, 4 tables

arXiv:2411.05648

Enhancing Model Fairness and Accuracy with Similarity Networks: A Methodological Approach

Authors: Samira Maghool, Paolo Ceravolo

Abstract: In this paper, we propose an innovative approach to thoroughly explore dataset features that introduce bias in downstream machine-learning tasks. Depending on the data format, we use different techniques to map instances into a similarity feature space. Our method's ability to adjust the resolution of pairwise similarity provides clear insights into the relationship between the dataset classification complexity and model fairness. Experimental results confirm the promising applicability of the similarity network in promoting fair models. Moreover, our methodology not only seems promising for fair downstream tasks such as classification but also performs well in imputation and augmentation of the dataset, satisfying fairness criteria such as demographic parity and imbalanced classes.

Submitted 8 November, 2024; originally announced November 2024.

Comments: 7 pages, 4 figures

arXiv:2406.06596

Are Large Language Models the New Interface for Data Pipelines?

Authors: Sylvio Barbon Junior, Paolo Ceravolo, Sven Groppe, Mustafa Jarrar, Samira Maghool, Florence Sèdes, Soror Sahri, Maurice Van Keulen

Abstract: A Language Model is a term that encompasses various types of models designed to understand and generate human communication. Large Language Models (LLMs) have gained significant attention due to their ability to process text with human-like fluency and coherence, making them valuable for a wide range of data-related tasks fashioned as pipelines. The capabilities of LLMs in natural language understanding and generation, combined with their scalability, versatility, and state-of-the-art performance, enable innovative applications across various AI-related fields, including eXplainable Artificial Intelligence (XAI), Automated Machine Learning (AutoML), and Knowledge Graphs (KG). Furthermore, we believe these models can extract valuable insights and make data-driven decisions at scale, a practice commonly referred to as Big Data Analytics (BDA). In this position paper, we provide some discussions in the direction of unlocking synergies among these technologies, which can lead to more powerful and intelligent AI solutions, driving improvements in data pipelines across a wide range of applications and domains integrating humans, computers, and knowledge.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2403.05556

Modeling and predicting students' engagement behaviors using mixture Markov models

Authors: R. Maqsood, P. Ceravolo, C. Romero, S. Ventura

Abstract: Students' engagements reflect their level of involvement in an ongoing learning process, which can be estimated through their interactions with a computer-based learning or assessment system. A prerequisite for stimulating student engagement is an approximate representation model for comprehending students' varied (dis)engagement behaviors. In this paper, we used model-based clustering for this purpose, which generates K mixture Markov models to group students' traces containing their (dis)engagement behavioral patterns. To prevent the Expectation-Maximization (EM) algorithm from getting stuck in a local maximum, we also introduced a K-means-based initialization method named K-EM. We performed experiments on two real datasets using three variants of the EM algorithm (the original EM, emEM, and K-EM) and non-mixture baseline models for both datasets. The proposed K-EM has shown very promising results and achieved a significant performance difference in comparison with the other approaches particularly using the Dataset. Hence, we suggest performing further experiments using large dataset(s) to validate our method. Additionally, visualization of the resultant clusters through first-order Markov chains reveals very useful insights about (dis)engagement behaviors depicted by the students. We conclude the paper with a discussion on the usefulness of our approach, limitations and potential extensions of this work.

Submitted 10 February, 2024; originally announced March 2024.

Journal ref: Knowledge and Information Systems (2022); 64:1349-1384

arXiv:2310.11196

doi 10.1093/mnras/stad3093

Kilometer-precise (UII) Umbriel physical properties from the multichord stellar occultation on 2020 September 21

Authors: M. Assafin, S. Santos-Filho, B. E. Morgado, A. R. Gomes-Júnior, B. Sicardy, G. Margoti, G. Benedetti-Rossi, F. Braga-Ribas, T. Laidler, J. I. B. Camargo, R. Vieira-Martins, T. Swift, D. Dunham, T. George, J. Bardecker, C. Anderson, R. Nolthenius, K. Bender, G. Viscome, D. Oesper, R. Dunford, K. Getrost, C. Kitting, K. Green, R. Bria, et al. (17 additional authors not shown)

Abstract: We report the results of the stellar occultation by (UII) Umbriel on September 21st, 2020. The shadow crossed the USA and Canada, and 19 positive chords were obtained. A limb parameter accounted for putative topographic features in the limb fittings. Ellipse fittings were not robust; only upper limits were derived for the true size/shape of a putative Umbriel ellipsoid. The adopted spherical solution gives radius = 582.4 +/- 0.8 km, smaller than, but close to, the 584.7 +/- 2.8 km from Voyager II. The apparent ellipse fit results in a true semi-major axis of 584.9 +/- 3.8 km, a semi-minor axis of 582.3 +/- 0.6 km and a true oblateness of 0.004 +/- 0.008 for a putative ellipsoid. The geometric albedo was pV = 0.26 +/- 0.01. The density was rho = 1.54 +/- 0.04 g cm$^{-3}$. The surface gravity was 0.251 +/- 0.006 m s$^{-2}$ and the escape velocity 0.541 +/- 0.006 km s$^{-1}$. Upper limits of 13 and 72 nbar (at 1 sigma and 3 sigma levels, respectively) were obtained for the surface pressure of a putative isothermal CO2 atmosphere at T = 70 K. A milliarcsecond precision position was derived: RA = 02h 30m 28.84556s +/- 0.1 mas, DE = 14° 19' 36.5836" +/- 0.2 mas. A large limb parameter of 4.2 km was obtained, in striking agreement with opposite southern hemisphere measurements by Voyager II in 1986. Occultation and Voyager results indicate that the same strong topography variation in the surface of Umbriel is present on both hemispheres.

Submitted 17 October, 2023; originally announced October 2023.
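
As a rough consistency check on the Umbriel figures quoted above, the surface gravity and escape velocity can be recomputed from the radius and density alone. This is a sketch that assumes a homogeneous sphere and a rounded value of the gravitational constant; it is not the paper's derivation, which may rely on an independent mass estimate. The function names are illustrative.

```python
import math

G = 6.674e-11            # gravitational constant, m^3 kg^-1 s^-2 (rounded)
RADIUS_M = 582.4e3       # spherical-solution radius from the abstract, in metres
DENSITY_KG_M3 = 1.54e3   # density from the abstract, converted from g cm^-3


def surface_gravity(radius_m, density_kg_m3):
    """g = G M / R^2 with M = (4/3) pi rho R^3, i.e. g = (4/3) pi G rho R."""
    return (4.0 / 3.0) * math.pi * G * density_kg_m3 * radius_m


def escape_velocity(radius_m, density_kg_m3):
    """v_esc = sqrt(2 G M / R) = sqrt(2 g R), in m/s."""
    return math.sqrt(2.0 * surface_gravity(radius_m, density_kg_m3) * radius_m)


def oblateness(semi_major_km, semi_minor_km):
    """Oblateness (a - b) / a of an ellipse or ellipsoid."""
    return (semi_major_km - semi_minor_km) / semi_major_km


if __name__ == "__main__":
    print("g     =", round(surface_gravity(RADIUS_M, DENSITY_KG_M3), 3), "m/s^2")
    print("v_esc =", round(escape_velocity(RADIUS_M, DENSITY_KG_M3) / 1000.0, 3), "km/s")
    print("obl.  =", round(oblateness(584.9, 582.3), 4))
```

Under these assumptions the results land at about 0.25 m s$^{-2}$ and 0.54 km s$^{-1}$, and the oblateness from the quoted axes is about 0.004, all within the uncertainties stated in the abstract.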

arXiv:2310.08995

doi 10.1051/0004-6361/202346191

Scaling slowly rotating asteroids by stellar occultations

Authors: A. Marciniak, J. Ďurech, A. Choukroun, J. Hanuš, W. Ogłoza, R. Szakáts, L. Molnár, A. Pál, F. Monteiro, E. Frappa, W. Beisker, H. Pavlov, J. Moore, R. Adomavičienė, R. Aikawa, S. Andersson, P. Antonini, Y. Argentin, A. Asai, P. Assoignon, J. Barton, P. Baruffetti, K. L. Bath, R. Behrend, L. Benedyktowicz, et al. (154 additional authors not shown)

Abstract: As evidenced by recent survey results, the majority of asteroids are slow rotators (P>12 h), but lack spin and shape models due to selection bias. This bias is skewing our overall understanding of the spins, shapes, and sizes of asteroids, as well as of their other properties. Also, diameter determinations for large (>60 km) and medium-sized asteroids (between 30 and 60 km) often vary by over 30% for multiple reasons. Our long-term project is focused on a few tens of slow rotators with periods of up to 60 hours. We aim to obtain their full light curves and reconstruct their spins and shapes. We also precisely scale the models, typically with an accuracy of a few percent. We used wide sets of dense light curves for spin and shape reconstructions via light-curve inversion. Precisely scaling them with thermal data was not possible here because of poor infrared data: large bodies are too bright for the WISE mission. Therefore, we recently launched a campaign among stellar occultation observers, to scale these models and to verify the shape solutions, often allowing us to break the mirror pole ambiguity. The presented scheme resulted in shape models for 16 slow rotators, most of them for the first time. Fitting them to stellar occultations resolved previous inconsistencies in size determinations. For around half of the targets, this fitting also allowed us to identify a clearly preferred pole solution, thus removing the ambiguity inherent to light-curve inversion. We also address the influence of the uncertainty of the shape models on the derived diameters.
Overall, our project has already provided reliable models for around 50 slow rotators. Such well-determined and scaled asteroid shapes will, e.g., constitute a solid basis for density determinations when coupled with mass information. Spin and shape models continue to fill the gaps caused by various biases.

Submitted 13 October, 2023; originally announced October 2023.

Comments: Accepted to Astronomy & Astrophysics. 12 pages + appendices

Journal ref: A&A 679, A60 (2023)

arXiv:2308.08062

doi 10.1051/0004-6361/202346892

A large topographic feature on the surface of the trans-Neptunian object (307261) 2002 MS$_4$ measured from stellar occultations

Authors: F. L. Rommel, F. Braga-Ribas, J. L. Ortiz, B. Sicardy, P. Santos-Sanz, J. Desmars, J. I. B. Camargo, R. Vieira-Martins, M. Assafin, B. E. Morgado, R. C. Boufleur, G. Benedetti-Rossi, A. R. Gomes-Júnior, E. Fernández-Valenzuela, B. J. Holler, D. Souami, R. Duffard, G. Margoti, M. Vara-Lubiano, J. Lecacheux, J. L. Plouvier, N. Morales, A. Maury, J. Fabrega, P. Ceravolo, et al. (179 additional authors not shown)

Abstract: This work aims at constraining the size, shape, and geometric albedo of the dwarf planet candidate 2002 MS4 through the analysis of nine stellar occultation events. Using multichord detection, we also studied the object's topography by analyzing the obtained limb and the residuals between observed chords and the best-fitted ellipse. We predicted and organized the observational campaigns of nine stellar occultations by 2002 MS4 between 2019 and 2022, resulting in two single-chord events, four double-chord detections, and three events with between three and sixty-one positive chords. Using 13 selected chords from the 8 August 2020 event, we determined the global elliptical limb of 2002 MS4. The best-fitted ellipse, combined with the object's rotational information from the literature, constrains the object's size, shape, and albedo. Additionally, we developed a new method to characterize topography features on the object's limb. The global limb has a semi-major axis of 412 $\pm$ 10 km, a semi-minor axis of 385 $\pm$ 17 km, and the position angle of the minor axis is 121$^\circ$ $\pm$ 16$^\circ$. From this instantaneous limb, we obtained 2002 MS4's geometric albedo and the projected area-equivalent diameter. Significant deviations from the fitted ellipse in the northernmost limb are detected from multiple sites, highlighting three distinct topographic features: an 11 km deep depression, followed by a 25$^{+4}_{-5}$ km high elevation, next to a crater-like depression with an extension of 322 $\pm$ 39 km and a depth of 45.1 $\pm$ 1.5 km.
Our results present an object that is $\approx$138 km smaller in diameter than derived from thermal data, possibly indicating the presence of a so-far unknown satellite. However, within the error bars, the geometric albedo in the V-band agrees with the results published in the literature, even with the radiometric-derived albedo.

Submitted 23 August, 2023; v1 submitted 15 August, 2023; originally announced August 2023.

Journal ref: A&A 678, A167 (2023)
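
The abstract above mentions a projected area-equivalent diameter without quoting it. A minimal sketch of how such a quantity follows from the fitted limb is given below, assuming the usual definition (the diameter of a circle with the same area as the projected ellipse) and using only the nominal semi-axes from the abstract. The function names are illustrative, and no uncertainty propagation is attempted.

```python
import math


def area_equivalent_diameter(semi_major_km, semi_minor_km):
    """Diameter of the circle whose area equals that of the ellipse.

    Ellipse area is pi * a * b and circle area is pi * (D / 2)^2,
    so D = 2 * sqrt(a * b).
    """
    return 2.0 * math.sqrt(semi_major_km * semi_minor_km)


def apparent_oblateness(semi_major_km, semi_minor_km):
    """Oblateness (a - b) / a of the projected limb."""
    return (semi_major_km - semi_minor_km) / semi_major_km


if __name__ == "__main__":
    a_km, b_km = 412.0, 385.0  # nominal limb semi-axes quoted in the abstract
    print("D_eq  =", round(area_equivalent_diameter(a_km, b_km), 1), "km")
    print("obl.  =", round(apparent_oblateness(a_km, b_km), 3))
```

With the nominal axes this gives a diameter of roughly 796.5 km and an apparent oblateness of about 0.066; the value reported in the paper should be taken from the paper itself.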

arXiv:2306.10341

doi 10.1109/ACCESS.2024.3361650

Tailoring Machine Learning for Process Mining

Authors: Paolo Ceravolo, Sylvio Barbon Junior, Ernesto Damiani, Wil van der Aalst

Abstract: Machine learning models are routinely integrated into process mining pipelines to carry out tasks like data transformation, noise reduction, anomaly detection, classification, and prediction. Often, the design of such models is based on some ad-hoc assumptions about the corresponding data distributions, which are not necessarily in accordance with the non-parametric distributions typically observed with process data. Moreover, the learning procedure they follow ignores the constraints that concurrency imposes on process data. Data encoding is a key element to smooth the mismatch between these assumptions, but its potential is poorly exploited. In this paper, we argue that a deeper insight into the issues raised by training machine learning models with process data is crucial to ground a sound integration of process mining and machine learning. Our analysis of such issues aims to lay the foundation for a methodology that correctly aligns machine learning with process mining requirements, and to stimulate research in this direction.

Submitted 17 June, 2023; originally announced June 2023.

Comments: 16 pages

MSC Class: 68; ACM Class: I.2.6

arXiv:2303.17879

CoSMo: a Framework to Instantiate Conditioned Process Simulation Models

Authors: Rafael S. Oyamada, Gabriel M. Tavares, Sylvio Barbon Junior, Paolo Ceravolo

Abstract: Process simulation is gaining attention for its ability to assess potential performance improvements and risks associated with business process changes. The existing literature presents various techniques, generally grounded in process models discovered from event log data or built upon deep learning algorithms. These techniques have specific strengths and limitations. Traditional data-driven approaches offer increased interpretability, while deep learning-based approaches excel at generalizing changes across large event logs. However, the practical application of deep learning faces challenges related to managing stochasticity and integrating information for what-if analysis. This paper introduces a novel recurrent neural architecture tailored to discover COnditioned process Simulation MOdels (CoSMo) based on user-based constraints or any other kind of a priori knowledge. This architecture facilitates the simulation of event logs that adhere to specific constraints by incorporating declarative-based rules into the learning phase, as an attempt to fill the gap of incorporating information into deep learning models to perform what-if analysis. Experimental validation illustrates CoSMo's efficacy in simulating event logs while adhering to predefined declarative conditions, emphasizing both control-flow and data-flow perspectives.

Submitted 25 June, 2024; v1 submitted 31 March, 2023; originally announced March 2023.

arXiv:2301.02167

Trace Encoding in Process Mining: a survey and benchmarking

Authors: Sylvio Barbon Jr., Paolo Ceravolo, Rafael S. Oyamada, Gabriel M. Tavares

Abstract: Encoding methods are employed across several process mining tasks, including predictive process monitoring, anomalous case detection, trace clustering, etc. These methods are usually performed as preprocessing steps and are responsible for transforming complex information into a numerical feature space. Most papers choose existing encoding methods arbitrarily or employ a strategy based on a specific expert knowledge domain. Moreover, existing methods are employed by using their default hyperparameters without evaluating other options. This practice can lead to several drawbacks, such as suboptimal performance and unfair comparisons with the state-of-the-art. Therefore, this work aims at providing a comprehensive survey on event log encoding by comparing 27 methods of different natures in terms of expressivity, scalability, correlation, and domain agnosticism. To the best of our knowledge, this is the most comprehensive study so far focusing on trace encoding in process mining. It contributes to maturing awareness about the role of trace encoding in process mining pipelines and sheds light on issues, concerns, and future research directions regarding the use of encoding methods to bridge the gap between machine learning models and process mining.

Submitted 5 January, 2023; originally announced January 2023.

arXiv:2109.00635

Selecting Optimal Trace Clustering Pipelines with AutoML

Authors: Sylvio Barbon Jr, Paolo Ceravolo, Ernesto Damiani, Gabriel Marques Tavares

Abstract: Trace clustering has been extensively used to preprocess event logs. By grouping similar behavior, these techniques guide the identification of sub-logs, producing more understandable models and conformance analytics. Nevertheless, little attention has been paid to the relationship between event log properties and clustering quality. In this work, we propose an Automatic Machine Learning (AutoML) framework to recommend the most suitable pipeline for trace clustering given an event log, which encompasses the encoding method, clustering algorithm, and its hyperparameters. Our experiments were conducted using a thousand event logs, four encoding techniques, and three clustering methods. Results indicate that our framework sheds light on the trace clustering problem and can assist users in choosing the best pipeline considering their scenario.

Submitted 1 September, 2021; originally announced September 2021.

Comments: 17 pages, 7 figures

arXiv:2103.12874

Using Meta-learning to Recommend Process Discovery Methods

Authors: Sylvio Barbon Jr, Paolo Ceravolo, Ernesto Damiani, Gabriel Marques Tavares

Abstract: Process discovery methods have obtained remarkable achievements in Process Mining, delivering comprehensible process models to enhance management capabilities. However, selecting the suitable method for a specific event log relies heavily on human expertise, hindering its broad application. Solutions based on Meta-learning (MtL) have been promising for creating systems with reduced human assistance. This paper presents an MtL solution for recommending process discovery methods that maximize model quality according to complementary dimensions. Thanks to our MtL pipeline, it was possible to recommend a discovery method with 92% accuracy using light-weight features that describe the event log. Our experimental analysis also provided significant insights on the importance of log features in generating recommendations, paving the way to a deeper understanding of the discovery algorithms.

Submitted 23 March, 2021; originally announced March 2021.

Comments: 16 pages, 6 figures

arXiv:1708.03529

Quantify resilience enhancement of UTS through exploiting connect community and internet of everything emerging technologies

Authors: Emanuele Bellini, Paolo Ceravolo, Paolo Besi

Abstract: This work aims at investigating and quantifying the Urban Transport System (UTS) resilience enhancement enabled by the adoption of emerging technology such as Internet of Everything (IoE) and the new trend of the Connected Community (CC). A conceptual extension of Functional Resonance Analysis Method (FRAM) and its formalization have been proposed and used to model UTS complexity. The scope is to identify the system functions and their interdependencies with a particular focus on those that have a relation and impact on people and communities. Network analysis techniques have been applied to the FRAM model to identify and estimate the most critical community-related functions. The notion of Variability Rate (VR) has been defined as the amount of output variability generated by an upstream function that can be tolerated/absorbed by a downstream function without significantly increasing its subsequent output variability. A fuzzy-based quantification of the VR based on expert judgment has been developed for when quantitative data are not available. Our approach has been applied to a critical scenario (water bomb/flash flooding) considering two cases: when UTS has CC and IoE implemented or not. The results show a remarkable VR enhancement if CC and IoE are deployed.

Submitted 11 August, 2017; originally announced August 2017.

Showing 1–16 of 16 results for author: Ceravolo, P