
AI in Support of Diversity and Inclusion

Authors: Çiçek Güven, Afra Alishahi, Henry Brighton, Gonzalo Nápoles, Juan Sebastian Olier, Marie Šafář, Eric Postma, Dimitar Shterionov, Mirella De Sisto, Eva Vanmassenhove

Abstract: In this paper, we elaborate on how AI can support diversity and inclusion and exemplify research projects conducted in that direction. We start by looking at the challenges and progress in making large language models (LLMs) more transparent, inclusive, and aware of social biases. Even though LLMs like ChatGPT have impressive abilities, they struggle to understand different cultural contexts and engage in meaningful, human-like conversations. A key issue is that biases in language processing, especially in machine translation, can reinforce inequality. Tackling these biases requires a multidisciplinary approach to ensure AI promotes diversity, fairness, and inclusion. We also highlight AI's role in identifying biased content in media, which is important for improving representation. By detecting unequal portrayals of social groups, AI can help challenge stereotypes and create more inclusive technologies. Transparent AI algorithms, which clearly explain their decisions, are essential for building trust and reducing bias in AI systems. We also stress that AI systems need diverse and inclusive training data. Projects like the Child Growth Monitor show how using a wide range of data can help address real-world problems like malnutrition and poverty. We present a project that demonstrates how AI can be applied to monitor the role of search engines in spreading disinformation about the LGBTQ+ community. Moreover, we discuss the SignON project as an example of how technology can bridge communication gaps between hearing and deaf people, emphasizing the importance of collaboration and mutual trust in developing inclusive AI. Overall, with this paper, we advocate for AI systems that are not only effective but also socially responsible, promoting fair and inclusive interactions between humans and machines.

Submitted 16 January, 2025; originally announced January 2025.

Comments: 14 pages, 2 figures

arXiv:2305.09399

Measuring Implicit Bias Using SHAP Feature Importance and Fuzzy Cognitive Maps

Authors: Isel Grau, Gonzalo Nápoles, Fabian Hoitsma, Lisa Koutsoviti Koumeri, Koen Vanhoof

Abstract: In this paper, we integrate the concepts of feature importance with implicit bias in the context of pattern classification. This is done by means of a three-step methodology that involves (i) building a classifier and tuning its hyperparameters, (ii) building a Fuzzy Cognitive Map model able to quantify implicit bias, and (iii) using the SHAP feature importance to activate the neural concepts when performing simulations. The results using a real case study concerning fairness research support our two-fold hypothesis. On the one hand, we illustrate the risks of using a feature importance method as an absolute tool to measure implicit bias. On the other hand, we conclude that the amount of bias towards protected features might differ depending on whether the features are numerically or categorically encoded.

Submitted 17 May, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

Comments: Accepted at the Intelligent Systems Conference (IntelliSys) 2023 and will be presented on 7-8 September 2023

arXiv:2210.14687

Which is the best model for my data?

Authors: Gonzalo Nápoles, Isel Grau, Çiçek Güven, Orçun Özdemir, Yamisleydi Salgueiro

Abstract: In this paper, we tackle the problem of selecting the optimal model for a given structured pattern classification dataset. In this context, a model can be understood as a classifier and a hyperparameter configuration. The proposed meta-learning approach purely relies on machine learning and involves four major steps. Firstly, we present a concise collection of 62 meta-features that address the problem of information cancellation when aggregating measure values involving positive and negative measurements. Secondly, we describe two different approaches for synthetic data generation intending to enlarge the training data. Thirdly, we fit a set of pre-defined classification models for each classification problem while optimizing their hyperparameters using grid search. The goal is to create a meta-dataset such that each row denotes a multilabel instance describing a specific problem. The features of these meta-instances denote the statistical properties of the generated datasets, while the labels encode the grid search results as binary vectors such that best-performing models are positively labeled. Finally, we tackle the model selection problem with several multilabel classifiers, including a Convolutional Neural Network designed to handle tabular data. The simulation results show that our meta-learning approach can correctly predict an optimal model for 91% of the synthetic datasets and for 87% of the real-world datasets. Furthermore, we noticed that most meta-classifiers produced better results when using our meta-features. Overall, our proposal differs from other meta-learning approaches since it tackles the algorithm selection and hyperparameter tuning problems in a single step. Toward the end, we perform a feature importance analysis to determine which statistical features drive the model selection mechanism.

Submitted 26 October, 2022; originally announced October 2022.
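
The abstract above describes meta-instances whose labels are binary vectors marking the best-performing models. The following is a minimal sketch of that encoding idea only, not the authors' implementation: the function names, the `tolerance` parameter, and the example scores are all hypothetical and chosen for illustration.

```python
def encode_best_models(scores, model_names, tolerance=0.0):
    """Return a binary label vector marking the best-performing models.

    scores: dict mapping model name -> validation score (higher is better).
    tolerance: hypothetical slack; a model whose score is within this
    distance of the best score is also labeled 1.
    """
    best = max(scores[name] for name in model_names)
    return [1 if best - scores[name] <= tolerance else 0 for name in model_names]


def build_meta_instance(meta_features, scores, model_names, tolerance=0.0):
    """Pair a dataset's meta-features with its multilabel target vector."""
    return {
        "features": list(meta_features),
        "labels": encode_best_models(scores, model_names, tolerance),
    }


# Illustrative values only: two models tie for the best score.
models = ["knn", "svm", "tree"]
row = build_meta_instance([0.1, 2.0], {"knn": 0.90, "svm": 0.90, "tree": 0.85}, models)
print(row["labels"])
```

Because several models can share the top score, each row can carry more than one positive label, which is why the task is multilabel rather than multiclass.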

arXiv:2209.04340

Multi-objective hyperparameter optimization with performance uncertainty

Authors: Alejandro Morales-Hernández, Inneke Van Nieuwenhuyse, Gonzalo Nápoles

Abstract: The performance of any Machine Learning (ML) algorithm is impacted by the choice of its hyperparameters. As training and evaluating an ML algorithm is usually expensive, the hyperparameter optimization (HPO) method needs to be computationally efficient to be useful in practice. Most of the existing approaches to multi-objective HPO use evolutionary strategies and metamodel-based optimization. However, few methods have been developed to account for uncertainty in the performance measurements. This paper presents results on multi-objective hyperparameter optimization with uncertainty on the evaluation of ML algorithms. We combine the sampling strategy of Tree-structured Parzen Estimators (TPE) with the metamodel obtained after training a Gaussian Process Regression (GPR) with heterogeneous noise. Experimental results on three analytical test functions and three ML problems show the improvement over multi-objective TPE and GPR, achieved with respect to the hypervolume indicator.

Submitted 9 September, 2022; originally announced September 2022.

Comments: Presented in the International Conference in Optimization and Learning (2022)
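
The abstract above reports results with respect to the hypervolume indicator. As a rough illustration of what that indicator measures, here is a minimal sketch for the two-objective case, assuming both objectives are minimized and a user-chosen reference point. The function name and the example points are hypothetical, and this is not the evaluation code used in the paper.

```python
def hypervolume_2d(points, reference):
    """Area dominated by `points` and bounded by `reference`.

    Assumes two objectives, both minimized. Points that do not strictly
    improve on the reference point in both objectives are ignored.
    """
    candidates = sorted(
        p for p in points if p[0] < reference[0] and p[1] < reference[1]
    )
    area = 0.0
    best_f2 = reference[1]
    # Sweep in increasing order of the first objective; each point that
    # improves the second objective adds one new rectangular slab.
    for f1, f2 in candidates:
        if f2 < best_f2:
            area += (reference[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
print(hypervolume_2d(front, reference=(4.0, 4.0)))
```

A larger value means the set of solutions covers more of the objective space below the reference point, so dominated points add nothing to the result.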

arXiv:2112.12717

Forward Composition Propagation for Explainable Neural Reasoning

Authors: Isel Grau, Gonzalo Nápoles, Marilyn Bello, Yamisleydi Salgueiro, Agnieszka Jastrzebska

Abstract: This paper proposes an algorithm called Forward Composition Propagation (FCP) to explain the predictions of feed-forward neural networks operating on structured classification problems. In the proposed FCP algorithm, each neuron is described by a composition vector indicating the role of each problem feature in that neuron. Composition vectors are initialized using a given input instance and subsequently propagated through the whole network until reaching the output layer. The sign of each composition value indicates whether the corresponding feature excites or inhibits the neuron, while the absolute value quantifies its impact. The FCP algorithm is executed on a post-hoc basis, i.e., once the learning process is completed. Aiming to illustrate the FCP algorithm, this paper develops a case study concerning bias detection in a fairness problem in which the ground truth is known. The simulation results show that the composition values closely align with the expected behavior of protected features. The source code and supplementary material for this paper are available at https://github.com/igraugar/fcp.

Submitted 24 October, 2023; v1 submitted 23 December, 2021; originally announced December 2021.

arXiv:2112.12713

Modeling Implicit Bias with Fuzzy Cognitive Maps

Authors: Gonzalo Nápoles, Isel Grau, Leonardo Concepción, Lisa Koutsoviti Koumeri, João Paulo Papa

Abstract: This paper presents a Fuzzy Cognitive Map model to quantify implicit bias in structured datasets where features can be numeric or discrete. In our proposal, problem features are mapped to neural concepts that are initially activated by experts when running what-if simulations, whereas weights connecting the neural concepts represent absolute correlation/association patterns between features. In addition, we introduce a new reasoning mechanism equipped with a normalization-like transfer function that prevents neurons from saturating. Another advantage of this new reasoning mechanism is that it can easily be controlled by regulating nonlinearity when updating neurons' activation values in each iteration. Finally, we study the convergence of our model and derive analytical conditions concerning the existence and uniqueness of fixed-point attractors.

Submitted 13 January, 2022; v1 submitted 23 December, 2021; originally announced December 2021.

arXiv:2112.12641

Prolog-based agnostic explanation module for structured pattern classification

Authors: Gonzalo Nápoles, Fabian Hoitsma, Andreas Knoben, Agnieszka Jastrzebska, Maikel Leon Espinosa

Abstract: This paper presents a Prolog-based reasoning module to generate counterfactual explanations given the predictions computed by a black-box classifier. The proposed symbolic reasoning module can also resolve what-if queries using the ground-truth labels instead of the predicted ones. Overall, our approach comprises four well-defined stages that can be applied to any structured pattern classification problem. Firstly, we pre-process the given dataset by imputing missing values and normalizing the numerical features. Secondly, we transform numerical features into symbolic ones using fuzzy clustering such that extracted fuzzy clusters are mapped to an ordered set of predefined symbols. Thirdly, we encode instances as a Prolog rule using the nominal values, the predefined symbols, the decision classes, and the confidence values. Fourthly, we compute the overall confidence of each Prolog rule using fuzzy-rough set theory to handle the uncertainty caused by transforming numerical quantities into symbols. This step comes with an additional theoretical contribution in the form of a new similarity function to compare the previously defined Prolog rules involving confidence values. Finally, we implement a chatbot as a proxy between human beings and the Prolog-based reasoning module to resolve natural language queries and generate counterfactual explanations. During the numerical simulations using synthetic datasets, we study the performance of our system when using different fuzzy operators and similarity functions. Towards the end, we illustrate how our reasoning module works using different use cases.

Submitted 18 November, 2022; v1 submitted 23 December, 2021; originally announced December 2021.

arXiv:2112.04933

Measuring Wind Turbine Health Using Drifting Concepts

Authors: Agnieszka Jastrzebska, Alejandro Morales-Hernández, Gonzalo Nápoles, Yamisleydi Salgueiro, Koen Vanhoof

Abstract: Time series processing is an essential aspect of wind turbine health monitoring. Despite the progress in this field, there is still room for new methods to improve modeling quality. In this paper, we propose two new approaches for the analysis of wind turbine health. Both approaches are based on abstract concepts, implemented using fuzzy sets, which summarize and aggregate the underlying raw data. By observing the change in concepts, we infer changes in the turbine's health. Analyses are carried out separately for different external conditions (wind speed and temperature). We extract concepts that represent relatively low, moderate, and high power production. The first method aims at evaluating the decrease or increase in relatively high and low power production. This task is performed using a regression-like model. The second method evaluates the overall drift of the extracted concepts. Large drift indicates that the power production process undergoes fluctuations in time. Concepts are labeled using linguistic labels, thus equipping our model with improved interpretability features. We applied the proposed approach to process publicly available data describing four wind turbines. The simulation results have shown that the aging process is not homogeneous in all wind turbines.

Submitted 9 December, 2021; originally announced December 2021.

arXiv:2111.12749

doi 10.7717/peerj-cs.1078

FCMpy: A Python Module for Constructing and Analyzing Fuzzy Cognitive Maps

Authors: Samvel Mkhitaryan, Philippe J. Giabbanelli, Maciej K. Wozniak, Gonzalo Napoles, Nanne K. de Vries, Rik Crutzen

Abstract: FCMpy is an open source package in Python for building and analyzing Fuzzy Cognitive Maps. More specifically, the package allows 1) deriving fuzzy causal weights from qualitative data, 2) simulating the system behavior, 3) applying machine learning algorithms (e.g., Nonlinear Hebbian Learning, Active Hebbian Learning, Genetic Algorithms and Deterministic Learning) to adjust the FCM causal weight matrix and to solve classification problems, and 4) implementing scenario analysis by simulating hypothetical interventions (i.e., analyzing what-if scenarios).

Submitted 24 November, 2021; originally announced November 2021.

Comments: 22 pages, 9 Figures

Journal ref: PeerJ Computer Science 8:e1078, 2022
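
To make "simulating the system behavior" in the abstract above concrete, here is a minimal sketch of a generic Fuzzy Cognitive Map simulation in plain Python. It deliberately does not use the FCMpy API. It assumes a classic update rule in which each concept's next activation is a sigmoid of the weighted sum of the current activations, and the function names, parameters, and weights are hypothetical.

```python
import math


def sigmoid(x, slope=1.0):
    """Squash a raw value into the (0, 1) activation range."""
    return 1.0 / (1.0 + math.exp(-slope * x))


def simulate_fcm(weights, initial_state, max_steps=100, tol=1e-6):
    """Iterate an FCM until activations stop changing or max_steps is hit.

    weights[j][i] is the assumed causal influence of concept j on concept i.
    Returns the list of visited states, starting with the initial state.
    """
    n = len(initial_state)
    state = list(initial_state)
    history = [state]
    for _ in range(max_steps):
        new_state = [
            sigmoid(sum(weights[j][i] * state[j] for j in range(n)))
            for i in range(n)
        ]
        history.append(new_state)
        converged = max(abs(a - b) for a, b in zip(new_state, state)) < tol
        state = new_state
        if converged:
            break
    return history


# Illustrative two-concept map: concept 0 promotes concept 1,
# and concept 1 inhibits concept 0.
weights = [[0.0, 0.8], [-0.5, 0.0]]
trajectory = simulate_fcm(weights, initial_state=[1.0, 0.0])
print(trajectory[-1])
```

A what-if scenario in this setting amounts to rerunning the simulation with a different initial state (or with modified weights) and comparing the final activations.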

arXiv:2108.09098

doi 10.1016/j.patrec.2022.01.005

A fuzzy-rough uncertainty measure to discover bias encoded explicitly or implicitly in features of structured pattern classification datasets

Authors: Gonzalo Nápoles, Lisa Koutsoviti Koumeri

Abstract: The need to measure bias encoded in tabular data that are used to solve pattern recognition problems is widely recognized by academia, legislators and enterprises alike. In previous work, we proposed a bias quantification measure, called fuzzy-rough uncertainty, which relies on the fuzzy-rough set theory. The intuition dictates that protected features should not change the fuzzy-rough boundary regions of a decision class significantly. The extent to which this happens is a proxy for bias expressed as uncertainty in a decision-making context. Our measure's main advantage is that it does not depend on any machine learning prediction model but on a distance function. In this paper, we extend our study by exploring the existence of bias encoded implicitly in non-protected features as defined by the correlation between protected and unprotected attributes. This analysis leads to four scenarios that domain experts should evaluate before deciding how to tackle bias. In addition, we conduct a sensitivity analysis to determine the fuzzy operators and distance function that best capture change in the boundary regions.

Submitted 21 January, 2022; v1 submitted 20 August, 2021; originally announced August 2021.

Journal ref: Pattern Recognition Letters. 154(1), 2021

arXiv:2107.03423

Recurrence-Aware Long-Term Cognitive Network for Explainable Pattern Classification

Authors: Gonzalo Nápoles, Yamisleydi Salgueiro, Isel Grau, Maikel Leon Espinosa

Abstract: Machine learning solutions for pattern classification problems are nowadays widely deployed in society and industry. However, the lack of transparency and accountability of most accurate models often hinders their safe use. Thus, there is a clear need for developing explainable artificial intelligence mechanisms. There exist model-agnostic methods that summarize feature contributions, but their interpretability is limited to predictions made by black-box models. An open challenge is to develop models that have intrinsic interpretability and produce their own explanations, even for classes of models that are traditionally considered black boxes like (recurrent) neural networks. In this paper, we propose a Long-Term Cognitive Network for interpretable pattern classification of structured data. Our method brings its own mechanism for providing explanations by quantifying the relevance of each feature in the decision process. For supporting the interpretability without affecting the performance, the model incorporates more flexibility through a quasi-nonlinear reasoning rule that allows controlling nonlinearity. Besides, we propose a recurrence-aware decision model that evades the issues posed by the unique fixed point while introducing a deterministic learning algorithm to compute the tunable parameters. The simulations show that our interpretable model obtains competitive results when compared to state-of-the-art white and black-box models.

Submitted 23 December, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

arXiv:2107.00425

Online learning of windmill time series using Long Short-term Cognitive Networks

Authors: Alejandro Morales-Hernández, Gonzalo Nápoles, Agnieszka Jastrzebska, Yamisleydi Salgueiro, Koen Vanhoof

Abstract: Forecasting windmill time series is often the basis of other processes such as anomaly detection, health monitoring, or maintenance scheduling. The amount of data generated on windmill farms makes online learning the most viable strategy to follow. Such settings require retraining the model each time a new batch of data is available. However, updating the model with the new information is often very expensive to perform using traditional Recurrent Neural Networks (RNNs). In this paper, we use Long Short-term Cognitive Networks (LSTCNs) to forecast windmill time series in online settings. These recently introduced neural systems consist of chained Short-term Cognitive Network blocks, each processing a temporal data chunk. The learning algorithm of these blocks is based on a very fast, deterministic learning rule that makes LSTCNs suitable for online learning tasks. The numerical simulations using a case study with four windmills showed that our approach reported the lowest forecasting errors with respect to a simple RNN, a Long Short-term Memory, a Gated Recurrent Unit, and a Hidden Markov Model. What is perhaps more important is that the LSTCN approach is significantly faster than these state-of-the-art models.

Submitted 16 September, 2021; v1 submitted 1 July, 2021; originally announced July 2021.

arXiv:2106.16233

Long Short-term Cognitive Networks

Authors: Gonzalo Nápoles, Isel Grau, Agnieszka Jastrzebska, Yamisleydi Salgueiro

Abstract: In this paper, we present a recurrent neural system named Long Short-term Cognitive Networks (LSTCNs) as a generalization of the Short-term Cognitive Network (STCN) model. Such a generalization is motivated by the difficulty of forecasting very long time series efficiently. The LSTCN model can be defined as a collection of STCN blocks, each processing a specific time patch of the (multivariate) time series being modeled. In this neural ensemble, each block passes information to the subsequent one in the form of weight matrices representing the prior knowledge. As a second contribution, we propose a deterministic learning algorithm to compute the learnable weights while preserving the prior knowledge resulting from previous learning processes. As a third contribution, we introduce a feature influence score as a proxy to explain the forecasting process in multivariate time series. The simulations using three case studies show that our neural system reports small forecasting errors while being significantly faster than state-of-the-art recurrent models.

Submitted 16 September, 2021; v1 submitted 30 June, 2021; originally announced June 2021.

arXiv:1809.08085

Short-term Cognitive Networks, Flexible Reasoning and Nonsynaptic Learning

Authors: Gonzalo Nápoles, Frank Vanhoenshoven, Koen Vanhoof

Abstract: While the machine learning literature dedicated to fully automated reasoning algorithms is abundant, the number of methods enabling the inference process on the basis of previously defined knowledge structures is far smaller. Fuzzy Cognitive Maps (FCMs) are neural networks that can be exploited towards this goal because of their flexibility to handle external knowledge. However, FCMs suffer from a number of issues that range from the limited prediction horizon to the absence of theoretically sound learning algorithms able to produce accurate predictions. In this paper, we propose a neural network system named Short-term Cognitive Networks that tackles some of these limitations. In our model, weights are not constricted and may have a causal nature or not. As a second contribution, we present a nonsynaptic learning algorithm to improve the network performance without modifying the previously defined weights. Moreover, we derive a stop condition to prevent the learning algorithm from iterating without decreasing the simulation error.

Submitted 16 September, 2018; originally announced September 2018.

Showing 1–14 of 14 results for author: Napoles, G