-
Variable selection for Naïve Bayes classification
Authors:
Rafael Blanquero,
Emilio Carrizosa,
Pepa Ramírez-Cobo,
M. Remedios Sillero-Denamiel
Abstract:
The Naïve Bayes has proven to be a tractable and efficient method for classification in multivariate analysis. However, features are usually correlated, a fact that violates the Naïve Bayes' assumption of conditional independence, and may deteriorate the method's performance. Moreover, datasets are often characterized by a large number of features, which may complicate the interpretation of the results as well as slow down the method's execution.
In this paper we propose a sparse version of the Naïve Bayes classifier that is characterized by three properties. First, the sparsity is achieved taking into account the correlation structure of the covariates. Second, different performance measures can be used to guide the selection of features. Third, performance constraints on groups of higher interest can be included. Our proposal leads to a smart search that yields competitive running times while remaining flexible in the choice of classification performance measure. Our findings show that, when compared against well-referenced feature selection approaches, the proposed sparse Naïve Bayes obtains competitive results regarding accuracy, sparsity and running times for balanced datasets. In the case of datasets with unbalanced classes (or classes of different importance), a better compromise between classification rates for the different classes is achieved.
Submitted 31 January, 2024;
originally announced January 2024.
-
A cost-sensitive constrained Lasso
Authors:
Rafael Blanquero,
Emilio Carrizosa,
Pepa Ramírez-Cobo,
M. Remedios Sillero-Denamiel
Abstract:
The Lasso has become a benchmark data analysis procedure, and numerous variants have been proposed in the literature. Although the Lasso formulations are stated so that overall prediction error is optimized, they allow no full control over the prediction accuracy on certain individuals of interest. In this work we propose a novel version of the Lasso in which quadratic performance constraints are added to Lasso-based objective functions, in such a way that threshold values are set to bound the prediction errors in the different groups of interest (not necessarily disjoint). As a result, a constrained sparse regression model is defined by a nonlinear optimization problem. This cost-sensitive constrained Lasso has a direct application in heterogeneous samples where data are collected from distinct sources, as is standard in many biomedical contexts. Both theoretical properties and empirical studies concerning the new method are explored in this paper. In addition, two illustrations of the method in biomedical and sociological contexts are considered.
Submitted 31 January, 2024;
originally announced January 2024.
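One plausible reading of the cost-sensitive constrained Lasso abstract above is the following program. It is a sketch, not the authors' exact formulation: the symbols $X$, $y$, $\lambda$, the group index $g$, the group sizes $n_g$ and the thresholds $f_g$ are all notation assumed here for illustration. The quadratic constraints bound the mean squared prediction error within each group of interest.

```latex
\begin{aligned}
\min_{\beta \in \mathbb{R}^{p}} \quad
  & \frac{1}{n}\,\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 \\
\text{s.t.} \quad
  & \frac{1}{n_g}\,\lVert y_g - X_g \beta \rVert_2^2 \le f_g,
  \qquad g = 1, \dots, G.
\end{aligned}
```

Here $(X_g, y_g)$ denotes the rows belonging to group $g$; the groups may overlap, as the abstract notes.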
-
A bivariate two-state Markov modulated Poisson process for failure modelling
Authors:
Yoel G. Yera,
Rosa E. Lillo,
Bo F. Nielsen,
Pepa Ramírez-Cobo,
Fabrizio Ruggeri
Abstract:
Motivated by a real failure dataset in a two-dimensional context, this paper presents an extension of the Markov modulated Poisson process (MMPP) to two dimensions. The one-dimensional MMPP has been proposed for the modeling of dependent and non-exponential inter-failure times (in contexts such as queueing, risk or reliability, among others). The novel two-dimensional MMPP allows for dependence between the two sequences of inter-failure times, while marginally preserving the MMPP properties. The generalization is based on the Marshall-Olkin exponential distribution. Inference is undertaken for the new model through a method combining a matching moments approach with an Approximate Bayesian Computation (ABC) algorithm. The performance of the method is shown on simulated and real datasets representing times and distances covered between consecutive failures in a public transport company. For the real dataset, some quantities of importance associated with the reliability of the system are estimated, such as the probabilities and expected number of failures at different times and distances covered by trains until the occurrence of a failure.
Submitted 26 January, 2024;
originally announced January 2024.
-
Fitting procedure for the two-state Batch Markov modulated Poisson process
Authors:
Yoel G. Yera,
Rosa E. Lillo,
Pepa Ramírez-Cobo
Abstract:
The Batch Markov Modulated Poisson Process (BMMPP) is a subclass of the versatile Batch Markovian Arrival process (BMAP) which has been proposed for the modeling of dependent events occurring in batches (such as group arrivals, failures or risk events). This paper focuses on exploring the possibilities of the BMMPP for the modeling of real phenomena involving point processes with group arrivals. The first result in this sense is the characterization of the two-state BMMPP with maximum batch size equal to K, the BMMPP2(K), by a set of moments related to the inter-event time and batch size distributions. This characterization leads to a sequential fitting approach via a moments matching method. The performance of the novel fitting approach is illustrated on both simulated data and a real teletraffic data set, and compared to that of the EM algorithm. In addition, as an extension of the inference approach, the queue length distribution at departures in the queueing system BMMPP/M/1 is also estimated.
Submitted 25 January, 2024;
originally announced January 2024.
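To make the process in the BMMPP abstract above concrete, here is a minimal simulation sketch of a two-state batch MMPP. It is not the paper's fitting procedure. The function name `simulate_bmmpp2`, its parameters and the assumption that the batch-size distribution depends on the current state are all hypothetical choices made for illustration.

```python
import random


def simulate_bmmpp2(n_events, switch_rates, event_rates, batch_probs, seed=0):
    """Simulate n_events batch arrivals of a two-state batch MMPP (sketch).

    switch_rates[i]   : rate of leaving state i for the other state
    event_rates[i]    : Poisson event rate while in state i
    batch_probs[i][k] : probability that an event in state i has size k + 1
    (All names and the state-dependent batch law are illustrative assumptions.)
    """
    rng = random.Random(seed)
    state, clock = 0, 0.0
    times, batches = [], []
    while len(times) < n_events:
        # Competing exponentials: time to the next transition of any kind.
        total = switch_rates[state] + event_rates[state]
        clock += rng.expovariate(total)
        if rng.random() < event_rates[state] / total:
            # An event occurs; draw its batch size from the state's law.
            sizes = list(range(1, len(batch_probs[state]) + 1))
            batch = rng.choices(sizes, weights=batch_probs[state])[0]
            times.append(clock)
            batches.append(batch)
        else:
            # The underlying Markov chain switches state without an event.
            state = 1 - state
    return times, batches


if __name__ == "__main__":
    t, b = simulate_bmmpp2(
        5, (0.5, 1.0), (2.0, 10.0), ((0.7, 0.2, 0.1), (0.2, 0.3, 0.5))
    )
    print(list(zip(t, b)))
```

Inter-event times and batch sizes from such a simulation are the kind of data from which empirical moments could be computed for a moments-matching fit.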
-
Analysis of an aggregate loss model in a Markov renewal regime
Authors:
Pepa Ramírez-Cobo,
Emilio Carrizosa,
Rosa Elvira Lillo
Abstract:
In this article we consider an aggregate loss model with dependent losses. The loss occurrence process is governed by a two-state Markovian arrival process (MAP2), a Markov renewal process that allows for (1) correlated inter-loss times, (2) non-exponentially distributed inter-loss times, and (3) overdispersed loss counts. Some quantities of interest to measure persistence in the loss occurrence process are obtained. Given a real operational risk database, the aggregate loss model is estimated by fitting separately the inter-loss times and severities. The MAP2 is estimated via direct maximization of the likelihood function, and severities are modeled by the heavy-tailed, double-Pareto Lognormal distribution. In comparison with the fit provided by the Poisson process, the results point out that taking into account the dependence and overdispersion in the inter-loss times distribution leads to higher capital charges.
Submitted 4 February, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
Cost-sensitive Feature Selection for Support Vector Machines
Authors:
Sandra Benítez-Peña,
Rafael Blanquero,
Emilio Carrizosa,
Pepa Ramírez-Cobo
Abstract:
Feature Selection is a crucial procedure in Data Science tasks such as Classification, since it identifies the relevant variables, thus making the classification procedures more interpretable, cheaper in terms of measurement and more effective by reducing noise and data overfit. The relevance of features in a classification procedure is linked to the fact that misclassification costs are frequently asymmetric, since false positive and false negative cases may have very different consequences. However, off-the-shelf Feature Selection procedures seldom take into account such cost-sensitivity of errors.
In this paper we propose a mathematical-optimization-based Feature Selection procedure embedded in one of the most popular classification procedures, namely, Support Vector Machines, accommodating asymmetric misclassification costs. The key idea is to replace the traditional margin maximization by minimizing the number of features selected, but imposing upper bounds on the false positive and negative rates. The problem is written as an integer linear problem plus a quadratic convex problem for Support Vector Machines with both linear and radial kernels.
The reported numerical experience demonstrates the usefulness of the proposed Feature Selection procedure. Indeed, our results on benchmark data sets show that a substantial decrease of the number of features is obtained, whilst the desired trade-off between false positive and false negative rates is achieved.
Submitted 15 January, 2024;
originally announced January 2024.
-
On support vector machines under a multiple-cost scenario
Authors:
Sandra Benítez-Peña,
Rafael Blanquero,
Emilio Carrizosa,
Pepa Ramírez-Cobo
Abstract:
Support Vector Machine (SVM) is a powerful tool in binary classification, known to attain excellent misclassification rates. On the other hand, many real-world classification problems, such as those found in medical diagnosis, churn or fraud prediction, involve misclassification costs which may be different in the different classes. However, it may be hard for the user to provide precise values for such misclassification costs, whereas it may be much easier to identify acceptable misclassification rate values. In this paper we propose a novel SVM model in which misclassification costs are considered by incorporating performance constraints in the problem formulation. Specifically, our aim is to seek the hyperplane with maximal margin yielding misclassification rates below given threshold values. Such a maximal margin hyperplane is obtained by solving a quadratic convex problem with linear constraints and integer variables. The reported numerical experience shows that our model gives the user control on the misclassification rates in one class (possibly at the expense of an increase in misclassification rates for the other class) and is feasible in terms of running times.
Submitted 22 December, 2023;
originally announced December 2023.
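As an illustration of how performance constraints can turn a soft-margin SVM into a convex quadratic problem with linear constraints and integer variables, consider the sketch below. It is one possible formulation consistent with the abstract above, not necessarily the authors' model. The index sets $I$ (training records) and $J$ (records on which rates are measured, with $J_k$ those of class $k$), the binary variables $z_j$, the big constant $M$ and the thresholds $\lambda_k$ are all assumptions introduced here.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,z} \quad
  & \tfrac{1}{2}\lVert w \rVert_2^2 + C \sum_{i \in I} \xi_i \\
\text{s.t.} \quad
  & y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, && i \in I, \\
  & y_j\,(w^\top x_j + b) \ge 1 - M\,(1 - z_j), \quad z_j \in \{0, 1\}, && j \in J, \\
  & \sum_{j \in J_k} z_j \ge \lambda_k\,\lvert J_k \rvert, && k \in \{+1, -1\}.
\end{aligned}
```

In this sketch $z_j = 1$ certifies that record $j$ is correctly classified, so the last constraint keeps the misclassification rate in class $k$ at or below $1 - \lambda_k$.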
-
Cost-sensitive probabilistic predictions for support vector machines
Authors:
Sandra Benítez-Peña,
Rafael Blanquero,
Emilio Carrizosa,
Pepa Ramírez-Cobo
Abstract:
Support vector machines (SVMs) are widely used and constitute one of the best examined and used machine learning models for two-class classification. Classification in SVM is based on a score procedure, yielding a deterministic classification rule, which can be transformed into a probabilistic rule (as implemented in off-the-shelf SVM libraries), but is not probabilistic in nature. On the other hand, the tuning of the regularization parameters in SVM is known to imply a high computational effort and generates pieces of information that are not fully exploited, not being used to build a probabilistic classification rule. In this paper we propose a novel approach to generate probabilistic outputs for the SVM. The new method has the following three properties. First, it is designed to be cost-sensitive, and thus the different importance of sensitivity (or true positive rate, TPR) and specificity (true negative rate, TNR) is readily accommodated in the model. As a result, the model can deal with imbalanced datasets, which are common in operational business problems such as churn prediction or credit scoring. Second, the SVM is embedded in an ensemble method to improve its performance, making use of the valuable information generated in the parameter tuning process. Finally, the probability estimation is done via bootstrap estimates, avoiding the use of parametric models as competing approaches do. Numerical tests on a wide range of datasets show the advantages of our approach over benchmark procedures.
Submitted 9 October, 2023;
originally announced October 2023.
-
Maximum likelihood estimation in the two-state Markovian arrival process
Authors:
Emilio Carrizosa,
Pepa Ramírez-Cobo
Abstract:
The Markovian arrival process (MAP) has proven a versatile model for fitting dependent and non-exponential interarrival times, with a number of applications to queueing, teletraffic, reliability or finance. Although the theoretical properties of MAPs and models involving MAPs are well studied, their estimation remains less explored. This paper examines maximum likelihood estimation of the second-order MAP using a recently obtained parameterization of the two-state MAPs.
Submitted 14 January, 2014;
originally announced January 2014.
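For reference, the likelihood being maximized in the MAP entry above can be written in the standard matrix form below. The notation is the usual one for MAPs and does not come from the abstract: $D_0$ and $D_1$ are the rate matrices for transitions without and with an arrival, $t_1, \dots, t_n$ are the observed interarrival times, and $\mathbf{e}$ is a column vector of ones. For the second-order MAP, $D_0$ and $D_1$ are $2 \times 2$.

```latex
\begin{aligned}
f(t_1, \dots, t_n \mid D_0, D_1)
  &= \phi\, e^{D_0 t_1} D_1\, e^{D_0 t_2} D_1 \cdots e^{D_0 t_n} D_1\, \mathbf{e}, \\
\phi P &= \phi, \qquad \phi\,\mathbf{e} = 1, \qquad P = (-D_0)^{-1} D_1 .
\end{aligned}
```

Here $\phi$ is the stationary distribution of the Markov chain embedded at arrival epochs, so the expression is the stationary joint density of the first $n$ interarrival times.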
-
Bayesian inference for double Pareto lognormal queues
Authors:
Pepa Ramirez-Cobo,
Rosa E. Lillo,
Simon Wilson,
Michael P. Wiper
Abstract:
In this article we describe a method for carrying out Bayesian estimation for the double Pareto lognormal (dPlN) distribution which has been proposed as a model for heavy-tailed phenomena. We apply our approach to estimate the $\mathit{dPlN}/M/1$ and $M/\mathit{dPlN}/1$ queueing systems. These systems cannot be analyzed using standard techniques due to the fact that the dPlN distribution does not possess a Laplace transform in closed form. This difficulty is overcome using some recent approximations for the Laplace transform of the interarrival distribution for the $\mathit{Pareto}/M/1$ system. Our procedure is illustrated with applications in internet traffic analysis and risk theory.
Submitted 15 November, 2010;
originally announced November 2010.