-
Empirical risk minimization algorithm for multiclass classification of S.D.E. paths
Authors:
Christophe Denis,
Eddy Ella Mintsa
Abstract:
We address the multiclass classification problem for stochastic diffusion paths, assuming that the classes are distinguished by their drift functions, while the diffusion coefficient remains common across all classes. In this setting, we propose a classification algorithm that relies on the minimization of the L2 risk. We establish rates of convergence for the resulting predictor. Notably, we introduce a margin assumption under which we show that our procedure can achieve fast rates of convergence. Finally, a simulation study highlights the numerical performance of our classification algorithm.
Submitted 18 March, 2025;
originally announced March 2025.
-
Nonparametric plug-in classifier for multiclass classification of S.D.E. paths
Authors:
Christophe Denis,
Charlotte Dion-Blanc,
Eddy Ella Mintsa,
Viet-Chi Tran
Abstract:
We study the multiclass classification problem where the features come from a mixture of time-homogeneous diffusions. Specifically, the classes are discriminated by their drift functions while the diffusion coefficient is common to all classes and unknown. In this framework, we build a plug-in classifier which relies on nonparametric estimators of the drift and diffusion functions. We first establish the consistency of our classification procedure under mild assumptions and then provide rates of convergence under different sets of assumptions. Finally, a numerical study supports our theoretical findings.
Submitted 27 September, 2023; v1 submitted 20 December, 2022;
originally announced December 2022.
-
Set-valued classification -- overview via a unified framework
Authors:
Evgenii Chzhen,
Christophe Denis,
Mohamed Hebiri,
Titouan Lorieul
Abstract:
The multi-class classification problem is among the most popular and well-studied statistical frameworks. Modern multi-class datasets can be extremely ambiguous, and single-output predictions fail to deliver satisfactory performance. By allowing predictors to predict a set of label candidates, set-valued classification offers a natural way to deal with this ambiguity. Several formulations of set-valued classification are available in the literature and each of them leads to different prediction strategies. The present survey aims to review popular formulations using a unified statistical framework. The proposed framework encompasses previously considered formulations, leads to new ones, and makes it possible to understand the underlying trade-offs of each formulation. We provide infinite-sample optimal set-valued classification strategies and review a general plug-in principle to construct data-driven algorithms. The exposition is supported by examples and pointers to both theoretical and practical contributions. Finally, we provide experiments on real-world datasets comparing these approaches in practice and providing general practical guidelines.
Submitted 24 February, 2021;
originally announced February 2021.
-
Regression with reject option and application to kNN
Authors:
Christophe Denis,
Mohamed Hebiri,
Ahmed Zaoui
Abstract:
We investigate the problem of regression where one is allowed to abstain from predicting. We refer to this framework as regression with reject option, as an extension of classification with reject option. In this context, we focus on the case where the rejection rate is fixed and derive the optimal rule, which relies on thresholding the conditional variance function. We provide a semi-supervised estimation procedure of the optimal rule involving two datasets: a first labeled dataset is used to estimate both the regression function and the conditional variance function, while a second unlabeled dataset is exploited to calibrate the desired rejection rate. The resulting predictor with reject option is shown to be almost as good as the optimal predictor with reject option both in terms of risk and rejection rate. We additionally apply our methodology with the kNN algorithm and establish rates of convergence for the resulting kNN predictor under mild conditions. Finally, a numerical study is performed to illustrate the benefit of using the proposed procedure.
Submitted 5 March, 2021; v1 submitted 30 June, 2020;
originally announced June 2020.
-
Fair Regression with Wasserstein Barycenters
Authors:
Evgenii Chzhen,
Christophe Denis,
Mohamed Hebiri,
Luca Oneto,
Massimiliano Pontil
Abstract:
We study the problem of learning a real-valued function that satisfies the Demographic Parity constraint. It demands the distribution of the predicted output to be independent of the sensitive attribute. We consider the case in which the sensitive attribute is available for prediction. We establish a connection between fair regression and optimal transport theory, based on which we derive a closed-form expression for the optimal fair predictor. Specifically, we show that the distribution of this optimum is the Wasserstein barycenter of the distributions induced by the standard regression function on the sensitive groups. This result offers an intuitive interpretation of the optimal fair prediction and suggests a simple post-processing algorithm to achieve fairness. We establish risk and distribution-free fairness guarantees for this procedure. Numerical experiments indicate that our method is very effective in learning fair models, with a relative increase in error rate that is inferior to the relative gain in fairness.
Submitted 23 June, 2020; v1 submitted 12 June, 2020;
originally announced June 2020.
-
A novel regularized approach for functional data clustering: An application to milking kinetics in dairy goats
Authors:
C. Denis,
E. Lebarbier,
C. Lévy-Leduc,
O. Martin,
L. Sansonnet
Abstract:
Motivated by an application to the clustering of milking kinetics of dairy goats, we propose in this paper a novel approach for functional data clustering. This issue is of growing interest in precision livestock farming that has been largely based on the development of data acquisition automation and on the development of interpretative tools to capitalize on high-throughput raw data and to generate benchmarks for phenotypic traits. The method that we propose in this paper falls in this context. Our methodology relies on a piecewise linear estimation of curves based on a novel regularized change-point estimation method and on the k-means algorithm applied to a vector of coefficients summarizing the curves. The statistical performance of our method is assessed through numerical experiments and is thoroughly compared with existing ones. Our technique is finally applied to milk emission kinetics data with the aim of a better characterization of inter-animal variability and toward a better understanding of the lactation process.
Submitted 22 July, 2019;
originally announced July 2019.
-
Leveraging Labeled and Unlabeled Data for Consistent Fair Binary Classification
Authors:
Evgenii Chzhen,
Christophe Denis,
Mohamed Hebiri,
Luca Oneto,
Massimiliano Pontil
Abstract:
We study the problem of fair binary classification using the notion of Equal Opportunity. It requires the true positive rate to distribute equally across the sensitive groups. Within this setting we show that the fair optimal classifier is obtained by recalibrating the Bayes classifier by a group-dependent threshold. We provide a constructive expression for the threshold. This result motivates us to devise a plug-in classification procedure based on both unlabeled and labeled datasets. While the latter is used to learn the output conditional probability, the former is used for calibration. The overall procedure can be computed in polynomial time and it is shown to be statistically consistent both in terms of the classification error and fairness measure. Finally, we present numerical experiments which indicate that our method is often superior to or competitive with the state-of-the-art methods on benchmark datasets.
Submitted 4 February, 2020; v1 submitted 12 June, 2019;
originally announced June 2019.
-
On the benefits of output sparsity for multi-label classification
Authors:
Evgenii Chzhen,
Christophe Denis,
Mohamed Hebiri,
Joseph Salmon
Abstract:
The multi-label classification framework, where each observation can be associated with a set of labels, has generated a tremendous amount of attention over recent years. Modern multi-label problems are typically large-scale in terms of the number of observations, features and labels, and the number of labels can even be comparable with the number of observations. In this context, different remedies have been proposed to overcome the curse of dimensionality. In this work, we aim at exploiting the output sparsity by introducing a new loss, called the sparse weighted Hamming loss. This proposed loss can be seen as a weighted version of classical ones, where active and inactive labels are weighted separately. Leveraging the influence of sparsity in the loss function, we provide improved generalization bounds for the empirical risk minimizer, a suitable property for large-scale problems. For this new loss, we derive rates of convergence linear in the underlying output-sparsity rather than linear in the number of labels. In practice, minimizing the associated risk can be performed efficiently by using convex surrogates and modern convex optimization algorithms. We provide experiments on various real-world datasets demonstrating the pertinence of our approach when compared to non-weighted techniques.
Submitted 14 March, 2017;
originally announced March 2017.
-
Classification in postural style
Authors:
Antoine Chambaz,
Christophe Denis
Abstract:
This article contributes to the search for a notion of postural style, focusing on the issue of classifying subjects in terms of how they maintain posture. In the longer term, the hope is to make it possible to determine on a case-by-case basis which sensorial information is prevalent in postural control, and to improve/adapt protocols for functional rehabilitation among those who show deficits in maintaining posture, typically seniors. Here, we specifically tackle the statistical problem of classifying subjects sampled from a two-class population. Each subject (enrolled in a cohort of 54 participants) undergoes four experimental protocols which are designed to evaluate potential deficits in maintaining posture. These protocols result in four complex trajectories, from which we can extract four small-dimensional summary measures. Because undergoing several protocols can be unpleasant, and sometimes painful, we try to limit the number of protocols needed for the classification. Therefore, we first rank the protocols by decreasing order of relevance, then we derive four plug-in classifiers which involve the best (i.e., most informative), the two best, the three best and all four protocols. This two-step procedure relies on the cutting-edge methodologies of targeted maximum likelihood learning (a methodology for robust and efficient inference) and super-learning (a machine learning procedure for aggregating various estimation procedures into a single better estimation procedure). A simulation study is carried out. The performance of the procedure applied to the real data set (and evaluated by the leave-one-out rule) goes as high as an 87% rate of correct classification (47 out of 54 subjects correctly classified), using only the best protocol.
Submitted 27 September, 2012;
originally announced September 2012.