-
Robust Classification with Noisy Labels Based on Posterior Maximization
Authors:
Nicola Novello,
Andrea M. Tonello
Abstract:
Designing objective functions robust to label noise is crucial for real-world classification algorithms. In this paper, we investigate the robustness to label noise of an $f$-divergence-based class of objective functions recently proposed for supervised classification, herein referred to as $f$-PML. We show that, in the presence of label noise, any of the $f$-PML objective functions can be corrected to obtain a neural network that is equal to the one learned with the clean dataset. Additionally, we propose an alternative and novel correction approach that, during the test phase, refines the posterior estimated by the neural network trained in the presence of label noise. Then, we demonstrate that, even if the considered $f$-PML objective functions are not symmetric, they are robust to symmetric label noise for any choice of $f$-divergence, without the need for any correction approach. This allows us to prove that the cross-entropy, which belongs to the $f$-PML class, is robust to symmetric label noise. Finally, we show that such a class of objective functions can be used together with refined training strategies, achieving competitive performance against state-of-the-art techniques of classification with label noise.
Submitted 9 April, 2025;
originally announced April 2025.
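Illustrative note on the entry above: the abstract refers to symmetric label noise without defining it. A minimal sketch follows, assuming one common convention in which each label is replaced, with probability `noise_rate`, by a class drawn uniformly from the other classes. The function name `add_symmetric_noise` and its parameters are hypothetical and are not taken from the paper, which may use a different convention.

```python
import random


def add_symmetric_noise(labels, num_classes, noise_rate, rng=None):
    """Corrupt integer labels with symmetric (uniform) label noise.

    Assumption: a flipped label moves to one of the other
    num_classes - 1 classes with equal probability.
    """
    if num_classes < 2:
        raise ValueError("need at least two classes to flip a label")
    rng = rng or random.Random()
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            # Draw from the num_classes - 1 remaining classes, skipping y.
            other = rng.randrange(num_classes - 1)
            noisy.append(other if other < y else other + 1)
        else:
            noisy.append(y)
    return noisy


clean = [0, 1, 2, 3, 0, 1, 2, 3]
noisy = add_symmetric_noise(clean, num_classes=4, noise_rate=0.3,
                            rng=random.Random(0))
```

Because every wrong class is equally likely, the noise depends only on `noise_rate` and not on which class the sample belongs to, which is the setting the abstract's robustness claim addresses.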
-
$f$-Divergence Based Classification: Beyond the Use of Cross-Entropy
Authors:
Nicola Novello,
Andrea M. Tonello
Abstract:
In deep learning, classification tasks are formalized as optimization problems often solved via the minimization of the cross-entropy. However, recent advancements in the design of objective functions allow the usage of the $f$-divergence to generalize the formulation of the optimization problem for classification. We adopt a Bayesian perspective and formulate the classification task as a maximum a posteriori probability problem. We propose a class of objective functions based on the variational representation of the $f$-divergence. Furthermore, driven by the challenge of improving the state-of-the-art approach, we propose a bottom-up method that leads us to the formulation of an objective function corresponding to a novel $f$-divergence referred to as shifted log (SL). We theoretically analyze the objective functions proposed and numerically test them in three application scenarios: toy examples, image datasets, and signal detection/decoding problems. The analyzed scenarios demonstrate the effectiveness of the proposed approach and that the SL divergence achieves the highest classification accuracy in almost all the considered cases.
Submitted 16 May, 2024; v1 submitted 2 January, 2024;
originally announced January 2024.
-
Mutual Information Estimation via $f$-Divergence and Data Derangements
Authors:
Nunzio A. Letizia,
Nicola Novello,
Andrea M. Tonello
Abstract:
Estimating mutual information accurately is pivotal across diverse applications, from machine learning to communications and biology, enabling us to gain insights into the inner mechanisms of complex systems. Yet, dealing with high-dimensional data presents a formidable challenge, due to its size and the presence of intricate relationships. Recently proposed neural methods employing variational lower bounds on the mutual information have gained prominence. However, these approaches suffer from either high bias or high variance, as the sample size and the structure of the loss function directly influence the training process. In this paper, we propose a novel class of discriminative mutual information estimators based on the variational representation of the $f$-divergence. We investigate the impact of the permutation function used to obtain the marginal training samples and present a novel architectural solution based on derangements. The proposed estimator is flexible since it exhibits an excellent bias/variance trade-off. The comparison with state-of-the-art neural estimators, through extensive experimentation within established reference scenarios, shows that our approach offers higher accuracy and lower complexity.
Submitted 11 November, 2024; v1 submitted 31 May, 2023;
originally announced May 2023.
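Illustrative note on the entry above: the abstract mentions derangements as the permutation used to obtain marginal training samples. A derangement is a permutation with no fixed points, so pairing each x_i with y_σ(i) never reproduces an original joint pair (x_i, y_i). The sketch below draws one by rejection sampling; the helper names `random_derangement` and `marginal_pairs` are hypothetical, and this is not presented as the authors' implementation.

```python
import random


def random_derangement(n, rng=None):
    """Return a permutation of range(n) with no fixed points.

    Uses rejection sampling: shuffle until no index maps to itself.
    Roughly one shuffle in e (about 2.7) is accepted for large n.
    """
    if n < 2:
        raise ValueError("a derangement needs at least two elements")
    rng = rng or random.Random()
    perm = list(range(n))
    while True:
        rng.shuffle(perm)
        if all(i != p for i, p in enumerate(perm)):
            return perm


def marginal_pairs(xs, ys, rng=None):
    """Pair each xs[i] with ys[j] for some j != i.

    Assumption: (xs[i], ys[i]) are samples from the joint distribution,
    so the re-paired samples approximate the product of the marginals.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    perm = random_derangement(len(xs), rng)
    return [(xs[i], ys[perm[i]]) for i in range(len(xs))]


pairs = marginal_pairs(["a", "b", "c", "d"], [0, 1, 2, 3],
                       rng=random.Random(0))
```

A plain random permutation can leave some indices fixed, which would leak joint pairs into the batch meant to represent the marginals; excluding fixed points avoids that.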