Search | arXiv e-print repository

Revisiting Reweighted Risk for Calibration: AURC, Focal Loss, and Inverse Focal Loss

Authors: Han Zhou, Sebastian G. Gruber, Teodora Popordanoska, Matthew B. Blaschko

Abstract: Several variants of reweighted risk functionals, such as focal loss, inverse focal loss, and the Area Under the Risk-Coverage Curve (AURC), have been proposed in the literature and claims have been made in relation to their calibration properties. However, focal loss and inverse focal loss propose vastly different weighting schemes. In this paper, we revisit a broad class of weighted risk functions commonly used in deep learning and establish a principled connection between these reweighting schemes and calibration errors. We show that minimizing calibration error is closely linked to the selective classification paradigm and demonstrate that optimizing a regularized variant of the AURC naturally leads to improved calibration. This regularized AURC shares a similar reweighting strategy with inverse focal loss, lending support to the idea that focal loss is less principled when calibration is a desired outcome. Direct AURC optimization offers greater flexibility through the choice of confidence score functions (CSFs). To enable gradient-based optimization, we introduce a differentiable formulation of the regularized AURC using the SoftRank technique. Empirical evaluations demonstrate that our AURC-based loss achieves competitive class-wise calibration performance across a range of datasets and model architectures.

Submitted 10 June, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

arXiv:2505.19585 [pdf, other]

Beyond Segmentation: Confidence-Aware and Debiased Estimation of Ratio-based Biomarkers

Authors: Jiameng Li, Teodora Popordanoska, Sebastian G. Gruber, Frederik Maes, Matthew B. Blaschko

Abstract: Ratio-based biomarkers -- such as the proportion of necrotic tissue within a tumor -- are widely used in clinical practice to support diagnosis, prognosis and treatment planning. These biomarkers are typically estimated from soft segmentation outputs by computing region-wise ratios. Despite the high-stakes nature of clinical decision making, existing methods provide only point estimates, offering no measure of uncertainty. In this work, we propose a unified \textit{confidence-aware} framework for estimating ratio-based biomarkers. We conduct a systematic analysis of error propagation in the segmentation-to-biomarker pipeline and identify model miscalibration as the dominant source of uncertainty. To mitigate this, we incorporate a lightweight, post-hoc calibration module that can be applied using internal hospital data without retraining. We leverage a tunable parameter $Q$ to control the confidence level of the derived bounds, allowing adaptation towards clinical practice. Extensive experiments show that our method produces statistically sound confidence intervals, with tunable confidence levels, enabling more trustworthy application of predictive biomarkers in clinical workflows.

Submitted 26 May, 2025; originally announced May 2025.

Comments: 9 pages

arXiv:2410.07014 [pdf, other]

Optimizing Estimators of Squared Calibration Errors in Classification

Authors: Sebastian G. Gruber, Francis Bach

Abstract: In this work, we propose a mean-squared error-based risk that enables the comparison and optimization of estimators of squared calibration errors in practical settings. Improving the calibration of classifiers is crucial for enhancing the trustworthiness and interpretability of machine learning models, especially in sensitive decision-making scenarios. Although various calibration (error) estimators exist in the current literature, there is a lack of guidance on selecting the appropriate estimator and tuning its hyperparameters. By leveraging the bilinear structure of squared calibration errors, we reformulate calibration estimation as a regression problem with independent and identically distributed (i.i.d.) input pairs. This reformulation allows us to quantify the performance of different estimators even for the most challenging calibration criterion, known as canonical calibration. Our approach advocates for a training-validation-testing pipeline when estimating a calibration error on an evaluation dataset. We demonstrate the effectiveness of our pipeline by optimizing existing calibration estimators and comparing them with novel kernel ridge regression-based estimators on standard image classification tasks.

Submitted 21 February, 2025; v1 submitted 9 October, 2024; originally announced October 2024.

Comments: Published at TMLR, see https://openreview.net/forum?id=BPDVZajOW5

arXiv:2409.01314 [pdf, other]

Disentangling Mean Embeddings for Better Diagnostics of Image Generators

Authors: Sebastian G. Gruber, Pascal Tobias Ziegler, Florian Buettner

Abstract: The evaluation of image generators remains a challenge due to the limitations of traditional metrics in providing nuanced insights into specific image regions. This is a critical problem as not all regions of an image may be learned with similar ease. In this work, we propose a novel approach to disentangle the cosine similarity of mean embeddings into the product of cosine similarities for individual pixel clusters via central kernel alignment. Consequently, we can quantify the contribution of the cluster-wise performance to the overall image generation performance. We demonstrate how this enhances the explainability and the likelihood of identifying pixel regions of model misbehavior across various real-world use cases.

Submitted 12 December, 2024; v1 submitted 2 September, 2024; originally announced September 2024.

Comments: Published at Interpretable AI: Past, Present and Future Workshop at NeurIPS 2024

arXiv:2408.06067 [pdf, other]

doi 10.1016/j.compbiomed.2024.109646

Neural Network Surrogate and Projected Gradient Descent for Fast and Reliable Finite Element Model Calibration: a Case Study on an Intervertebral Disc

Authors: Matan Atad, Gabriel Gruber, Marx Ribeiro, Luis Fernando Nicolini, Robert Graf, Hendrik Möller, Kati Nispel, Ivan Ezhov, Daniel Rueckert, Jan S. Kirschke

Abstract: Accurate calibration of finite element (FE) models is essential across various biomechanical applications, including human intervertebral discs (IVDs), to ensure their reliability and use in diagnosing and planning treatments. However, traditional calibration methods are computationally intensive, requiring iterative, derivative-free optimization algorithms that often take days to converge. This study addresses these challenges by introducing a novel, efficient, and effective calibration method demonstrated on a human L4-L5 IVD FE model as a case study using a neural network (NN) surrogate. The NN surrogate predicts simulation outcomes with high accuracy, outperforming other machine learning models, and significantly reduces the computational cost associated with traditional FE simulations. Next, a Projected Gradient Descent (PGD) approach guided by gradients of the NN surrogate is proposed to efficiently calibrate FE models. Our method explicitly enforces feasibility with a projection step, thus maintaining material bounds throughout the optimization process. The proposed method is evaluated against SOTA Genetic Algorithm and inverse model baselines on synthetic and in vitro experimental datasets. Our approach demonstrates superior performance on synthetic data, achieving an MAE of 0.06 compared to the baselines' MAE of 0.18 and 0.54, respectively. On experimental specimens, our method outperforms the baseline in 5 out of 6 cases.
While our approach requires initial dataset generation and surrogate training, these steps are performed only once, and the actual calibration takes under three seconds. In contrast, traditional calibration time scales linearly with the number of specimens, taking up to 8 days in the worst-case. Such efficiency paves the way for applying more complex FE models, potentially extending beyond IVDs, and enabling accurate patient-specific simulations.

Submitted 9 December, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

Comments: In review. Project code: https://github.com/matanat/IVD-CalibNN/

arXiv:2312.08589 [pdf, other]

Consistent and Asymptotically Unbiased Estimation of Proper Calibration Errors

Authors: Teodora Popordanoska, Sebastian G. Gruber, Aleksei Tiulpin, Florian Buettner, Matthew B. Blaschko

Abstract: Proper scoring rules evaluate the quality of probabilistic predictions, playing an essential role in the pursuit of accurate and well-calibrated models. Every proper score decomposes into two fundamental components -- proper calibration error and refinement -- utilizing a Bregman divergence. While uncertainty calibration has gained significant attention, current literature lacks a general estimator for these quantities with known statistical properties. To address this gap, we propose a method that allows consistent and asymptotically unbiased estimation of all proper calibration errors and refinement terms. In particular, we introduce Kullback--Leibler calibration error, induced by the commonly used cross-entropy loss. As part of our results, we prove the relation between refinement and f-divergences, which implies information monotonicity in neural networks, regardless of which proper scoring rule is optimized. Our experiments validate empirically the claimed properties of the proposed estimator and suggest that the selection of a post-hoc calibration method should be determined by the particular calibration error of interest.

Submitted 13 December, 2023; originally announced December 2023.

Comments: Preprint

arXiv:2310.05833 [pdf, other]

A Bias-Variance-Covariance Decomposition of Kernel Scores for Generative Models

Authors: Sebastian G. Gruber, Florian Buettner

Abstract: Generative models, like large language models, are becoming increasingly relevant in our daily lives, yet a theoretical framework to assess their generalization behavior and uncertainty does not exist. Particularly, the problem of uncertainty estimation is commonly solved in an ad-hoc and task-dependent manner. For example, natural language approaches cannot be transferred to image generation. In this paper, we introduce the first bias-variance-covariance decomposition for kernel scores. This decomposition represents a theoretical framework from which we derive a kernel-based variance and entropy for uncertainty estimation. We propose unbiased and consistent estimators for each quantity which only require generated samples but not the underlying model itself. Based on the wide applicability of kernels, we demonstrate our framework via generalization and uncertainty experiments for image, audio, and language generation. Specifically, kernel entropy for uncertainty estimation is more predictive of performance on CoQA and TriviaQA question answering datasets than existing baselines and can also be applied to closed-source models.

Submitted 10 July, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

Comments: Published at ICML 2024: https://openreview.net/pdf?id=QwgSOwynxD

arXiv:2210.12256 [pdf, other]

Uncertainty Estimates of Predictions via a General Bias-Variance Decomposition

Authors: Sebastian G. Gruber, Florian Buettner

Abstract: Reliably estimating the uncertainty of a prediction throughout the model lifecycle is crucial in many safety-critical applications. The most common way to measure this uncertainty is via the predicted confidence. While this tends to work well for in-domain samples, these estimates are unreliable under domain drift and restricted to classification. Alternatively, proper scores can be used for most predictive tasks but a bias-variance decomposition for model uncertainty does not exist in the current literature. In this work we introduce a general bias-variance decomposition for proper scores, giving rise to the Bregman Information as the variance term. We discover how exponential families and the classification log-likelihood are special cases and provide novel formulations. Surprisingly, we can express the classification case purely in the logit space. We showcase the practical relevance of this decomposition on several downstream tasks, including model ensembles and confidence regions. Further, we demonstrate how different approximations of the instance-level Bregman Information allow reliable out-of-distribution detection for all degrees of domain drift.

Submitted 20 April, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

Comments: Accepted at AISTATS 2023

arXiv:2203.07835 [pdf, other]

Better Uncertainty Calibration via Proper Scores for Classification and Beyond

Authors: Sebastian G. Gruber, Florian Buettner

Abstract: With model trustworthiness being crucial for sensitive real-world applications, practitioners are putting more and more focus on improving the uncertainty calibration of deep neural networks. Calibration errors are designed to quantify the reliability of probabilistic predictions but their estimators are usually biased and inconsistent. In this work, we introduce the framework of proper calibration errors, which relates every calibration error to a proper score and provides a respective upper bound with optimal estimation properties. This relationship can be used to reliably quantify the model calibration improvement. We theoretically and empirically demonstrate the shortcomings of commonly used estimators compared to our approach. Due to the wide applicability of proper scores, this gives a natural extension of recalibration beyond classification.

Submitted 12 March, 2024; v1 submitted 15 March, 2022; originally announced March 2022.

Comments: Published at NeurIPS 2022. Corrected conference version Theorem 3.1 and Proposition 3.2 since CWCE=0 does not imply TCE=0

Journal ref: Advances in Neural Information Processing Systems 35 (NeurIPS 2022)

arXiv:2101.09201 [pdf, other]

doi 10.1021/acs.nanolett.9b02351

Mass sensing for the advanced fabrication of nanomechanical resonators

Authors: G. Gruber, C. Urgell, A. Tavernarakis, A. Stavrinadis, S. Tepsic, C. Magen, S. Sangiao, J. M. de Teresa, P. Verlot, A. Bachtold

Abstract: We report on a nanomechanical engineering method to monitor matter growth in real time via e-beam electromechanical coupling. This method relies on the exceptional mass sensing capabilities of nanomechanical resonators. Focused electron beam induced deposition (FEBID) is employed to selectively grow platinum particles at the free end of singly clamped nanotube cantilevers. The electron beam has two functions: it both grows material on the nanotube and tracks the deposited mass in real time by probing the noise-driven mechanical resonance of the nanotube. On the one hand, this detection method is highly effective as it can resolve mass deposition with a resolution in the zeptogram range; on the other hand, this method is simple to use and readily available to a wide range of potential users, since it can be operated in existing commercial FEBID systems without any modification. The presented method makes it possible to engineer hybrid nanomechanical resonators with precisely tailored functionality. It also appears as a new tool for studying growth dynamics of ultra-thin nanostructures, opening new opportunities for investigating so far out-of-reach physics of FEBID and related methods.

Submitted 22 January, 2021; originally announced January 2021.

Comments: Published in Nano Letters

Journal ref: Nano Letters 19 (2019) 6987-6992

arXiv:2101.09065 [pdf, other]

doi 10.1103/PhysRevLett.126.175502

Interrelation of elasticity and thermal bath in nanotube cantilevers

Authors: S. Tepsic, G. Gruber, C. B. Moller, C. Magen, P. Belardinelli, E. R. Hernadez, F. Alijani, P. Verlot, A. Bachtold

Abstract: We report the first study on the thermal behaviour of the stiffness of individual carbon nanotubes, which is achieved by measuring the resonance frequency of their fundamental mechanical bending modes. We observe a reduction of the Young's modulus over a large temperature range with a slope $-(173\pm 65)$ ppm/K in its relative shift. These findings are reproduced by two different theoretical models based on the thermal dynamics of the lattice. These results reveal how the measured fundamental bending modes depend on the phonons in the nanotube via the Young's modulus. An alternative description based on the coupling between the measured mechanical modes and the phonon thermal bath in the Akhiezer limit is discussed.

Submitted 23 March, 2021; v1 submitted 22 January, 2021; originally announced January 2021.

Journal ref: Phys. Rev. Lett. 126, 175502 (2021)

arXiv:1801.04692 [pdf]

Analyzing conformational changes in single FRET-labeled A1 parts of archaeal A1AO-ATP synthase

Authors: Hendrik Sielaff, Dhirendra Singh, Gerhard Grueber, Michael Börsch

Abstract: ATP synthases utilize a proton motive force to synthesize ATP. In reverse, these membrane-embedded enzymes can also hydrolyze ATP to pump protons over the membrane. To prevent wasteful ATP hydrolysis, distinct control mechanisms exist for ATP synthases in bacteria, archaea, chloroplasts and mitochondria. Single-molecule Förster resonance energy transfer (smFRET) demonstrated that the C-terminus of the rotary subunit epsilon in the Escherichia coli enzyme changes its conformation to block ATP hydrolysis. Previously we investigated the related conformational changes of subunit F of the A1AO-ATP synthase from the archaeon Methanosarcina mazei Gö1. Here, we analyze the lifetimes of fluorescence donor and acceptor dyes to distinguish between smFRET signals for conformational changes and potential artefacts.

Submitted 15 January, 2018; originally announced January 2018.

Comments: 12 pages, 6 figures

arXiv:1709.08664 [pdf, other]

doi 10.1063/1.4985856

Electrically detected magnetic resonance of carbon dangling bonds at the Si-face 4H-SiC/SiO$_2$ interface

Authors: Gernot Gruber, Jonathon Cottom, Robert Meszaros, Markus Koch, Gregor Pobegen, Thomas Aichinger, Dethard Peters, Peter Hadley

Abstract: SiC-based metal-oxide-semiconductor field-effect transistors (MOSFETs) have gained significant importance in power electronics applications. However, electrically active defects at the SiC/SiO$_2$ interface degrade the ideal behavior of the devices. The relevant microscopic defects can be identified by electron paramagnetic resonance (EPR) or electrically detected magnetic resonance (EDMR). This helps to decide which changes to the fabrication process will likely lead to further increases of device performance and reliability. EDMR measurements have shown very similar dominant hyperfine (HF) spectra in differently processed MOSFETs, although some discrepancies were observed in the measured $g$-factors. Here, the HF spectra measured on different SiC MOSFETs are compared and it is argued that the same dominant defect is present in all devices. A comparison of the data with simulated spectra of the C dangling bond (P$_\textrm{bC}$) center and the silicon vacancy (V$_\textrm{Si}$) demonstrates that the P$_\textrm{bC}$ center is a more suitable candidate to explain the observed HF spectra.

Submitted 25 September, 2017; originally announced September 2017.

Comments: Accepted for publication in the Journal of Applied Physics

Showing 1–13 of 13 results for author: Gruber, G