-
Confidence Sequences for Generalized Linear Models via Regret Analysis
Authors:
Eugenio Clerico,
Hamish Flynn,
Wojciech Kotłowski,
Gergely Neu
Abstract:
We develop a methodology for constructing confidence sets for parameters of statistical models via a reduction to sequential prediction. Our key observation is that for any generalized linear model (GLM), one can construct an associated game of sequential probability assignment such that achieving low regret in the game implies a high-probability upper bound on the excess likelihood of the true pa…
▽ More
We develop a methodology for constructing confidence sets for parameters of statistical models via a reduction to sequential prediction. Our key observation is that for any generalized linear model (GLM), one can construct an associated game of sequential probability assignment such that achieving low regret in the game implies a high-probability upper bound on the excess likelihood of the true parameter of the GLM. This allows us to develop a scheme that we call online-to-confidence-set conversions, which effectively reduces the problem of proving the desired statistical claim to an algorithmic question. We study two varieties of this conversion scheme: 1) analytical conversions that only require proving the existence of algorithms with low regret and provide confidence sets centered at the maximum-likelihood estimator 2) algorithmic conversions that actively leverage the output of the online algorithm to construct confidence sets (and may be centered at other, adaptively constructed point estimators). The resulting methodology recovers all state-of-the-art confidence set constructions within a single framework, and also provides several new types of confidence sets that were previously unknown in the literature.
△ Less
Submitted 23 April, 2025;
originally announced April 2025.
-
How good is PAC-Bayes at explaining generalisation?
Authors:
Antoine Picard-Weibel,
Eugenio Clerico,
Roman Moscoviz,
Benjamin Guedj
Abstract:
We discuss necessary conditions for a PAC-Bayes bound to provide a meaningful generalisation guarantee. Our analysis reveals that the optimal generalisation guarantee depends solely on the distribution of the risk induced by the prior distribution. In particular, achieving a target generalisation level is only achievable if the prior places sufficient mass on high-performing predictors. We relate…
▽ More
We discuss necessary conditions for a PAC-Bayes bound to provide a meaningful generalisation guarantee. Our analysis reveals that the optimal generalisation guarantee depends solely on the distribution of the risk induced by the prior distribution. In particular, achieving a target generalisation level is only achievable if the prior places sufficient mass on high-performing predictors. We relate these requirements to the prevalent practice of using data-dependent priors in deep learning PAC-Bayes applications, and discuss the implications for the claim that PAC-Bayes ``explains'' generalisation.
△ Less
Submitted 11 March, 2025;
originally announced March 2025.
-
On the optimality of coin-betting for mean estimation
Authors:
Eugenio Clerico
Abstract:
Confidence sequences are sequences of confidence sets that adapt to incoming data while maintaining validity. Recent advances have introduced an algorithmic formulation for constructing some of the tightest confidence sequences for bounded real random variables. These approaches use a coin-betting framework, where a player sequentially bets on differences between potential mean values and observed…
▽ More
Confidence sequences are sequences of confidence sets that adapt to incoming data while maintaining validity. Recent advances have introduced an algorithmic formulation for constructing some of the tightest confidence sequences for bounded real random variables. These approaches use a coin-betting framework, where a player sequentially bets on differences between potential mean values and observed data. This letter establishes that such coin-betting formulation is optimal among all possible algorithmic frameworks for constructing confidence sequences that build on e-variables and sequential hypothesis testing.
△ Less
Submitted 3 December, 2024;
originally announced December 2024.
-
Online-to-PAC generalization bounds under graph-mixing dependencies
Authors:
Baptiste Abélès,
Eugenio Clerico,
Gergely Neu
Abstract:
Traditional generalization results in statistical learning require a training data set made of independently drawn examples. Most of the recent efforts to relax this independence assumption have considered either purely temporal (mixing) dependencies, or graph-dependencies, where non-adjacent vertices correspond to independent random variables. Both approaches have their own limitations, the forme…
▽ More
Traditional generalization results in statistical learning require a training data set made of independently drawn examples. Most of the recent efforts to relax this independence assumption have considered either purely temporal (mixing) dependencies, or graph-dependencies, where non-adjacent vertices correspond to independent random variables. Both approaches have their own limitations, the former requiring a temporal ordered structure, and the latter lacking a way to quantify the strength of inter-dependencies. In this work, we bridge these two lines of work by proposing a framework where dependencies decay with graph distance. We derive generalization bounds leveraging the online-to-PAC framework, by deriving a concentration result and introducing an online learning framework incorporating the graph structure. The resulting high-probability generalization guarantees depend on both the mixing rate and the graph's chromatic number.
△ Less
Submitted 11 October, 2024;
originally announced October 2024.
-
A note on regularised NTK dynamics with an application to PAC-Bayesian training
Authors:
Eugenio Clerico,
Benjamin Guedj
Abstract:
We establish explicit dynamics for neural networks whose training objective has a regularising term that constrains the parameters to remain close to their initial value. This keeps the network in a lazy training regime, where the dynamics can be linearised around the initialisation. The standard neural tangent kernel (NTK) governs the evolution during the training in the infinite-width limit, alt…
▽ More
We establish explicit dynamics for neural networks whose training objective has a regularising term that constrains the parameters to remain close to their initial value. This keeps the network in a lazy training regime, where the dynamics can be linearised around the initialisation. The standard neural tangent kernel (NTK) governs the evolution during the training in the infinite-width limit, although the regularisation yields an additional term appears in the differential equation describing the dynamics. This setting provides an appropriate framework to study the evolution of wide networks trained to optimise generalisation objectives such as PAC-Bayes bounds, and hence potentially contribute to a deeper theoretical understanding of such networks.
△ Less
Submitted 20 December, 2023;
originally announced December 2023.
-
Generalisation under gradient descent via deterministic PAC-Bayes
Authors:
Eugenio Clerico,
Tyler Farghly,
George Deligiannidis,
Benjamin Guedj,
Arnaud Doucet
Abstract:
We establish disintegrated PAC-Bayesian generalisation bounds for models trained with gradient descent methods or continuous gradient flows. Contrary to standard practice in the PAC-Bayesian setting, our result applies to optimisation algorithms that are deterministic, without requiring any de-randomisation step. Our bounds are fully computable, depending on the density of the initial distribution…
▽ More
We establish disintegrated PAC-Bayesian generalisation bounds for models trained with gradient descent methods or continuous gradient flows. Contrary to standard practice in the PAC-Bayesian setting, our result applies to optimisation algorithms that are deterministic, without requiring any de-randomisation step. Our bounds are fully computable, depending on the density of the initial distribution and the Hessian of the training objective over the trajectory. We show that our framework can be applied to a variety of iterative optimisation algorithms, including stochastic gradient descent (SGD), momentum-based schemes, and damped Hamiltonian dynamics.
△ Less
Submitted 11 February, 2025; v1 submitted 6 September, 2022;
originally announced September 2022.
-
Chained Generalisation Bounds
Authors:
Eugenio Clerico,
Amitis Shidani,
George Deligiannidis,
Arnaud Doucet
Abstract:
This work discusses how to derive upper bounds for the expected generalisation error of supervised learning algorithms by means of the chaining technique. By developing a general theoretical framework, we establish a duality between generalisation bounds based on the regularity of the loss function, and their chained counterparts, which can be obtained by lifting the regularity assumption from the…
▽ More
This work discusses how to derive upper bounds for the expected generalisation error of supervised learning algorithms by means of the chaining technique. By developing a general theoretical framework, we establish a duality between generalisation bounds based on the regularity of the loss function, and their chained counterparts, which can be obtained by lifting the regularity assumption from the loss onto its gradient. This allows us to re-derive the chaining mutual information bound from the literature, and to obtain novel chained information-theoretic generalisation bounds, based on the Wasserstein distance and other probability metrics. We show on some toy examples that the chained generalisation bound can be significantly tighter than its standard counterpart, particularly when the distribution of the hypotheses selected by the algorithm is very concentrated.
Keywords: Generalisation bounds; Chaining; Information-theoretic bounds; Mutual information; Wasserstein distance; PAC-Bayes.
△ Less
Submitted 30 June, 2022; v1 submitted 2 March, 2022;
originally announced March 2022.
-
Conditionally Gaussian PAC-Bayes
Authors:
Eugenio Clerico,
George Deligiannidis,
Arnaud Doucet
Abstract:
Recent studies have empirically investigated different methods to train stochastic neural networks on a classification task by optimising a PAC-Bayesian bound via stochastic gradient descent. Most of these procedures need to replace the misclassification error with a surrogate loss, leading to a mismatch between the optimisation objective and the actual generalisation bound. The present paper prop…
▽ More
Recent studies have empirically investigated different methods to train stochastic neural networks on a classification task by optimising a PAC-Bayesian bound via stochastic gradient descent. Most of these procedures need to replace the misclassification error with a surrogate loss, leading to a mismatch between the optimisation objective and the actual generalisation bound. The present paper proposes a novel training algorithm that optimises the PAC-Bayesian bound, without relying on any surrogate loss. Empirical results show that this approach outperforms currently available PAC-Bayesian training methods.
△ Less
Submitted 24 February, 2022; v1 submitted 22 October, 2021;
originally announced October 2021.
-
Wide stochastic networks: Gaussian limit and PAC-Bayesian training
Authors:
Eugenio Clerico,
George Deligiannidis,
Arnaud Doucet
Abstract:
The limit of infinite width allows for substantial simplifications in the analytical study of over-parameterised neural networks. With a suitable random initialisation, an extremely large network exhibits an approximately Gaussian behaviour. In the present work, we establish a similar result for a simple stochastic architecture whose parameters are random variables, holding both before and during…
▽ More
The limit of infinite width allows for substantial simplifications in the analytical study of over-parameterised neural networks. With a suitable random initialisation, an extremely large network exhibits an approximately Gaussian behaviour. In the present work, we establish a similar result for a simple stochastic architecture whose parameters are random variables, holding both before and during training. The explicit evaluation of the output distribution allows for a PAC-Bayesian training procedure that directly optimises the generalisation bound. For a large but finite-width network, we show empirically on MNIST that this training approach can outperform standard PAC-Bayesian methods.
△ Less
Submitted 13 February, 2023; v1 submitted 17 June, 2021;
originally announced June 2021.
-
Stable ResNet
Authors:
Soufiane Hayou,
Eugenio Clerico,
Bobby He,
George Deligiannidis,
Arnaud Doucet,
Judith Rousseau
Abstract:
Deep ResNet architectures have achieved state of the art performance on many tasks. While they solve the problem of gradient vanishing, they might suffer from gradient exploding as the depth becomes large (Yang et al. 2017). Moreover, recent results have shown that ResNet might lose expressivity as the depth goes to infinity (Yang et al. 2017, Hayou et al. 2019). To resolve these issues, we introd…
▽ More
Deep ResNet architectures have achieved state of the art performance on many tasks. While they solve the problem of gradient vanishing, they might suffer from gradient exploding as the depth becomes large (Yang et al. 2017). Moreover, recent results have shown that ResNet might lose expressivity as the depth goes to infinity (Yang et al. 2017, Hayou et al. 2019). To resolve these issues, we introduce a new class of ResNet architectures, called Stable ResNet, that have the property of stabilizing the gradient while ensuring expressivity in the infinite depth limit.
△ Less
Submitted 18 March, 2021; v1 submitted 24 October, 2020;
originally announced October 2020.