-
Estimating Value at Risk and Expected Shortfall: A Brief Review and Some New Developments
Authors:
Kanon Kamronnaher,
Andrew Bellucco,
Whitney K. Huang,
Colin M. Gallagher
Abstract:
Value-at-risk (VaR) and expected shortfall (ES) are two commonly used metrics for quantifying financial risk. In this study, we review the widely employed Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models. These models are explored under diverse distributional assumptions on the innovations, including parametric, non-parametric, and 'semi-parametric' variants, the last of which incorporates a parametric tail distribution based on extreme value theory. Additionally, we introduce a non-parametric local linear quantile autoregression (LLQAR) with kernel weights that depend on the distance between the current loss and past losses and decrease with the time lag.
To evaluate the performance of different methods for VaR and ES estimation, we employ a multi-criteria approach. This involves mean squared error assessment using simulated data, backtesting on both simulated data and US stocks, and application of the ESBootstrap test. The LLQAR method, which does not necessarily require stationarity assumptions, seems to perform better for simulated non-stationary data as well as real-world data, for estimating VaR and ES.
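For readers unfamiliar with the two risk measures named in this abstract, the following is a minimal sketch of their simplest empirical (historical-simulation) estimators. It is not the GARCH or LLQAR approach described above; the function name `historical_var_es` and the sign convention (positive values are losses) are assumptions made for illustration.

```python
import math


def historical_var_es(losses, alpha=0.95):
    """Empirical VaR and ES at level alpha from a sample of losses.

    Assumed convention: larger values mean larger losses.
    VaR is the empirical alpha-quantile of the losses;
    ES is the mean of the losses at or beyond that quantile.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")

    ordered = sorted(losses)
    n = len(ordered)
    # Smallest order statistic whose empirical CDF value is >= alpha.
    k = max(math.ceil(alpha * n) - 1, 0)
    var = ordered[k]
    tail = ordered[k:]
    es = sum(tail) / len(tail)
    return var, es


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0]
    print(historical_var_es(sample, alpha=0.75))
```

By construction ES is never smaller than VaR at the same level, since it averages only the losses at or above the VaR threshold.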
Submitted 10 May, 2024;
originally announced May 2024.
-
Optimality-based Analysis of XCSF Compaction in Discrete Reinforcement Learning
Authors:
Jordan T. Bishop,
Marcus Gallagher
Abstract:
Learning classifier systems (LCSs) are population-based predictive systems that were originally envisioned as agents to act in reinforcement learning (RL) environments. These systems can suffer from population bloat and so are amenable to compaction techniques that try to strike a balance between population size and performance. A well-studied LCS architecture is XCSF, which in the RL setting acts as a Q-function approximator. We apply XCSF to a deterministic and a stochastic variant of the FrozenLake8x8 environment from OpenAI Gym, and compare its performance, in terms of function approximation error and policy accuracy, to the optimal Q-functions and policies produced by solving the environments via dynamic programming. We then introduce a novel compaction algorithm, Greedy Niche Mass Compaction (GNMC), and study its operation on XCSF's trained populations. Results show that, given a suitable parametrisation, GNMC preserves or even slightly improves function approximation error while yielding a significant reduction in population size. Reasonable preservation of policy accuracy also occurs, and we link this metric to the commonly used steps-to-goal metric in maze-like environments, illustrating how the metrics are complementary rather than competitive.
Submitted 3 September, 2020;
originally announced September 2020.
-
Avoiding Kernel Fixed Points: Computing with ELU and GELU Infinite Networks
Authors:
Russell Tsuchida,
Tim Pearce,
Chris van der Heide,
Fred Roosta,
Marcus Gallagher
Abstract:
Analysing and computing with Gaussian processes arising from infinitely wide neural networks has recently seen a resurgence in popularity. Despite this, many explicit covariance functions of networks with activation functions used in modern networks remain unknown. Furthermore, while the kernels of deep networks can be computed iteratively, theoretical understanding of deep kernels is lacking, particularly with respect to fixed-point dynamics. Firstly, we derive the covariance functions of multi-layer perceptrons (MLPs) with exponential linear units (ELU) and Gaussian error linear units (GELU) and evaluate the performance of the limiting Gaussian processes on some benchmarks. Secondly, and more generally, we analyse the fixed-point dynamics of iterated kernels corresponding to a broad range of activation functions. We find that unlike some previously studied neural network kernels, these new kernels exhibit non-trivial fixed-point dynamics which are mirrored in finite-width neural networks. The fixed point behaviour present in some networks explains a mechanism for implicit regularisation in overparameterised deep models. Our results relate to both the static iid parameter conjugate kernel and the dynamic neural tangent kernel constructions. Software at github.com/RussellTsuchida/ELU_GELU_kernels.
Submitted 28 February, 2021; v1 submitted 19 February, 2020;
originally announced February 2020.
-
Richer priors for infinitely wide multi-layer perceptrons
Authors:
Russell Tsuchida,
Fred Roosta,
Marcus Gallagher
Abstract:
It is well-known that the distribution over functions induced through a zero-mean iid prior distribution over the parameters of a multi-layer perceptron (MLP) converges to a Gaussian process (GP), under mild conditions. We extend this result firstly to independent priors with general zero or non-zero means, and secondly to a family of partially exchangeable priors which generalise iid priors. We discuss how the second prior arises naturally when considering an equivalence class of functions in an MLP and through training processes such as stochastic gradient descent.
The model resulting from partially exchangeable priors is a GP, with an additional level of inference in the sense that the prior and posterior predictive distributions require marginalisation over hyperparameters. We derive the kernels of the limiting GP in deep MLPs, and show empirically that these kernels avoid certain pathologies present in previously studied priors. We empirically evaluate our claims of convergence by measuring the maximum mean discrepancy between finite width models and limiting models. We compare the performance of our new limiting model to some previously discussed models on synthetic regression problems. We observe increasing ill-conditioning of the marginal likelihood and hyper-posterior as the depth of the model increases, drawing parallels with finite width networks which require notoriously involved optimisation tricks.
Submitted 28 November, 2019;
originally announced November 2019.
-
Benchmarking Deep Learning Architectures for Predicting Readmission to the ICU and Describing Patients-at-Risk
Authors:
Sebastiano Barbieri,
James Kemp,
Oscar Perez-Concha,
Sradha Kotwal,
Martin Gallagher,
Angus Ritchie,
Louisa Jorm
Abstract:
Objective: To compare different deep learning architectures for predicting the risk of readmission within 30 days of discharge from the intensive care unit (ICU). The interpretability of attention-based models is leveraged to describe patients-at-risk. Methods: Several deep learning architectures making use of attention mechanisms, recurrent layers, neural ordinary differential equations (ODEs), and medical concept embeddings with time-aware attention were trained using publicly available electronic medical record data (MIMIC-III) associated with 45,298 ICU stays for 33,150 patients. Bayesian inference was used to compute the posterior over weights of an attention-based model. Odds ratios associated with an increased risk of readmission were computed for static variables. Diagnoses, procedures, medications, and vital signs were ranked according to the associated risk of readmission. Results: A recurrent neural network, with time dynamics of code embeddings computed by neural ODEs, achieved the highest average precision of 0.331 (AUROC: 0.739, F1-Score: 0.372). Predictive accuracy was comparable across neural network architectures. Groups of patients at risk included those suffering from infectious complications, with chronic or progressive conditions, and for whom standard medical care was not suitable. Conclusions: Attention-based networks may be preferable to recurrent networks if an interpretable model is required, at only marginal cost in predictive accuracy.
Submitted 6 January, 2020; v1 submitted 21 May, 2019;
originally announced May 2019.
-
Exchangeability and Kernel Invariance in Trained MLPs
Authors:
Russell Tsuchida,
Fred Roosta,
Marcus Gallagher
Abstract:
In the analysis of machine learning models, it is often convenient to assume that the parameters are IID. This assumption is not satisfied when the parameters are updated through training processes such as SGD. A relaxation of the IID condition is a probabilistic symmetry known as exchangeability. We show the sense in which the weights in MLPs are exchangeable. This yields the result that in certain instances, the layer-wise kernel of fully-connected layers remains approximately constant during training. We identify a sharp change in the macroscopic behavior of networks as the covariance between weights changes from zero.
Submitted 27 October, 2018; v1 submitted 19 October, 2018;
originally announced October 2018.
-
Invariance of Weight Distributions in Rectified MLPs
Authors:
Russell Tsuchida,
Farbod Roosta-Khorasani,
Marcus Gallagher
Abstract:
An interesting approach to analyzing neural networks that has received renewed attention is to examine the equivalent kernel of the neural network. This is based on the fact that a fully connected feedforward network with one hidden layer, a certain weight distribution, an activation function, and an infinite number of neurons can be viewed as a mapping into a Hilbert space. We derive the equivalent kernels of MLPs with ReLU or Leaky ReLU activations for all rotationally-invariant weight distributions, generalizing a previous result that required Gaussian weight distributions. Additionally, the Central Limit Theorem is used to show that for certain activation functions, kernels corresponding to layers with weight distributions having $0$ mean and finite absolute third moment are asymptotically universal, and are well approximated by the kernel corresponding to layers with spherical Gaussian weights. In deep networks, as depth increases the equivalent kernel approaches a pathological fixed point, which can be used to argue why training randomly initialized networks can be difficult. Our results also have implications for weight initialization.
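The fixed-point behaviour mentioned in this abstract can be illustrated with a small sketch. It assumes the standard degree-1 arc-cosine kernel for ReLU layers with Gaussian weights, scaled so that input norms are preserved from layer to layer; under that assumption the angle between two inputs updates as cos(theta') = (sin(theta) + (pi - theta) cos(theta)) / pi. The function names are hypothetical and this is not the authors' code.

```python
import math


def relu_angle_update(theta):
    """One-layer update of the angle between two inputs.

    Assumes the norm-preserving arc-cosine (ReLU, Gaussian-weight) kernel.
    """
    c = (math.sin(theta) + (math.pi - theta) * math.cos(theta)) / math.pi
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(min(1.0, max(-1.0, c)))


def iterate_angle(theta0, depth):
    """Return the angles after 0, 1, ..., depth layers."""
    angles = [theta0]
    for _ in range(depth):
        angles.append(relu_angle_update(angles[-1]))
    return angles


if __name__ == "__main__":
    for layer, theta in enumerate(iterate_angle(math.pi / 2, 50)):
        if layer % 10 == 0:
            print(layer, round(theta, 4))
```

Under these assumptions the angle shrinks at every layer and drifts towards zero, so all pairs of inputs look increasingly alike to a very deep, randomly initialised network.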
Submitted 31 May, 2018; v1 submitted 24 November, 2017;
originally announced November 2017.
-
MetSizeR: selecting the optimal sample size for metabolomic studies using an analysis based approach
Authors:
Gift Nyamundanda,
Isobel Claire Gormley,
Yue Fan,
William M Gallagher,
Lorraine Brennan
Abstract:
Background: Determining sample sizes for metabolomic experiments is important but, due to the complexity of these experiments, there are currently no standard methods for sample size estimation in metabolomics. Since pilot studies are rarely done in metabolomics, existing sample size estimation approaches that rely on pilot data cannot be applied.
Results: In this article, an analysis-based approach called MetSizeR is developed to estimate sample size for metabolomic experiments even when experimental pilot data are not available. The key motivation for MetSizeR is that it considers the type of analysis the researcher intends to use when estimating sample size. MetSizeR uses information about the data analysis technique and prior expert knowledge of the metabolomic experiment to simulate pilot data from a statistical model. Permutation-based techniques are then applied to the simulated pilot data to estimate the required sample size.
Conclusions: The MetSizeR methodology, and a publicly available software package which implements the approach, are illustrated through real metabolomic applications. Sample size estimates, informed by the intended statistical analysis technique, and the associated uncertainty are provided.
Submitted 9 December, 2013;
originally announced December 2013.