Showing 1–12 of 12 results for author: Coolen, A

Searching in archive math.
  1. arXiv:2405.13690  [pdf, other]

    math.ST cond-mat.dis-nn

    Observable asymptotics of regularized Cox regression models with standard Gaussian designs: a statistical mechanics approach

    Authors: Emanuele Massa, Anthony Coolen

    Abstract: We study the asymptotic behaviour of the Regularized Maximum Partial Likelihood Estimator (RMPLE) in the proportional limit, considering an arbitrary convex regularizer and assuming that the covariates $\mathbf{X}_i\in\mathbb{R}^{p}$ follow a multivariate Gaussian law with covariance $\mathbf{I}_p/p$ for each $i=1, \dots, n$. In order to efficiently compute the estimator under investigation, we pr…

    Submitted 6 February, 2025; v1 submitted 22 May, 2024; originally announced May 2024.

  2. arXiv:2312.02870  [pdf, other]

    stat.ME cond-mat.dis-nn math.ST

    Replica analysis of overfitting in regression models for time to event data: the impact of censoring

    Authors: Emanuele Massa, Alexander Mozeika, Anthony Coolen

    Abstract: We use statistical mechanics techniques, viz. the replica method, to model the effect of censoring on overfitting in Cox's proportional hazards model, the dominant regression method for time-to-event data. In the overfitting regime, Maximum Likelihood parameter estimators are known to be biased already for small values of the ratio of the number of covariates over the number of samples. The inclus…

    Submitted 5 December, 2023; originally announced December 2023.

  3. arXiv:2209.04270  [pdf, other]

    math.ST cond-mat.dis-nn

    Penalization-induced shrinking without rotation in high dimensional GLM regression: a cavity analysis

    Authors: Emanuele Massa, Marianne Jonker, Anthony Coolen

    Abstract: In high dimensional regression, where the number of covariates is of the order of the number of observations, ridge penalization is often used as a remedy against overfitting. Unfortunately, for correlated covariates such regularisation typically induces in generalized linear models not only shrinking of the estimated parameter vector, but also an unwanted \emph{rotation} relative to the true vect…

    Submitted 9 September, 2022; originally announced September 2022.

    MSC Class: 62J07

  4. arXiv:2204.05827  [pdf, other]

    stat.ME math.ST physics.data-an

    Correction of overfitting bias in regression models

    Authors: Emanuele Massa, Marianne Jonker, Kit Roes, Anthony Coolen

    Abstract: Regression analysis based on many covariates is becoming increasingly common. However, when the number of covariates $p$ is of the same order as the number of observations $n$, maximum likelihood regression becomes unreliable due to overfitting. This typically leads to systematic estimation biases and increased estimator variances. It is crucial for inference and prediction to quantify these effec…

    Submitted 4 September, 2023; v1 submitted 12 April, 2022; originally announced April 2022.

    Comments: 6 figures, 38 pages including appendices

    MSC Class: 62J99

  5. arXiv:2009.13229  [pdf, ps, other]

    math.ST cond-mat.dis-nn

    Exact results on high-dimensional linear regression via statistical physics

    Authors: Alexander Mozeika, Mansoor Sheikh, Fabian Aguirre-Lopez, Fabrizio Antenucci, Anthony CC Coolen

    Abstract: It is clear that conventional statistical inference protocols need to be revised to deal correctly with the high-dimensional data that are now common. Most recent studies aimed at achieving this revision rely on powerful approximation techniques, that call for rigorous results against which they can be tested. In this context, the simplest case of high-dimensional linear regression has acquired si…

    Submitted 6 April, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

    Comments: Most recent version accepted for publication in Physical Review E

    Journal ref: Phys. Rev. E 103, 042142 (2021)

  6. arXiv:2008.00848  [pdf, other]

    stat.ME math.ST

    A monotonicity property of weighted log-rank tests

    Authors: Tahani Coolen-Maturi, Frank P. A. Coolen

    Abstract: The logrank test is a well-known nonparametric test which is often used to compare the survival distributions of two samples including right censored observations, it is also known as the Mantel-Haenszel test. The $G^\rho$ family of tests, introduced by Harrington and Fleming (1982), generalizes the logrank test by using weights assigned to observations. In this paper, we present a monotonicity prope…

    Submitted 3 August, 2020; originally announced August 2020.

  7. arXiv:2007.02189  [pdf, other]

    math.ST stat.ME

    The joint survival signature of coherent systems with shared components

    Authors: Tahani Coolen-Maturi, Frank P. A. Coolen, Narayanaswamy Balakrishnan

    Abstract: The concept of joint bivariate signature, introduced by Navarro et al. (2013), is a useful tool for quantifying the reliability of two systems with shared components. As with the univariate system signature, introduced by Samaniego (2007), its applications are limited to systems with only one type of components, which restricts its practical use. Coolen and Coolen-Maturi (2012) introduced the surv…

    Submitted 11 August, 2020; v1 submitted 4 July, 2020; originally announced July 2020.

    Comments: 14 pages, 4 figures

  8. arXiv:2004.06329  [pdf, ps, other]

    cond-mat.dis-nn math.ST

    Replica analysis of overfitting in generalized linear models

    Authors: ACC Coolen, M Sheikh, A Mozeika, F Aguirre-Lopez, F Antenucci

    Abstract: Nearly all statistical inference methods were developed for the regime where the number $N$ of data samples is much larger than the data dimension $p$. Inference protocols such as maximum likelihood (ML) or maximum a posteriori probability (MAP) are unreliable if $p=O(N)$, due to overfitting. This limitation has for many disciplines with increasingly high-dimensional data become a serious bottlene…

    Submitted 8 July, 2020; v1 submitted 14 April, 2020; originally announced April 2020.

    Comments: 45 pages, 7 figures, accepted for publication in Journal of Physics A

  9. arXiv:1904.06632  [pdf, ps, other]

    stat.ME cond-mat.dis-nn cs.LG math.ST stat.ML

    Analysis of overfitting in the regularized Cox model

    Authors: M Sheikh, A. C. C. Coolen

    Abstract: The Cox proportional hazards model is ubiquitous in the analysis of time-to-event data. However, when the data dimension $p$ is comparable to the sample size $N$, maximum likelihood estimates for its regression parameters are known to be biased or break down entirely due to overfitting. This prompted the introduction of the so-called regularized Cox model. In this paper we use the replica method fro…

    Submitted 25 July, 2019; v1 submitted 14 April, 2019; originally announced April 2019.

  10. arXiv:1712.09813  [pdf, other]

    stat.ME cs.LG math.ST

    Accurate Bayesian Data Classification without Hyperparameter Cross-validation

    Authors: M Sheikh, A C C Coolen

    Abstract: We extend the standard Bayesian multivariate Gaussian generative data classifier by considering a generalization of the conjugate, normal-Wishart prior distribution and by deriving the hyperparameters analytically via evidence maximization. The behaviour of the optimal hyperparameters is explored in the high-dimensional data regime. The classification accuracy of the resulting generalized model is…

    Submitted 28 December, 2017; originally announced December 2017.

  11. arXiv:1406.0812  [pdf, other]

    math.ST stat.ME

    Covariate dimension reduction for survival data via the Gaussian process latent variable model

    Authors: James E. Barrett, Anthony C. C. Coolen

    Abstract: The analysis of high dimensional survival data is challenging, primarily due to the problem of overfitting which occurs when spurious relationships are inferred from data that subsequently fail to exist in test data. Here we propose a novel method of extracting a low dimensional representation of covariates in survival data by combining the popular Gaussian Process Latent Variable Model (GPLVM) wi…

    Submitted 27 January, 2016; v1 submitted 3 June, 2014; originally announced June 2014.

  12. arXiv:1312.1591  [pdf, ps, other]

    math.ST stat.ME

    Gaussian process regression for survival data with competing risks

    Authors: James E. Barrett, Anthony C. C. Coolen

    Abstract: We apply Gaussian process (GP) regression, which provides a powerful non-parametric probabilistic method of relating inputs to outputs, to survival data consisting of time-to-event and covariate measurements. In this context, the covariates are regarded as the `inputs' and the event times are the `outputs'. This allows for highly flexible inference of non-linear relationships between covariates an…

    Submitted 5 September, 2014; v1 submitted 5 December, 2013; originally announced December 2013.