-
The bipartite structure of treatment-trial networks reveals the flow of information in network meta-analysis
Authors:
Annabel L Davies
Abstract:
Network meta-analysis (NMA) combines evidence from multiple trials comparing treatment options for the same condition. The method derives its name from a graphical representation of the data where nodes are treatments, and edges represent comparisons between treatments in trials. However, edges in this graph are limited to pairwise comparisons and fail to represent trials that compare more than two treatments. In this paper, we describe NMA as a bipartite graph where trials define a second type of node. Edges then correspond to the arms of trials, connecting each trial node to the treatment nodes it compares. We consider an NMA model parameterized in terms of the observations in each arm. By linking the hat matrix of this model to the bipartite framework, we reveal how evidence flows through the arms of trials. We then define a random walk on the bipartite graph and propose two conjectures that relate the movement of this walker to evidence flow. We illustrate our methods on a network of treatments for plaque psoriasis and verify our conjectures in simulations on randomly generated graphs. The bipartite framework provides new insights into the evidence structure of NMA and the role of individual trials in producing NMA estimates.
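The difference between the two graph representations can be made concrete with a hypothetical three-trial network. This is a minimal sketch, not the paper's code. The bipartite edge list keeps one edge per trial arm. Projecting it onto treatments turns the three-arm trial T3 into three pairwise edges, and those edges no longer record that they came from a single trial. The random walk here is unweighted, with each neighbour equally likely. That is an assumption for illustration only: the walk defined in the paper is tied to the NMA model and need not be uniform.

```python
from itertools import combinations

# Hypothetical toy network: trial id -> treatments compared in its arms.
trials = {
    "T1": ["A", "B"],
    "T2": ["A", "C"],
    "T3": ["A", "B", "C"],  # a three-arm trial
}

def bipartite_edges(trials):
    """One edge per trial arm, joining the trial node to a treatment node."""
    return [(trial, trt) for trial, arms in trials.items() for trt in arms]

def treatment_graph(trials):
    """Project onto the usual treatment graph: one pairwise edge per pair of arms in a trial."""
    edges = {}
    for arms in trials.values():
        for a, b in combinations(sorted(arms), 2):
            edges[(a, b)] = edges.get((a, b), 0) + 1
    return edges

def transition_probs(trials, node):
    """Unweighted (hypothetical) random walk on the bipartite graph: uniform over neighbours."""
    if node in trials:  # trial node -> one of the treatments in its arms
        nbrs = trials[node]
    else:               # treatment node -> one of the trials that include it
        nbrs = [trial for trial, arms in trials.items() if node in arms]
    return {n: 1 / len(nbrs) for n in nbrs}

print(len(bipartite_edges(trials)))  # 7 arms
print(treatment_graph(trials))
```

Because the walker alternates between trial nodes and treatment nodes, every step passes through a trial arm. This is what lets the bipartite picture attribute evidence to individual trials.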
Submitted 23 May, 2025;
originally announced May 2025.
-
Mapping between measurement scales in meta-analysis, with application to measures of body mass index in children
Authors:
Annabel L Davies,
A E Ades,
Julian PT Higgins
Abstract:
Quantitative evidence synthesis methods aim to combine data from multiple medical trials to infer relative effects of different interventions. A challenge arises when trials report continuous outcomes on different measurement scales. To include all evidence in one coherent analysis, we require methods to 'map' the outcomes onto a single scale. This is particularly challenging when trials report aggregate rather than individual data. We are motivated by a meta-analysis of interventions to prevent obesity in children. Trials report aggregate measurements of body mass index (BMI) either expressed as raw values or standardised for age and sex. We develop three methods for mapping between aggregate BMI data using known relationships between individual measurements on different scales. The first is an analytical method based on the mathematical definitions of z-scores and percentiles. The other two approaches involve sampling individual participant data on which to perform the conversions. One method is a straightforward sampling routine, while the other involves optimization with respect to the reported outcomes. In contrast to the analytical approach, these methods also have wider applicability for mapping between any pair of measurement scales with known or estimable individual-level relationships. We verify and contrast our methods using trials from our data set which report outcomes on multiple scales. We find that all methods recreate mean values with reasonable accuracy, but for standard deviations, optimization outperforms the other methods. However, the optimization method is more likely to underestimate standard deviations and is vulnerable to non-convergence.
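The "straightforward sampling routine" can be sketched as follows. Draw individual z-scores consistent with a reported aggregate, convert each one to the raw BMI scale, and summarise again. This is a minimal sketch under stated assumptions.

- BMI z-scores are taken to follow the widely used LMS definition, z = ((BMI/M)^L - 1)/(L S).
- Individual z-scores within an arm are assumed to be normally distributed.
- The L, M, S values are hypothetical, not taken from any growth reference.
- The function names are illustrative and are not the paper's code.

```python
import random
from statistics import NormalDist, mean, stdev

def z_to_bmi(z, L, M, S):
    """Invert the (assumed) LMS definition z = ((BMI/M)**L - 1) / (L*S), for L != 0."""
    base = 1.0 + L * S * z
    if base <= 0:
        raise ValueError("z outside the range supported by these LMS parameters")
    return M * base ** (1.0 / L)

def z_to_percentile(z):
    """Percentile corresponding to a z-score under the standard normal distribution."""
    return 100.0 * NormalDist().cdf(z)

def map_aggregate_z_to_bmi(mean_z, sd_z, L, M, S, n=20000, seed=1):
    """Sample individual z-scores (assumed normal), convert each to BMI, summarise on the BMI scale."""
    rng = random.Random(seed)
    bmi = [z_to_bmi(rng.gauss(mean_z, sd_z), L, M, S) for _ in range(n)]
    return mean(bmi), stdev(bmi)

# Hypothetical LMS values for a single age/sex group (not real reference data).
m_bmi, sd_bmi = map_aggregate_z_to_bmi(0.5, 1.0, L=-1.6, M=16.0, S=0.08)
```

With L = 1 the LMS transform is linear, BMI = M(1 + S z). The sampled mean and SD then match the analytical values M(1 + S mean_z) and M S sd_z, which gives a simple check of the routine. With L different from 1, the skewness of the transform is what makes an analytical mapping of the SD harder.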
Submitted 28 February, 2024;
originally announced February 2024.
-
An Approximation Based Theory of Linear Regression
Authors:
Laurie Davies
Abstract:
The goal of this paper is to provide a theory of linear regression based entirely on approximations. It will be argued that the standard model-based theory of linear regression, whether frequentist or Bayesian, has failed, and that this failure is due to an 'assumed (revealed?) truth' (John Tukey) attitude to the models. This is reflected in the language of statistical inference, which involves a concept of truth, for example efficiency, consistency and hypothesis testing. The motivation behind this paper was to remove the word 'true' from the theory and practice of linear regression and to replace it by approximation. The approximations considered are the least squares approximations. An approximation is called valid if it contains no irrelevant covariates. This is operationalized using the concept of a Gaussian P-value, which is the probability that pure Gaussian noise is better in terms of least squares than the covariate. The precise definition given in the paper is intuitive and requires only four simple equations. Given this, a valid approximation is one where all the Gaussian P-values are less than a threshold $p_0$ specified by the statistician, in this paper with the default value 0.01. This approximation approach is not only much simpler but also overwhelmingly better than the standard model-based approach. This will be demonstrated using six real data sets, four from high dimensional regression and two from vector autoregression. Both the simplicity and the superiority of Gaussian P-values derive from their universal exactness and validity. This is in complete contrast to standard F P-values, which are valid only for carefully designed simulations.
The paper contains excerpts from an unpublished paper by John Tukey entitled 'Issues relevant to an honest account of data-based inference partially in the light of Laurie Davies's paper'.
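The definition of a Gaussian P-value can be made concrete by brute force. Draw many covariates of pure i.i.d. N(0, 1) noise and count how often one reduces the residual sum of squares at least as much as the real covariate. This is a minimal sketch for the simplest case only: one covariate, no intercept, and nothing else in the approximation. The paper's P-values are exact and need no simulation, so the Monte Carlo loop here serves only to illustrate the definition. The names are hypothetical.

```python
import random

def rss_reduction(x, y):
    """Drop in residual sum of squares when y is regressed on the single covariate x (no intercept)."""
    xy = sum(a * b for a, b in zip(x, y))
    return xy * xy / sum(a * a for a in x)

def gaussian_p_value_mc(x, y, n_sim=20000, seed=0):
    """Fraction of simulated i.i.d. N(0, 1) covariates that do at least as well as x."""
    rng = random.Random(seed)
    observed = rss_reduction(x, y)
    hits = 0
    for _ in range(n_sim):
        z = [rng.gauss(0.0, 1.0) for _ in y]
        if rss_reduction(z, y) >= observed:
            hits += 1
    return hits / n_sim

P0 = 0.01  # default threshold from the abstract

y = [1.0, 2.0, 2.5]
x = [1.0, 1.0, 0.0]
p = gaussian_p_value_mc(x, y)
keep_x = p < P0  # x may stay in a valid approximation only if its Gaussian P-value is below P0
```

For n = 3 observations, the exact value in this simple case is 1 minus the absolute cosine of the angle between x and y, and the estimate should lie close to it.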
Submitted 15 February, 2024;
originally announced February 2024.
-
A complex meta-regression model to identify effective features of interventions from multi-arm, multi-follow-up trials
Authors:
Annabel L Davies,
Julian P T Higgins
Abstract:
Network meta-analysis (NMA) combines evidence from multiple trials to compare the effectiveness of a set of interventions. In public health research, interventions are often complex, made up of multiple components or features. This makes it difficult to define a common set of interventions on which to perform the analysis. One approach to this problem is component network meta-analysis (CNMA) which uses a meta-regression framework to define each intervention as a subset of components whose individual effects combine additively. In this paper, we are motivated by a systematic review of complex interventions to prevent obesity in children. Due to considerable heterogeneity across the trials, these interventions cannot be expressed as a subset of components but instead are coded against a framework of characteristic features. To analyse these data, we develop a bespoke CNMA-inspired model that allows us to identify the most important features of interventions. We define a meta-regression model with covariates on three levels: intervention, study, and follow-up time, as well as flexible interaction terms. By specifying different regression structures for trials with and without a control arm, we relax the assumption from previous CNMA models that a control arm is the absence of intervention components. Furthermore, we derive a correlation structure that accounts for trials with multiple intervention arms and multiple follow-up times. Although our model was developed for the specifics of the obesity data set, it has wider applicability to any set of complex interventions that can be coded according to a set of shared features.
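The additive idea behind CNMA is that an intervention's effect relative to control is the sum of the effects of its components or features. This can be sketched as a weighted least squares meta-regression. The sketch is deliberately much simpler than the bespoke model described above and makes the following assumptions.

- There are fixed effects only, and comparisons are independent, so there are no multi-arm or multi-follow-up correlations.
- The control arm contains none of the features. This is exactly the assumption the paper relaxes.
- The feature coding and effect sizes are hypothetical.

```python
import numpy as np

def fit_additive_features(X, d, var):
    """Weighted least squares for d = X @ beta with weights 1/var (fixed effect, independent rows)."""
    w = 1.0 / np.asarray(var, dtype=float)
    XtW = X.T * w                     # equivalent to X.T @ np.diag(w)
    return np.linalg.solve(XtW @ X, XtW @ d)

# Hypothetical coding: rows are intervention-vs-control comparisons, columns are binary
# features present in the intervention arm (control assumed to have none of them).
X = np.array([[1, 0, 0],
              [0, 1, 0],
              [1, 1, 0],
              [1, 0, 1],
              [0, 1, 1]], dtype=float)
beta_true = np.array([-0.20, -0.10, 0.05])
d = X @ beta_true                     # noise-free relative effects, so the fit should recover beta_true
var = np.array([0.01, 0.02, 0.015, 0.03, 0.02])
beta_hat = fit_additive_features(X, d, var)
```

For an active-versus-active comparison, a natural row would be the difference of the two arms' feature vectors. How the paper structures trials with and without a control arm is not reproduced here.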
Submitted 3 January, 2024;
originally announced January 2024.
-
Shortest path or random walks? A framework for path weights in network meta-analysis
Authors:
Gerta Rücker,
Theodoros Papakonstantinou,
Adriani Nikolakopoulou,
Guido Schwarzer,
Tobias Galla,
Annabel L. Davies
Abstract:
Quantifying the contributions, or weights, of comparisons or single studies to the estimates in a network meta-analysis (NMA) is an active area of research. We extend this to the contributions of paths to NMA estimates. We present a general framework, based on the path-design matrix, that describes the problem of finding path contributions as a linear equation. The resulting solutions may have negative coefficients. We show that two known approaches, called shortestpath and randomwalk, are special solutions of this equation, and both meet an optimization criterion, as they minimize the sum of absolute path contributions. In general, there is an infinite space of solutions, which can be identified using the generalized inverse (Moore-Penrose pseudoinverse). We consider two further special approaches. For complex networks we find that shortestpath is superior with respect to run time and variability, compared to the other approaches, and is thus recommended in practice. The path-weights framework also has the potential to answer more general research questions in network meta-analysis.
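The linear-algebra structure described above can be seen in a toy network. Write the path contributions w as the solution of a linear system P w = f. The Moore-Penrose pseudoinverse gives the minimum-norm solution, and adding any null-space vector gives another solution. This is a minimal sketch under stated assumptions.

- P is taken to be a signed edge-by-path incidence matrix standing in for the path-design matrix. The paper's exact definition may differ.
- The flow vector f is hypothetical.
- w_short is only an analogue that puts all weight on the two shortest paths. It is not the output of the shortestpath algorithm.

```python
import numpy as np

# Hypothetical network A, B, C, D with edges AB, BD, AC, CD and a diagonal BC.
# Rows: edges oriented A->B, B->D, A->C, C->D, B->C; columns: paths from A to D.
# +1 / -1: the path uses the edge along / against its orientation.
paths = ["A-B-D", "A-C-D", "A-B-C-D", "A-C-B-D"]
P = np.array([
    [1, 0, 1, 0],   # A->B
    [1, 0, 0, 1],   # B->D
    [0, 1, 0, 1],   # A->C
    [0, 1, 1, 0],   # C->D
    [0, 0, 1, -1],  # B->C
], dtype=float)

# Hypothetical evidence flow for the A-D comparison (symmetric network, so no flow through BC).
f = np.array([0.5, 0.5, 0.5, 0.5, 0.0])

P_pinv = np.linalg.pinv(P)
w_min_norm = P_pinv @ f                      # minimum Euclidean-norm path contributions
null_proj = np.eye(P.shape[1]) - P_pinv @ P  # projector onto the null space of P

def general_solution(v):
    """Every solution of P w = f has the form w_min_norm + (I - P^+ P) v."""
    return w_min_norm + null_proj @ v

w_short = np.array([0.5, 0.5, 0.0, 0.0])     # all weight on the two shortest paths
```

Here the null space is spanned by (1, 1, -1, -1), so there are infinitely many solutions. Both w_min_norm = (0.25, 0.25, 0.25, 0.25) and w_short have a sum of absolute contributions equal to 1. Other choices of v can introduce negative coefficients and a larger absolute sum.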
Submitted 10 August, 2023;
originally announced August 2023.
-
Transport Reversible Jump Proposals
Authors:
Laurence Davies,
Robert Salomone,
Matthew Sutton,
Christopher Drovandi
Abstract:
Reversible jump Markov chain Monte Carlo (RJMCMC) proposals that achieve reasonable acceptance rates and mixing are notoriously difficult to design in most applications. Inspired by recent advances in deep neural network-based normalizing flows and density estimation, we demonstrate an approach to enhance the efficiency of RJMCMC sampling by performing transdimensional jumps involving reference distributions. In contrast to other RJMCMC proposals, the proposed method is the first to apply a non-linear transport-based approach to construct efficient proposals between models with complicated dependency structures. It is shown that, in the setting where exact transports are used, our RJMCMC proposals have the desirable property that the acceptance probability depends only on the model probabilities. Numerical experiments demonstrate the efficacy of the approach.
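A toy check shows why exact transports leave only the model probabilities in the acceptance ratio. Each model's conditional posterior is mapped to a standard normal reference, and an auxiliary N(0, 1) variable is appended to move up a dimension. The proposal is then mapped back through the other model's inverse transport. The within-model densities and the Jacobian then cancel. This is a minimal sketch under stated assumptions.

- Both within-model posteriors are independent Gaussians.
- The transports are exact affine maps, used in place of the normalizing flows described above.
- The model-jump move is symmetric.
- All numbers and names are hypothetical.

```python
import math

def log_norm_pdf(x, mean=0.0, sd=1.0):
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2 * math.pi)

# Hypothetical target: model 1 has one parameter, model 2 has two.
P_MODEL = {1: 0.3, 2: 0.7}
M1, S1 = 1.0, 0.5
M2, S2 = (-2.0, 3.0), (0.8, 1.5)

def log_target(k, theta):
    if k == 1:
        return math.log(P_MODEL[1]) + log_norm_pdf(theta[0], M1, S1)
    return math.log(P_MODEL[2]) + sum(log_norm_pdf(t, m, s) for t, m, s in zip(theta, M2, S2))

def jump_1_to_2(theta1, u):
    """Exact transport to the reference, append auxiliary u ~ N(0, 1), transport back."""
    z = (theta1[0] - M1) / S1                    # T_1: model-1 posterior -> N(0, 1)
    zp = (z, u)                                  # reference point in two dimensions
    theta2 = [m + s * zi for zi, m, s in zip(zp, M2, S2)]  # T_2^{-1}
    log_jac = math.log(S2[0] * S2[1] / S1)       # log |det d(theta2) / d(theta1, u)|
    return theta2, log_jac

def log_accept_ratio_1_to_2(theta1, u):
    theta2, log_jac = jump_1_to_2(theta1, u)
    return log_target(2, theta2) + log_jac - log_target(1, theta1) - log_norm_pdf(u)
```

Whatever theta1 and u are, the log ratio equals log(0.7 / 0.3), the log ratio of model probabilities. With approximate transports, the leftover terms measure how far each map is from exact.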
Submitted 24 February, 2023; v1 submitted 22 October, 2022;
originally announced October 2022.
-
Covariate Selection Based on a Model-free Approach to Linear Regression with Exact Probabilities
Authors:
Laurie Davies,
Lutz Dümbgen
Abstract:
In this paper we give a completely new approach to the problem of covariate selection in linear regression. A covariate or a set of covariates is included only if it is better in the sense of least squares than the same number of Gaussian covariates consisting of i.i.d. $N(0,1)$ random variables. The Gaussian P-value is defined as the probability that the Gaussian covariates are better. It is given in terms of the Beta distribution, it is exact and it holds for all data, making it model-free. The covariate selection procedures require only a cut-off value $α$ for the Gaussian P-value: the default value in this paper is $α=0.01$. The resulting procedures are very simple, very fast, do not overfit and require only least squares. In particular, there is no regularization parameter, no data splitting, no use of simulations and no shrinkage, and no post-selection inference is required. The paper includes the results of simulations, applications to real data sets and theorems on the asymptotic behaviour under the standard linear model. Here the stepwise procedure performs overwhelmingly better than any other procedure we are aware of. An R-package gausscov is available.
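The abstract states that the Gaussian P-value is given by the Beta distribution. In the simplest case (one covariate, no intercept, nothing else in the model, n observations), the fraction of the sum of squares of y explained by a covariate of i.i.d. N(0, 1) values is Beta(1/2, (n-1)/2) distributed, whatever y is. The P-value is therefore P(B >= R^2), where R^2 is the uncentred fraction explained by the real covariate. This minimal sketch evaluates that Beta tail with Simpson's rule so it needs only the standard library. It does not reproduce the paper's general formula for covariates added to an existing selection, and the names are hypothetical.

```python
import math

def simpson(f, a, b, m=2000):
    """Composite Simpson's rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    s = f(a) + f(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def beta_half_sf(t, n):
    """P(B >= t) for B ~ Beta(1/2, (n-1)/2), n >= 3.

    Uses C = sqrt(B), whose density on [0, 1] is proportional to (1 - c^2)^((n-3)/2).
    """
    if n < 3:
        raise ValueError("this sketch assumes n >= 3")
    c0 = math.sqrt(min(max(t, 0.0), 1.0))
    g = lambda c: max(1.0 - c * c, 0.0) ** ((n - 3) / 2)
    return simpson(g, c0, 1.0) / simpson(g, 0.0, 1.0)

def gaussian_p_value(x, y):
    """Exact Gaussian P-value for a single covariate x: no intercept, nothing else in the model."""
    xy = sum(a * b for a, b in zip(x, y))
    r2 = xy * xy / (sum(a * a for a in x) * sum(b * b for b in y))
    return beta_half_sf(r2, len(y))

ALPHA = 0.01  # default cut-off from the abstract
x, y = [1.0, 1.0, 0.0], [1.0, 2.0, 2.5]
include_x = gaussian_p_value(x, y) < ALPHA
```

The change of variable to C = sqrt(B) removes the singularity of the Beta(1/2, ·) density at zero, so a plain quadrature rule is accurate here. In practice a library regularized incomplete beta function would be the usual choice.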
Submitted 23 February, 2022; v1 submitted 3 February, 2022;
originally announced February 2022.
-
Linear Regression, Covariate Selection and the Failure of Modelling
Authors:
Laurie Davies
Abstract:
It is argued that all model-based approaches to the selection of covariates in linear regression have failed. This applies to frequentist approaches based on P-values and to Bayesian approaches, although for different reasons. In the first part of the paper, 13 model-based procedures are compared with the model-free Gaussian covariate procedure in terms of the covariates selected and the time required. The comparison is based on seven data sets and three simulations. There is nothing special about these data sets, which are often used as examples in the literature. All the model-based procedures failed.
In the second part of the paper it is argued that the cause of this failure is the very use of a model. If the model involves all the available covariates, standard P-values can be used, and their use in this situation is quite straightforward. As soon as the model specifies only some unknown subset of the covariates, and the problem is to identify this subset, the situation changes radically. There are many P-values, they are dependent and most of them are invalid. The P-value based approach collapses. The Bayesian paradigm also assumes a correct model, but although there are no conceptual problems with a large number of covariates, there is a considerable overhead causing computational and allocation problems even for moderately sized data sets.
The Gaussian covariate procedure is based on P-values, which are defined as the probability that a random Gaussian covariate is better than the covariate being considered. These P-values are exact and valid whatever the situation. The allocation requirements and the algorithmic complexity are both linear in the size of the data, making the procedure capable of handling large data sets. It outperforms all the other procedures in every respect.
Submitted 22 February, 2022; v1 submitted 16 December, 2021;
originally announced December 2021.
-
Retarded kernels for longitudinal survival analysis and dynamic prediction
Authors:
Annabel L. Davies,
Anthony C. C. Coolen,
Tobias Galla
Abstract:
Predicting patient survival probabilities based on observed covariates is an important assessment in clinical practice. These patient-specific covariates are often measured over multiple follow-up appointments. It is then of interest to predict survival based on the history of these longitudinal measurements, and to update predictions as more observations become available. The standard approaches to these so-called 'dynamic prediction' assessments are joint models and landmark analysis. Joint models involve high-dimensional parametrisations, and their computational complexity often prohibits including multiple longitudinal covariates. Landmark analysis is simpler, but discards a proportion of the available data at each 'landmark time'. In this work we propose a 'retarded kernel' approach to dynamic prediction that sits somewhere in between the two standard methods in terms of complexity. By conditioning hazard rates directly on the covariate measurements over the observation time frame, we define a model that takes into account the full history of covariate measurements but is more practical and parsimonious than joint modelling. Time-dependent association kernels describe the impact of covariate changes at earlier times on the patient's hazard rate at later times. Under the constraints that our model (i) reduces to the standard Cox model for time-independent covariates, and (ii) contains the instantaneous Cox model as a special case, we derive two natural kernel parameterisations. Upon application to three clinical data sets, we find that the predictive accuracy of the retarded kernel approach is comparable to that of the two existing standard methods.
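The two constraints can be illustrated with one hypothetical kernel. Take a hazard h(t) = h0(t) exp(eta(t)) with eta(t) = ∫_0^t K(t, s) z(s) ds. Use a normalised exponential kernel K(t, s) = beta exp(-(t-s)/tau) / (tau (1 - exp(-t/tau))), which satisfies ∫_0^t K(t, s) ds = beta. A constant covariate then gives eta = beta z, as in the standard Cox model. As tau tends to 0 the kernel concentrates at s = t, recovering the instantaneous Cox model. This is a minimal sketch under stated assumptions.

- This kernel is not necessarily either of the paper's two parameterisations.
- The covariate is held at its last observed value between measurements.
- The first measurement is at time 0.

```python
import math

def kernel_linear_predictor(t, times, values, beta, tau):
    """eta(t) = integral_0^t K(t, s) z(s) ds for a hypothetical normalised exponential kernel.

    times must be sorted with times[0] == 0; z(s) is the last value observed at or before s.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    norm = -math.expm1(-t / tau)                 # 1 - exp(-t/tau), computed accurately
    eta = 0.0
    for j, (start, z) in enumerate(zip(times, values)):
        if start >= t:
            break
        end = min(times[j + 1], t) if j + 1 < len(times) else t
        # integral over [start, end] of exp(-(t - s)/tau) / tau ds
        seg = math.exp(-(t - end) / tau) * (-math.expm1(-(end - start) / tau))
        eta += z * seg
    return beta * eta / norm

times, values = [0.0, 1.0, 2.0], [1.0, 2.0, 3.0]
eta_short_memory = kernel_linear_predictor(2.5, times, values, beta=1.0, tau=0.01)  # close to the latest value
eta_long_memory = kernel_linear_predictor(2.5, times, values, beta=1.0, tau=1e6)    # close to the time average
```

The parameter tau sets how far back the covariate history matters. Small values mimic the instantaneous Cox model, while large values weight the whole observed history almost equally.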
Submitted 10 November, 2021; v1 submitted 21 October, 2021;
originally announced October 2021.
-
Bayesian Detectability of Induced Polarisation in Airborne Electromagnetic Data using Reversible Jump Sequential Monte Carlo
Authors:
Laurence Davies,
Alan Yusen Ley-Cooper,
Matthew Sutton,
Christopher Drovandi
Abstract:
Detection of induced polarisation (IP) effects in airborne electromagnetic (AEM) measurements does not yet have an established methodology. This contribution develops a Bayesian approach to the IP-detectability problem using decoupled transdimensional layered models. It applies an approach novel to geophysics, whereby transdimensional proposals are used within the embarrassingly parallelisable and robust static Sequential Monte Carlo (SMC) class of algorithms for the simultaneous inference of parameters and models. We refer to this algorithm as Reversible Jump Sequential Monte Carlo (RJSMC). Its statistical methodological contributions address adaptivity for multiple models and proposal types, especially particle impoverishment in unlikely models. The methodological contributions to solid Earth geophysics are the decoupled model approach and a proposed statistic that uses posterior model odds to assess IP detectability. A case study is included investigating detectability of IP effects in AEM data at a broad scale.
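In an SMC sampler over several models, each particle carries a model label and an importance weight. The usual estimate of a posterior model probability is the normalised total weight of the particles in that model. The abstract bases its detectability statistic on posterior model odds but does not give its exact form. This minimal sketch therefore only shows how such odds can be read off a weighted particle population. The labels "IP" and "no-IP" and the weights are hypothetical.

```python
import math

def model_probabilities(particles):
    """Estimate posterior model probabilities from (model_label, log_weight) particles."""
    m = max(lw for _, lw in particles)           # shift by the maximum for numerical stability
    totals = {}
    for label, lw in particles:
        totals[label] = totals.get(label, 0.0) + math.exp(lw - m)
    z = sum(totals.values())
    return {label: w / z for label, w in totals.items()}

def posterior_odds(particles, num, den):
    """Ratio of estimated posterior probabilities of two models."""
    probs = model_probabilities(particles)
    return probs[num] / probs[den]

particles = [("IP", math.log(1.0)), ("no-IP", math.log(1.0)), ("IP", math.log(2.0))]
odds_ip = posterior_odds(particles, "IP", "no-IP")
```

If a model is left with no particles, its probability estimate is zero and the odds are undefined. This is one way in which particle impoverishment in unlikely models shows up.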
Submitted 1 September, 2021;
originally announced September 2021.
-
Network meta-analysis and random walks
Authors:
Annabel L. Davies,
Theodoros Papakonstantinou,
Adriani Nikolakopoulou,
Gerta Rücker,
Tobias Galla
Abstract:
Network meta-analysis (NMA) is a central tool for evidence synthesis in clinical research. The results of an NMA depend critically on the quality of evidence being pooled. In assessing the validity of an NMA, it is therefore important to know the proportion contributions of each direct treatment comparison to each network treatment effect. The construction of proportion contributions is based on the observation that each row of the hat matrix represents a so-called 'evidence flow network' for each treatment comparison. However, the existing algorithm used to calculate these values is associated with ambiguity according to the selection of paths. In this work we present a novel analogy between NMA and random walks. We use this analogy to derive closed-form expressions for the proportion contributions. A random walk on a graph is a stochastic process that describes a succession of random 'hops' between vertices which are connected by an edge. The weight of an edge relates to the probability that the walker moves along that edge. We use the graph representation of NMA to construct the transition matrix for a random walk on the network of evidence. We show that the net number of times a walker crosses each edge of the network is related to the evidence flow network. By then defining a random walk on the directed evidence flow network, we derive analytically the matrix of proportion contributions. The random-walk approach, in addition to being computationally more efficient, has none of the associated ambiguity of the existing algorithm.
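The random-walk construction can be sketched on a weighted treatment graph. The transition probabilities are P[i, j] = W[i, j] / Σ_k W[i, k]. For a walker started at treatment a and absorbed at treatment b, the expected net number of crossings of each edge comes from the fundamental matrix of the absorbing chain. By the classical random-walk/electrical-network correspondence, these net crossings equal the current on each edge when one unit of current flows from a to b with conductances W. This minimal sketch checks that identity numerically. The weights are hypothetical, and the step from this flow to a particular row of the NMA hat matrix is not reproduced here.

```python
import numpy as np

def net_crossings(W, a, b):
    """Expected net crossings of each edge by a walker started at a and absorbed at b."""
    n = W.shape[0]
    P = W / W.sum(axis=1, keepdims=True)       # transition matrix of the random walk
    transient = [i for i in range(n) if i != b]
    Q = P[np.ix_(transient, transient)]
    N = np.linalg.inv(np.eye(n - 1) - Q)       # fundamental matrix: expected visits
    visits = np.zeros(n)
    visits[transient] = N[transient.index(a)]
    flow = visits[:, None] * P                 # expected crossings i -> j
    return flow - flow.T

def unit_current(W, a, b):
    """Edge currents for one unit entering at a and leaving at b, with conductances W."""
    L = np.diag(W.sum(axis=1)) - W             # weighted graph Laplacian
    inj = np.zeros(W.shape[0])
    inj[a], inj[b] = 1.0, -1.0
    v = np.linalg.pinv(L) @ inj                # node potentials, up to an additive constant
    return W * (v[:, None] - v[None, :])

# Hypothetical weights (e.g. inverse variances of direct estimates) on four treatments.
W = np.array([[0, 2, 1, 0],
              [2, 0, 3, 1],
              [1, 3, 0, 2],
              [0, 1, 2, 0]], dtype=float)
crossings = net_crossings(W, 0, 3)
```

The closed form avoids simulating walkers. The net flow leaving the start node is exactly one unit, which matches a single comparison's evidence flow summing to one.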
Submitted 14 June, 2021;
originally announced July 2021.
-
Excess deaths, baselines, Z-scores, P-scores and peaks
Authors:
Laurie Davies
Abstract:
The recent Covid-19 epidemic has led to comparisons of the countries suffering from it. These are based on the number of excess deaths attributed either directly or indirectly to the epidemic. Unfortunately the data on which such comparisons rely are often incomplete and unreliable. This article discusses problems of interpretation of data even when the data are largely accurate and delayed by at most two to three weeks. This applies to the Office for National Statistics in the UK, the Statistisches Bundesamt in Germany and the Belgian statistical office Statbel. The data in the article are taken from these three sources. The number of excess deaths is defined as the number of deaths minus the baseline, the definition of which varies from country to country. In the UK it is the average number of deaths over the last five years, in Germany over the last four years and in Belgium over the last 11 years. This means that in all cases the individual baselines depend strongly on the timing and intensity of adverse factors such as past influenza epidemics and heat waves, which makes cross-country comparisons difficult. A baseline defined as the number of deaths in the absence of adverse factors can be operationalized by taking, say, the 10% quantile of the number of deaths. This varies little over time and across European countries within given age groups, and it therefore enables more robust and accurate comparisons of different countries. The article criticizes the use of Z-scores, which distort the comparison between countries. Finally, the problem of describing past epidemics by their timing (start, finish and time of the maximum) and by their effect (the height of the maximum and the total number of deaths) is considered.
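A short calculation makes the baseline argument concrete. It uses hypothetical death counts for one calendar week over the previous 11 years, with one influenza year. The flu year sits inside the five-year window, so the average baseline is inflated and the excess is understated. The 10% quantile is almost unaffected. This minimal sketch uses the standard library's `statistics.quantiles` with its "inclusive" method, which is one of several reasonable quantile conventions and is an assumption here.

```python
from statistics import mean, quantiles

# Hypothetical deaths in the same calendar week over the previous 11 years (oldest first);
# the value 1400 is a year with an influenza epidemic.
past = [1000, 1010, 990, 1020, 1005, 1015, 995, 1400, 1010, 1000, 1020]
observed = 1300

def excess(observed, baseline):
    """Excess deaths: observed deaths minus the baseline."""
    return observed - baseline

baseline_5y = mean(past[-5:])                                # average of the last five years
baseline_q10 = quantiles(past, n=10, method="inclusive")[0]  # 10% quantile of all 11 years

excess_5y = excess(observed, baseline_5y)
excess_q10 = excess(observed, baseline_q10)
```

Here the five-year average is 1085 and the 10% quantile is 995, giving excess deaths of 215 and 305 respectively from the same observed count.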
Submitted 20 October, 2020;
originally announced October 2020.
-
Degree irregularity and rank probability bias in network meta-analysis
Authors:
Annabel L Davies,
Tobias Galla
Abstract:
Network meta-analysis (NMA) is a statistical technique for the comparison of treatment options. The nodes of the network are the competing treatments and edges represent comparisons of treatments in trials. Outcomes of Bayesian NMA include estimates of treatment effects, and the probabilities that each treatment is ranked best, second best and so on. How exactly network geometry affects the accuracy and precision of these outcomes is not fully understood. Here we carry out a simulation study and find that disparity in the number of trials involving different treatments leads to a systematic bias in estimated rank probabilities. This bias is associated with an increased variation in the precision of treatment effect estimates. Using ideas from the theory of complex networks, we define a measure of 'degree irregularity' to quantify asymmetry in the number of studies involving each treatment. Our simulations indicate that more regular networks have more precise treatment effect estimates and smaller bias of rank probabilities. We also find that degree regularity is a better indicator of NMA quality than both the total number of studies in a network and the disparity in the number of trials per comparison. These results have implications for planning future trials. We demonstrate that choosing trials which reduce the network's irregularity can improve the precision and accuracy of NMA outcomes.
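The abstract does not give the formula for degree irregularity, so this minimal sketch uses a hypothetical stand-in: the population variance of node degrees. This measure is zero for a regular network and grows as the number of studies per treatment becomes more unequal. As in the abstract, a treatment's degree is the number of studies involving it, so a multi-arm trial counts once for each treatment it includes.

```python
from collections import Counter
from statistics import pvariance

def study_degrees(trials):
    """Number of studies involving each treatment (each trial is a sequence of arm treatments)."""
    deg = Counter()
    for arms in trials:
        for trt in set(arms):
            deg[trt] += 1
    return deg

def degree_variance(trials):
    """Hypothetical irregularity measure: population variance of the study degrees."""
    return pvariance(list(study_degrees(trials).values()))

ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]  # every treatment in 2 studies
star = [("A", "B"), ("A", "C"), ("A", "D")]              # A in 3 studies, the others in 1
```

Under this measure, adding a B-C trial to the star network lowers the variance, which mirrors the abstract's point that choosing trials that make the network more regular can help.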
Submitted 17 March, 2020;
originally announced March 2020.
-
Covariate Selection Based on an Assumption-free Approach to Linear Regression with Exact Probabilities
Authors:
Laurie Davies,
Lutz Dümbgen
Abstract:
In this paper we give a completely new approach to the problem of covariate selection in linear regression. A covariate or a set of covariates is included only if it is better in the sense of least squares than the same number of Gaussian covariates consisting of i.i.d. $N(0,1)$ random variables. The Gaussian P-value is defined as the probability that the Gaussian covariates are better. It is given in terms of the Beta distribution, it is exact and it holds for all data. The covariate selection procedures based on this require only a cut-off value $α$ for the Gaussian P-value: the default value in this paper is $α=0.01$. The resulting procedures are very simple, very fast, do not overfit and require only least squares. In particular, there is no regularization parameter, no data splitting, no use of simulations and no shrinkage, and no post-selection inference is required. The paper includes the results of simulations, applications to real data sets and theorems on the asymptotic behaviour under the standard linear model. Here the stepwise procedure performs overwhelmingly better than any other procedure we are aware of. An R-package gausscov is available.
Submitted 8 February, 2023; v1 submitted 5 June, 2019;
originally announced June 2019.
-
Lasso, knockoff and Gaussian covariates: a comparison
Authors:
Laurie Davies
Abstract:
Given data $\mathbf{y}$ and $k$ covariates $\mathbf{x}_j$, one problem in linear regression is to decide which, if any, of the covariates to include when regressing the dependent variable $\mathbf{y}$ on the covariates $\mathbf{x}_j$. In this paper three such methods (lasso, knockoff and Gaussian covariates) are compared using simulations and real data. The Gaussian covariate method is based on exact probabilities which are valid for all $\mathbf{y}$ and $\mathbf{x}_j$, making it model-free. Moreover, the probabilities agree with those based on the F-distribution for the standard linear model with i.i.d. Gaussian errors. It is conceptually, mathematically and algorithmically very simple, it is very fast and it makes no use of simulations. It outperforms lasso and knockoff in all respects by a considerable margin.
Submitted 30 March, 2019; v1 submitted 4 May, 2018;
originally announced May 2018.
-
Statistical Analysis of the Ricker Model
Authors:
Laurie Davies
Abstract:
The Ricker model was introduced in the context of managing fishing stocks. It is a discrete non-linear iterative model given by $N(t+1)=rN(t)\exp(-N(t))$, where $N(t)$ is the population at time $t$. The model treated in this paper includes a random component, $N(t+1)=rN(t)\exp(-N(t)+\varepsilon(t+1))$, and what is observed at time $t$ is a Poisson random variable with parameter $\varphi N(t)$. Such a model has been analysed using 'synthetic likelihood' and ABC (Approximate Bayesian Computation). In contrast, this paper takes a non-likelihood approach and treats the model consistently as an approximation. The goal is to specify those parameter values, if any, which are consistent with the data.
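The model above can be simulated directly. The state follows N(t+1) = r N(t) exp(-N(t) + ε(t+1)), and the observation at time t is Poisson with mean φN(t). The abstract does not state the distribution of ε, so this minimal sketch assumes i.i.d. N(0, σ²) noise. The function name and parameter values are illustrative only.

```python
import numpy as np

def simulate_ricker(r, sigma, phi, n0, T, seed=0):
    """Simulate the stochastic Ricker model and Poisson(phi * N(t)) observations for t = 0..T.

    The noise eps(t) is assumed i.i.d. N(0, sigma^2); the abstract leaves its law unspecified.
    """
    rng = np.random.default_rng(seed)
    N = np.empty(T + 1)
    N[0] = n0
    for t in range(T):
        eps = rng.normal(0.0, sigma)
        N[t + 1] = r * N[t] * np.exp(-N[t] + eps)
    y = rng.poisson(phi * N)                  # observed counts
    return N, y

# Illustrative parameters (hypothetical): a large r gives highly irregular dynamics.
N_sim, y_sim = simulate_ricker(r=np.exp(3.8), sigma=0.3, phi=10.0, n0=1.0, T=50)
```

Without noise, the deterministic model has a fixed point at N = log r. With r = e and N(0) = 1 the simulated population therefore stays at 1, which gives a quick sanity check of the recursion.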
Submitted 7 March, 2017;
originally announced March 2017.
-
Stylized Facts and Simulating Long Range Financial Data
Authors:
Laurie Davies,
Walter Krämer
Abstract:
We propose a new method (implemented in an R-program) to simulate long-range daily stock-price data. The program reproduces various stylized facts much better than various parametric models from the extended GARCH-family. In particular, the empirically observed changes in unconditional variance are truthfully mirrored in the simulated data.
We propose a new method (implemented in an R program) to simulate long-range daily stock-price data. The program reproduces various stylized facts much better than parametric models from the extended GARCH family. In particular, the empirically observed changes in unconditional variance are faithfully mirrored in the simulated data.
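The abstract does not describe the simulation method itself, so the sketch below does not attempt to reproduce it. It only shows one simple way, assumed here for illustration, to inspect the stylized fact singled out above: the variance of daily log returns computed over consecutive blocks. The function names and the `block_length` parameter are hypothetical.

```python
import math
import statistics


def log_returns(prices):
    """Daily log returns log(P[t+1] / P[t])."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def block_variances(returns, block_length):
    """Variance of returns within consecutive non-overlapping blocks.

    Marked differences between blocks are one crude indication that the
    unconditional variance changes over time.
    """
    return [statistics.pvariance(returns[i:i + block_length])
            for i in range(0, len(returns) - block_length + 1, block_length)]
```

Applied to real price series and to simulated ones, the block variances give a rough side-by-side comparison. They are not a formal test.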
Submitted 15 December, 2016;
originally announced December 2016.
-
On $p$-values
Authors:
Laurie Davies
Abstract:
Models are consistently treated as approximations, and all procedures are consistent with this: they do not treat the model as being true. In this context $p$-values are one measure of approximation, a small $p$-value indicating a poor approximation. Approximation regions are defined and distinguished from confidence regions.
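As a toy illustration of reading a $p$-value as a measure of approximation, the sketch below checks the location parameter of a Gaussian model. For a candidate mean $\mu$, the $p$-value of the standardized mean difference says how well $N(\mu, s^2)$ approximates the data in this one respect. The set of $\mu$ values with $p \ge \alpha$ is then collected. Using only the mean, a normal approximation for the $p$-value and a user-supplied grid are all assumptions. In this one-feature case the set coincides numerically with a familiar confidence interval, so the sketch does not capture the distinction the paper draws.

```python
import math
import statistics


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_value_location(data, mu):
    """Two-sided p-value of the standardized mean difference (normal approximation)."""
    n = len(data)
    z = math.sqrt(n) * (statistics.fmean(data) - mu) / statistics.stdev(data)
    return 2.0 * (1.0 - normal_cdf(abs(z)))


def approximation_region(data, grid, alpha=0.05):
    """Grid values mu for which N(mu, s^2) is judged an adequate approximation."""
    return [mu for mu in grid if p_value_location(data, mu) >= alpha]
```

For example, with data `[1, 2, 3, 4, 5]` and the integer grid 0 to 6, only 2, 3 and 4 survive at `alpha=0.05`.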
Submitted 18 November, 2016;
originally announced November 2016.
-
Stepwise Choice of Covariates in High Dimensional Regression
Authors:
Laurie Davies
Abstract:
Given data y(n) and p(n) covariates x(n), one problem in linear regression is to decide which, if any, of the covariates to include. There are many articles on this problem, but all are based on a stochastic model for the data. This paper gives what seems to be a new approach which does not require any form of model. It is conceptually and algorithmically simple, and consistency results can be proved under appropriate assumptions.
Submitted 5 October, 2017; v1 submitted 17 October, 2016;
originally announced October 2016.
-
Unsupervised nonparametric detection of unknown objects in noisy images based on percolation theory
Authors:
Mikhail A. Langovoy,
Olaf Wittich,
Patrick Laurie Davies
Abstract:
We develop an unsupervised, nonparametric and scalable statistical learning method for the detection of unknown objects in noisy images. The method uses results from percolation theory and random graph theory. We present an algorithm that detects objects of unknown shapes and sizes in the presence of nonparametric noise of unknown level. The noise density is assumed to be unknown and can be very irregular. The algorithm has linear complexity and exponential accuracy and is appropriate for real-time systems. We prove strong consistency and scalability of our method in this setup under minimal assumptions.
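The abstract does not give the algorithm's details. As a rough illustration of the percolation idea only, the sketch below assumes the following. Noise pixels that exceed a threshold form only small connected clusters, whereas an object produces a large one, so the largest 4-connected cluster is found by breadth-first search. The search visits each pixel a bounded number of times, which keeps the cost linear in the image size. `threshold` and `min_cluster_size` are hypothetical user-supplied parameters; the paper derives its thresholds from percolation theory.

```python
from collections import deque


def largest_cluster(image, threshold):
    """Size of the largest 4-connected cluster of pixels with value > threshold."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    largest = 0
    for i in range(rows):
        for j in range(cols):
            if seen[i][j] or image[i][j] <= threshold:
                continue
            seen[i][j] = True
            queue, size = deque([(i, j)]), 0
            while queue:  # breadth-first search over one cluster
                a, b = queue.popleft()
                size += 1
                for na, nb in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if (0 <= na < rows and 0 <= nb < cols and not seen[na][nb]
                            and image[na][nb] > threshold):
                        seen[na][nb] = True
                        queue.append((na, nb))
            largest = max(largest, size)
    return largest


def detect_object(image, threshold, min_cluster_size):
    """Declare an object present if some bright cluster is too large to be noise."""
    return largest_cluster(image, threshold) >= min_cluster_size
```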
Submitted 12 July, 2018; v1 submitted 24 February, 2011;
originally announced February 2011.
-
Locally adaptive image denoising by a statistical multiresolution criterion
Authors:
Thomas Hotz,
Philipp Marnitz,
Rahel Stichtenoth,
Laurie Davies,
Zakhar Kabluchko,
Axel Munk
Abstract:
We demonstrate how one can choose the smoothing parameter in image denoising by a statistical multiresolution criterion, both globally and locally. Using inhomogeneous diffusion and total variation regularization as examples for localized regularization schemes, we present an efficient method for locally adaptive image denoising. As expected, the smoothing parameter serves as an edge detector in this framework. Numerical examples illustrate the usefulness of our approach. We also present an application in confocal microscopy.
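A one-dimensional sketch shows how a multiresolution criterion can choose a smoothing parameter globally. The residuals are accepted if, on every dyadic interval $I$, their sum is at most $\sigma\sqrt{2\log n}\,\sqrt{|I|}$ in absolute value. The largest amount of smoothing that passes is then selected. Several choices here are assumptions rather than the paper's method: the threshold, the use of dyadic intervals, a known noise level $\sigma$, and a simple moving average standing in for inhomogeneous diffusion or total variation regularization. A local version would apply the check region by region.

```python
import math


def running_mean(y, half_width):
    """Moving average over 2*half_width + 1 points (window truncated at the ends)."""
    n = len(y)
    fit = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        fit.append(sum(y[lo:hi]) / (hi - lo))
    return fit


def passes_multiresolution(resid, sigma):
    """Check |sum of residuals over I| <= sigma*sqrt(2 log n)*sqrt(|I|) for dyadic I."""
    n = len(resid)
    bound = sigma * math.sqrt(2.0 * math.log(n))
    length = 1
    while length <= n:
        for start in range(0, n - length + 1, length):
            if abs(sum(resid[start:start + length])) > bound * math.sqrt(length):
                return False
        length *= 2
    return True


def choose_half_width(y, sigma, max_half_width):
    """Largest amount of smoothing whose residuals still pass the criterion."""
    best = 0
    for h in range(max_half_width + 1):
        fit = running_mean(y, h)
        if passes_multiresolution([a - b for a, b in zip(y, fit)], sigma):
            best = h
    return best
```

For a sharp step, every window wider than a single point leaves a residual at the jump that exceeds the bound. This is the sense in which the smoothing parameter acts as an edge detector.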
Submitted 29 January, 2010;
originally announced January 2010.
-
Reversible jump Markov chain Monte Carlo and multi-model samplers
Authors:
Yanan Fan,
Scott A. Sisson,
Laurence Davies
Abstract:
To appear in the second edition of the MCMC handbook, S. P. Brooks, A. Gelman, G. Jones and X.-L. Meng (eds), Chapman & Hall.
Submitted 28 August, 2024; v1 submitted 12 January, 2010;
originally announced January 2010.
-
Approximating Data with weighted smoothing Splines
Authors:
P. L. Davies,
M. Meise
Abstract:
Given a data set (t_i, y_i), i=1,..., n, with the t_i in [0,1], non-parametric regression is concerned with the problem of specifying a suitable function f_n:[0,1] -> R such that the data can be reasonably approximated by the points (t_i, f_n(t_i)), i=1,..., n. If a data set exhibits large variations in local behaviour, for example large peaks as in spectroscopy data, then the method must be able to adapt to the local changes in smoothness. Whilst many methods are able to accomplish this, they are less successful at adapting derivatives. In this paper we show how the goal of local adaptivity of the function and its first and second derivatives can be attained in a simple manner using weighted smoothing splines. A residual-based concept of approximation is used, which forces local adaptivity of the regression function, together with a global regularization which makes the function as smooth as possible subject to the approximation constraints.
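Local weights let a penalized fit follow sharp features. The sketch below uses a discrete stand-in for a weighted smoothing spline (an assumption; it is not the paper's spline). It minimizes $\sum_i (y_i-f_i)^2 + \sum_k w_k (f_k - 2f_{k+1} + f_{k+2})^2$ by solving $(I + D^\top W D) f = y$, where $D$ is the second-difference matrix and $W$ holds the weights $w_k$. In the paper the weights are driven by a residual-based approximation criterion; here they are simply supplied by hand. All names are hypothetical.

```python
def solve_linear(a, b):
    """Gaussian elimination with partial pivoting (dense; fine for small n)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def weighted_smoother(y, weights):
    """Minimise sum (y_i - f_i)^2 + sum_k weights[k] * (second difference k of f)^2."""
    n = len(y)
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 2):
        d = {k: 1.0, k + 1: -2.0, k + 2: 1.0}  # row k of the second-difference matrix D
        for i, di in d.items():
            for j, dj in d.items():
                a[i][j] += weights[k] * di * dj  # accumulate D^T W D
    return solve_linear(a, list(y))
```

With uniform weights, a narrow spike is strongly flattened. Setting the weights to zero on the second differences that touch the spike lets the fit reproduce it, while the penalty elsewhere keeps the function smooth.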
Submitted 18 March, 2009; v1 submitted 11 December, 2007;
originally announced December 2007.
-
Residual-based localization and quantification of peaks in x-ray diffractograms
Authors:
P. L. Davies,
U. Gather,
M. Meise,
D. Mergel,
T. Mildenberger
Abstract:
We consider data consisting of photon counts of diffracted x-ray radiation as a function of the angle of diffraction. The problem is to determine the positions, powers and shapes of the relevant peaks. An additional difficulty is that the power of the peaks is to be measured from a baseline which itself must be identified. Most methods of de-noising data of this kind do not explicitly take into account the modality of the final estimate. The residual-based procedure we propose uses the so-called taut string method, which minimizes the number of peaks subject to a tube constraint on the integrated data. The baseline is identified by combining the result of the taut string with an estimate of the first derivative of the baseline obtained using a weighted smoothing spline. Finally, each individual peak is expressed as the finite sum of kernels chosen from a parametric family.
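The final step described above expresses each peak as a finite sum of kernels sitting on a baseline. The sketch shows that representation and the resulting peak power, taken as the area above the baseline. It assumes Gaussian kernels $a\exp\{-(x-m)^2/(2s^2)\}$, whose area is $a s\sqrt{2\pi}$. The abstract only says "a parametric family", so the kernel choice and all names are hypothetical. The taut string and baseline-identification steps are not attempted here.

```python
import math


def gaussian_kernel(x, centre, width):
    return math.exp(-0.5 * ((x - centre) / width) ** 2)


def diffractogram_model(x, baseline, peaks):
    """Baseline plus peaks; each peak is a list of (amplitude, centre, width) kernels."""
    return baseline(x) + sum(a * gaussian_kernel(x, m, s)
                             for peak in peaks for a, m, s in peak)


def peak_power(peak):
    """Area of one peak above the baseline: sum of a * s * sqrt(2*pi) over its kernels."""
    return sum(a * s * math.sqrt(2.0 * math.pi) for a, _, s in peak)
```

Allowing several kernels per peak lets asymmetric or shouldered peak shapes be represented while keeping the power a simple sum of closed-form areas.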
Submitted 11 November, 2008; v1 submitted 23 November, 2007;
originally announced November 2007.
-
Nonparametric Regression, Confidence Regions and Regularization
Authors:
P. L. Davies,
A. Kovac,
M. Meise
Abstract:
In this paper we offer a unified approach to the problem of nonparametric regression on the unit interval. It is based on a universal, honest and non-asymptotic confidence region which is defined by a set of linear inequalities involving the values of the functions at the design points. Interest will typically centre on certain simplest functions in that region, where simplicity can be defined in terms of shape (number of local extremes, intervals of convexity/concavity), smoothness (bounds on derivatives) or a combination of both. Once some form of regularization has been decided upon, the confidence region can be used to provide honest non-asymptotic confidence bounds which are less informative but conceptually much simpler.
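A region "defined by a set of linear inequalities involving the values of the functions at the design points" can be checked directly. The sketch below assumes one concrete form of those inequalities: for every interval $I$ of consecutive design points, the residuals satisfy $|\sum_{i\in I}(y_i-f_i)| \le \sigma\sqrt{\tau\log n}\,\sqrt{|I|}$, with $\tau=2$ and $\sigma$ known. Both the form and the constants are assumptions, not the paper's. The sketch also counts interior local extremes, one of the shape-based notions of simplicity mentioned above.

```python
import math


def in_confidence_region(y, f, sigma, tau=2.0):
    """Check the interval inequalities for every run of consecutive design points."""
    n = len(y)
    bound = sigma * math.sqrt(tau * math.log(n))
    prefix = [0.0]
    for yi, fi in zip(y, f):
        prefix.append(prefix[-1] + (yi - fi))  # prefix sums of residuals
    for i in range(n):
        for j in range(i + 1, n + 1):
            if abs(prefix[j] - prefix[i]) > bound * math.sqrt(j - i):
                return False
    return True


def count_local_extremes(f):
    """Interior local extremes: sign changes of successive non-zero differences."""
    diffs = [b - a for a, b in zip(f, f[1:]) if b != a]
    return sum(1 for d1, d2 in zip(diffs, diffs[1:]) if (d1 > 0) != (d2 > 0))
```

Among candidate functions that pass `in_confidence_region`, one would then prefer the candidate with the fewest local extremes, in line with the shape-based regularization described in the abstract.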
Submitted 5 November, 2007;
originally announced November 2007.