-
Nonparametric inference for Poisson-Laguerre tessellations
Authors:
Thomas van der Jagt,
Geurt Jongbloed,
Martina Vittorietti
Abstract:
In this paper, we consider statistical inference for Poisson-Laguerre tessellations in $\mathbb{R}^d$. The object of interest is a distribution function $F$ which uniquely determines the intensity measure of the underlying Poisson process. Two nonparametric estimators for $F$ are introduced which depend only on the points of the Poisson process which generate non-empty cells and the actual cells corresponding to these points. The proposed estimators are proven to be strongly consistent, as the observation window expands unboundedly to the whole space. We also consider a stereological setting, where one is interested in estimating the distribution function associated with the Poisson process of a higher dimensional Poisson-Laguerre tessellation, given that a corresponding sectional Poisson-Laguerre tessellation is observed.
Submitted 15 January, 2025;
originally announced January 2025.
-
Existence and approximation of densities of chord length- and cross section area distributions
Authors:
Thomas van der Jagt,
Geurt Jongbloed,
Martina Vittorietti
Abstract:
In various stereological problems an $n$-dimensional convex body is intersected with an $(n-1)$-dimensional Isotropic Uniformly Random (IUR) hyperplane. In this paper the cumulative distribution function associated with the $(n-1)$-dimensional volume of such a random section is studied. This distribution is also known as chord length distribution and cross section area distribution in the planar and spatial case respectively. For various classes of convex bodies it is shown that these distribution functions are absolutely continuous with respect to Lebesgue measure. A Monte Carlo simulation scheme is proposed for approximating the corresponding probability density functions.
Submitted 20 June, 2024; v1 submitted 4 May, 2023;
originally announced May 2023.
-
Stereological determination of particle size distributions for similar convex bodies
Authors:
Thomas van der Jagt,
Geurt Jongbloed,
Martina Vittorietti
Abstract:
Consider an opaque medium which contains 3D particles. All particles are convex bodies of the same shape, but they vary in size. The particles are randomly positioned and oriented within the medium and cannot be observed directly. Taking a planar section of the medium we obtain a sample of observed 2D section profile areas of the intersected particles. In this paper the distribution of interest is the underlying 3D particle size distribution for which an identifiability result is obtained. Moreover, a nonparametric estimator is proposed for this size distribution. The estimator is proven to be consistent and its performance is assessed in a simulation study.
Submitted 4 May, 2023;
originally announced May 2023.
-
Testing for no effect in regression problems: a permutation approach
Authors:
Michał Ciszewski,
Jakob Söhl,
Ton Leenen,
Bart van Trigt,
Geurt Jongbloed
Abstract:
Often the question arises whether $Y$ can be predicted based on $X$ using a certain model. Especially for highly flexible models such as neural networks one may ask whether a seemingly good prediction is actually better than fitting pure noise or whether it has to be attributed to the flexibility of the model. This paper proposes a rigorous permutation test to assess whether the prediction is better than the prediction of pure noise. The test avoids any sample splitting and is based instead on generating new pairings of $(X_i,Y_j)$. It introduces a new formulation of the null hypothesis and rigorous justification for the test, which distinguishes it from previous literature. The theoretical findings are applied both to simulated data and to sensor data of tennis serves in an experimental context. The simulation study underscores how the available information affects the test. It shows that the less informative the predictors, the lower the probability of rejecting the null hypothesis of fitting pure noise and emphasizes that detecting weaker dependence between variables requires a sufficient sample size.
Submitted 26 April, 2024; v1 submitted 4 May, 2023;
originally announced May 2023.
-
Statistical Integration of Heterogeneous Data with PO2PLS
Authors:
Said el Bouhaddani,
Hae-Won Uh,
Geurt Jongbloed,
Jeanine Houwing-Duistermaat
Abstract:
The availability of multi-omics data has revolutionized the life sciences by creating avenues for integrated system-level approaches. Data integration links the information across datasets to better understand the underlying biological processes. However, high-dimensionality, correlations and heterogeneity pose statistical and computational challenges. We propose a general framework, probabilistic two-way partial least squares (PO2PLS), which addresses these challenges. PO2PLS models the relationship between two datasets using joint and data-specific latent variables. For maximum likelihood estimation of the parameters, we implement a fast EM algorithm and show that the estimator is asymptotically normally distributed. A global test for testing the relationship between two datasets is proposed, and its asymptotic distribution is derived. Notably, several existing omics integration methods are special cases of PO2PLS. Via extensive simulations, we show that PO2PLS performs better than alternatives in feature selection and prediction performance. In addition, the asymptotic distribution appears to hold when the sample size is sufficiently large. We illustrate PO2PLS with two examples from commonly used study designs: a large population cohort and a small case-control study. Besides recovering known relationships, PO2PLS also identified novel findings. The methods are implemented in our R-package PO2PLS. Supplementary materials for this article are available online.
Submitted 24 March, 2021;
originally announced March 2021.
-
Interpretable random forest models through forward variable selection
Authors:
Jasper Velthoen,
Juan-Juan Cai,
Geurt Jongbloed
Abstract:
Random forest is a popular prediction approach for handling high dimensional covariates. However, it often becomes infeasible to interpret the obtained high dimensional and non-parametric model. Aiming to obtain an interpretable predictive model, we develop a forward variable selection method using the continuous ranked probability score (CRPS) as the loss function. Our stepwise procedure leads to a smallest set of variables that optimizes the CRPS risk by performing at each step a hypothesis test on a significant decrease in CRPS risk. We provide mathematical motivation for our method by proving that in population sense the method attains the optimal set. Additionally, we show that the test is consistent provided that the random forest estimator of a quantile function is consistent.
In a simulation study, we compare the performance of our method with an existing variable selection method, for different sample sizes and different correlation strengths of the covariates. Our method is observed to have a much lower false positive rate. We also demonstrate an application of our method to statistical post-processing of daily maximum temperature forecasts in the Netherlands. Our method selects about 10% of the covariates while retaining the same predictive power.
Submitted 11 May, 2020;
originally announced May 2020.
-
Isotonic regression for metallic microstructure data: estimation and testing under order restrictions
Authors:
Martina Vittorietti,
Javier Hidalgo,
Jilt Sietsma,
Wei Li,
Geurt Jongbloed
Abstract:
Investigating the main determinants of the mechanical performance of metals is not a simple task. Known, physically inspired qualitative relations between 2D microstructure characteristics and 3D mechanical properties can act as the starting point of the investigation. Isotonic regression makes it possible to take ordering relations into account and leads to more efficient and accurate results when the underlying assumptions actually hold. The main goal of this paper is to test order relations in a model inspired by a materials science application. The statistical estimation procedure is described for three different scenarios according to the knowledge of the variances: known variance ratio, completely unknown variances, and variances under order restrictions. New likelihood ratio tests are developed in the last two cases. Both parametric and non-parametric bootstrap approaches are developed for finding the distribution of the test statistics under the null hypothesis. Finally, an application to the relation between Geometrically Necessary Dislocations and the number of observed microstructure precipitations is shown.
Submitted 3 February, 2020;
originally announced February 2020.
-
General framework for testing Poisson-Voronoi assumption for real microstructures
Authors:
Martina Vittorietti,
Piet J. J. Kok,
Jilt Sietsma,
Wei Li,
Geurt Jongbloed
Abstract:
Modeling microstructures is an interesting problem not just in Materials Science but also in Mathematics and Statistics. The most basic model for steel microstructure is the Poisson-Voronoi diagram. It has mathematically attractive properties and it has been used in the approximation of single phase steel microstructures. The aim of this paper is to develop methods that can be used to test whether a real steel microstructure can be approximated by such a model. Therefore, a general framework for testing the Poisson-Voronoi assumption based on images of 2D sections of real metals is set out. Following two different approaches, depending on whether or not periodic boundary conditions are used, three different model tests are proposed. The first two are based on the coefficient of variation and the cumulative distribution function of the cell areas. The third exploits tools from Topological Data Analysis, such as persistence landscapes.
Submitted 28 February, 2019;
originally announced February 2019.
-
Improving precipitation forecasts using extreme quantile regression
Authors:
Jasper Velthoen,
Juan-Juan Cai,
Geurt Jongbloed,
Maurice Schmeits
Abstract:
Aiming to estimate extreme precipitation forecast quantiles, we propose a nonparametric regression model that features a constant extreme value index. Using local linear quantile regression and an extrapolation technique from extreme value theory, we develop an estimator for conditional quantiles corresponding to extreme high probability levels. We establish uniform consistency and asymptotic normality of the estimators. In a simulation study, we examine the performance of our estimator on finite samples in comparison with a method assuming linear quantiles. On a precipitation data set in the Netherlands, these estimators have greater predictive skill compared to the upper member of ensemble forecasts provided by a numerical weather prediction model.
Submitted 5 March, 2019; v1 submitted 14 June, 2018;
originally announced June 2018.
-
Probabilistic partial least squares model: Identifiability, estimation and application
Authors:
Said el Bouhaddani,
Hae-Won Uh,
Caroline Hayward,
Geurt Jongbloed,
Jeanine Houwing-Duistermaat
Abstract:
With a rapid increase in volume and complexity of data sets, there is a need for methods that can extract useful information, for example the relationship between two data sets measured for the same persons. The Partial Least Squares (PLS) method can be used for this dimension reduction task. Within life sciences, results across studies are compared and combined. Therefore, parameters need to be identifiable, which is not the case for PLS. In addition, PLS is an algorithm, while epidemiological study designs are often outcome-dependent and methods to analyze such data require a probabilistic formulation. Moreover, a probabilistic model provides a statistical framework for inference. To address these issues, we develop Probabilistic PLS (PPLS). We derive maximum likelihood estimators that satisfy the identifiability conditions by using an EM algorithm with a constrained optimization in the M step. We show that the PPLS parameters are identifiable up to sign. A simulation study is conducted to study the performance of PPLS compared to existing methods. The PPLS estimates performed well in various scenarios, even in high dimensions. Most notably, the estimates seem to be robust against departures from normality. To illustrate our method, we applied it to IgG glycan data from two cohorts. Our PPLS model provided insight as well as interpretable results across the two cohorts.
Submitted 5 June, 2018; v1 submitted 12 June, 2017;
originally announced June 2017.
-
Accurate approximation of the distributions of the 3D Poisson-Voronoi typical cell geometrical features
Authors:
Martina Vittorietti,
Geurt Jongbloed,
Piet J. J. Kok,
Jilt Sietsma
Abstract:
Although Poisson-Voronoi diagrams have interesting mathematical properties, there is still much to discover about the geometrical properties of their grains. Through simulations, many authors were able to obtain numerical approximations of the moments of the distributions of more or less all geometrical characteristics of the grain. Furthermore, many proposals on how to get close parametric approximations to the real distributions were put forward by several authors. In this paper we show that, by exploiting the scaling property of the underlying Poisson process, we are able to derive the distribution of the main geometrical features of the grain for every value of the intensity parameter. Moreover, we use a sophisticated simulation program to construct a close Monte Carlo based approximation for the distributions of interest. Using this, we also determine the closest approximating distributions within the mentioned frequently used parametric classes of distributions and conclude that these approximations can be quite accurate.
Submitted 18 May, 2017;
originally announced May 2017.
-
Nonparametric inference in a stereological model with oriented cylinders applied to dual phase steel
Authors:
K. S. McGarrity,
J. Sietsma,
G. Jongbloed
Abstract:
Oriented circular cylinders in an opaque medium are used to represent certain microstructural objects in steel. The opaque medium is sliced parallel to the cylinder axes of symmetry and the cut-plane contains the observable rectangular profiles of the cylinders. A one-to-one relation between the joint density of the squared radius and height of the 3D cylinders and the joint density of the squared half-width and height of the observable 2D rectangles is established. We propose a nonparametric estimation procedure to estimate the distributions and expectations of various quantities of interest, such as the cylinder radius, height, aspect ratio, surface area and volume from the observed 2D rectangle widths and heights. Also, the covariance between the radius and height of a cylinder is estimated. The asymptotic behavior of these estimators is established to yield point-wise confidence intervals for the expectations and point-wise confidence sets for the distributions of the quantities of interest. Many of these quantities can be linked to the mechanical properties of the material, and are, therefore, useful for industry. We illustrate the mathematical model and estimation procedures using a banded microstructure for which nearly 90 $\mu$m of depth have been observed via serial sectioning.
Submitted 4 March, 2015;
originally announced March 2015.
-
A maximum smoothed likelihood estimator in the current status continuous mark model
Authors:
Piet Groeneboom,
Geurt Jongbloed,
Birgit Witte
Abstract:
We consider the problem of estimating the joint distribution function of the event time and a continuous mark variable based on censored data. More specifically, the event time is subject to current status censoring and the continuous mark is only observed in case inspection takes place after the event time. The nonparametric maximum likelihood estimator (MLE) in this model is known to be inconsistent. We propose and study an alternative likelihood based estimator, maximizing a smoothed log-likelihood, hence called a maximum smoothed likelihood estimator (MSLE). This estimator is shown to be well defined and consistent, and a simple algorithm is described that can be used to compute it. The MSLE is compared with other estimators in a small simulation study.
Submitted 6 September, 2011;
originally announced September 2011.