-
Exceedance and force of centrality for functional data
Authors:
Poorbita Kundu,
Hang Zhou,
Hans-Georg Müller
Abstract:
Exceedance refers to instances where a dynamic process surpasses given thresholds, e.g., the occurrence of a heat wave. We propose a novel exceedance framework for functional data, where each observed random trajectory is transformed into an exceedance function, which quantifies exceedance durations as a function of threshold levels. An inherent relationship between exceedance functions and probability distributions makes it possible to draw on distributional data analysis techniques such as Fréchet regression to study the dependence of exceedances on Euclidean predictors, e.g., calendar year when the exceedances are observed. We use local linear estimators to obtain exceedance functions from discretely observed functional data with noise and study the convergence of the proposed estimators. New concepts of interest include the force of centrality that quantifies the propensity of a system to revert to lower levels when a given threshold has been exceeded, conditional exceedance functions when conditioning on Euclidean covariates, and threshold exceedance functions, which characterize the size of exceedance sets in dependence on covariates for any fixed threshold. We establish consistent estimation with rates of convergence for these targets. The practical merits of the proposed methodology are illustrated through simulations and applications for annual temperature curves and medfly activity profiles.
Submitted 3 April, 2025;
originally announced April 2025.
-
Decomposition-Based Intrinsic Modeling of Shape-Constrained Functional Data
Authors:
Poorbita Kundu,
Hans-Georg Müller
Abstract:
Shape-constrained functional data encompass a wide array of application fields, such as activity profiling, growth curves, healthcare and mortality. Most existing methods for general functional data analysis ignore that such data are subject to inherent shape constraints, while some specialized techniques rely on strict distributional assumptions. We propose an approach for modeling such data that harnesses the intrinsic geometry of functional trajectories by decomposing them into size and shape components. We focus on the two most prevalent shape constraints, positivity and monotonicity, and develop individual-level estimators for the size and shape components. Furthermore, we demonstrate the applicability of our approach by conducting subsequent analyses involving the Fréchet mean and Fréchet regression and establish rates of convergence for the empirical estimators. Illustrative examples include simulations and data applications for activity profiles for Mediterranean fruit flies during their entire lifespan and for data from the Zürich longitudinal growth study.
Submitted 11 August, 2024; v1 submitted 18 June, 2024;
originally announced June 2024.
-
Heterogeneous Transfer Learning for Building High-Dimensional Generalized Linear Models with Disparate Datasets
Authors:
Ruzhang Zhao,
Prosenjit Kundu,
Arkajyoti Saha,
Nilanjan Chatterjee
Abstract:
The development of comprehensive prediction models is often of great interest in many disciplines of science, but datasets with information on all desired features often have small sample sizes. We describe a transfer learning approach for building high-dimensional generalized linear models using data from a main study with detailed information on all predictors and an external, potentially much larger, study that has ascertained a more limited set of predictors. We propose using the external dataset to build a reduced model and then "transfer" the information on underlying parameters for the analysis of the main study through a set of calibration equations which can account for the study-specific effects of design variables. We then propose a penalized generalized method of moments framework for inference and a one-step estimation method that could be implemented using the standard glmnet package. We develop asymptotic theory and conduct extensive simulation studies to investigate both predictive performance and post-selection inference properties of the proposed method. Finally, we illustrate an application of the proposed method for the development of risk models for five common diseases using the UK Biobank study, combining information on low-dimensional risk factors and high-throughput proteomic biomarkers.
Submitted 17 August, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
Analysis of Two-Phase Studies using Generalized Method of Moments
Authors:
Prosenjit Kundu,
Nilanjan Chatterjee
Abstract:
A two-phase design can reduce the cost of epidemiological studies by limiting the ascertainment of expensive covariates and/or exposures to an efficiently selected subset (phase-II) of a larger (phase-I) study. Efficient analysis of the resulting dataset combining disparate information from phase-I and phase-II, however, can be complex. Most existing methods, including the semiparametric maximum-likelihood estimator, require the information in phase-I to be summarized into a fixed number of strata. In this paper, we describe a novel method for analysis of two-phase studies where information from phase-I is summarized by parameters associated with a reduced logistic regression model of the disease outcome on available covariates. We then set up estimating equations for parameters associated with the desired extended logistic regression model, based on information on the reduced model parameters from phase-I and complete data available at phase-II after accounting for the non-random sampling design at phase-II. We use the generalized method of moments to solve the over-identified estimating equations and develop the resulting asymptotic theory for the proposed estimator. Simulation studies show that the use of reduced parametric models, as opposed to summarizing data into strata, can lead to more efficient utilization of phase-I data. An application of the proposed method is illustrated using the US National Wilms Tumor study data.
Submitted 31 October, 2019; v1 submitted 26 October, 2019;
originally announced October 2019.
-
Generalized Meta-Analysis for Multiple Regression Models Across Studies with Disparate Covariate Information
Authors:
Prosenjit Kundu,
Runlong Tang,
Nilanjan Chatterjee
Abstract:
Meta-analysis, because of both logistical convenience and statistical efficiency, is widely popular for synthesizing information on common parameters of interest across multiple studies. We propose a generalized meta-analysis approach for combining information on multivariate regression parameters across multiple studies which have varying levels of covariate information. Using algebraic relationships between regression parameters in different dimensions, we specify a set of moment equations for estimating parameters of a maximal model through information available from sets of parameter estimates from a series of reduced models available from the different studies. The specification of the equations requires a reference dataset to estimate the joint distribution of the covariates. We propose to solve these equations using the generalized method of moments approach, with the optimal weighting of the equations taking into account uncertainty associated with estimates of the parameters of the reduced models. We describe extensions of the iteratively reweighted least squares algorithm for fitting generalized linear regression models using the proposed framework. Based on the same moment equations, we also propose a diagnostic test for detecting violation of underlying model assumptions, such as those arising due to heterogeneity in the underlying study populations. Methods are illustrated using extensive simulation studies and a real data example involving the development of a breast cancer risk prediction model using disparate risk factor information from multiple studies.
Submitted 25 November, 2018; v1 submitted 12 August, 2017;
originally announced August 2017.
-
Some Reliability Properties of Transformed-Transformer Family of Distributions
Authors:
Nil Kamal Hazra,
Pradip Kundu,
Asok K. Nanda
Abstract:
The Transformed-Transformer family of distributions is the family obtained by transforming a random variable $T$ through another transformer random variable $X$ using a weight function $ω$ of the cumulative distribution function of $X$. In this paper, we study different stochastic ageing properties, as well as different stochastic orderings, of this family of distributions. We discuss the results for several well-known distributions.
Submitted 17 February, 2016;
originally announced February 2016.
-
A New Class of Probability Distributions for Describing the Spatial Statistics of Area-averaged Rainfall
Authors:
Prasun K. Kundu,
Ravi K. Siddani
Abstract:
Rainfall exhibits extreme variability at many space and time scales and calls for a statistical description. Based on an analysis of radar measurements of precipitation over the tropical oceans, we introduce a new probability law for the area-averaged rain rate constructed from the class of log-infinitely divisible distributions that accurately describes the frequency of the most intense rain events. The dependence of its parameters on the spatial averaging length L allows one to relate spatial statistics at different scales. In particular, it enables us to explain the observed power law scaling of the moments of the data and successfully predicts the continuous spectrum of scaling exponents expressing multiscaling characteristics of the rain intensity field.
Submitted 19 September, 2015;
originally announced September 2015.
-
Reliability study of a coherent system with single general standby component
Authors:
Pradip Kundu,
Nil Kamal Hazra,
Asok K. Nanda
Abstract:
The properties of a coherent system with a single general standby component are investigated. Three different switch-over mechanisms, viz. perfect switching, imperfect switching and a random warm-up period of the standby component, are considered, with some numerical examples.
Submitted 12 November, 2020; v1 submitted 22 July, 2015;
originally announced July 2015.