-
Detecting Car Speed using Object Detection and Depth Estimation: A Deep Learning Framework
Authors:
Subhasis Dasgupta,
Arshi Naaz,
Jayeeta Choudhury,
Nancy Lahiri
Abstract:
Road accidents are quite common in almost every part of the world, and a majority of fatal accidents are attributed to vehicle overspeeding. Overspeeding is usually controlled using checkpoints at various points along the road, but not all traffic police have access to existing speed-estimating devices such as LIDAR-based or radar-based guns. The current project addresses vehicle speed estimation with handheld devices, such as mobile phones or network-connected wearable cameras, using deep learning frameworks.
Submitted 8 August, 2024;
originally announced August 2024.
-
On optimal block resampling for Gaussian-subordinated long-range dependent processes
Authors:
Qihao Zhang,
Soumendra N. Lahiri,
Daniel J. Nordman
Abstract:
Block-based resampling estimators have been intensively investigated for weakly dependent time processes, which has helped to inform implementation (e.g., best block sizes). However, little is known about resampling performance and block sizes under strong or long-range dependence. To establish guideposts in block selection, we consider a broad class of strongly dependent time processes, formed by a transformation of a stationary long-memory Gaussian series, and examine block-based resampling estimators for the variance of the prototypical sample mean; extensions to general statistical functionals are also considered. Unlike under weak dependence, the properties of resampling estimators under strong dependence are shown to depend intricately on the nature of non-linearity in the time series (beyond Hermite ranks), in addition to the long-memory coefficient and block size. Additionally, the intuition has often been that optimal block sizes should be larger under strong dependence (say $O(n^{1/2})$ for a sample size $n$) than the optimal order $O(n^{1/3})$ known under weak dependence. This intuition turns out to be largely incorrect, though a block order $O(n^{1/2})$ may be reasonable (and even optimal) in many cases, owing to non-linearity in a long-memory time series. While optimal block sizes are more complex under long-range dependence than under short-range dependence, we provide a consistent data-driven rule for block selection, and numerical studies illustrate that the guides for block selection perform well in other block-based problems with long-memory time series, such as distribution estimation and strategies for testing Hermite rank.
Submitted 2 August, 2022;
originally announced August 2022.
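For the variance of the sample mean, the overlapping block estimator can be written in closed form. The sketch below is a minimal illustration only, under several assumptions. It uses the moving-block (overlapping) variant. The function name `mbb_variance` is hypothetical. The demo data are a short-memory Gaussian AR(1) series used as a placeholder, not the Gaussian-subordinated long-memory processes studied in the paper. The paper's data-driven block selection rule is not reproduced; the demo simply compares block lengths of order $n^{1/3}$ and $n^{1/2}$.

```python
import random


def mbb_variance(x, block_len):
    """Moving-block (overlapping) estimate of n * Var(sample mean).

    Uses all N = n - l + 1 overlapping blocks of length l. When l divides n,
    l times the spread of the block means equals n * Var* of the moving block
    bootstrap sample mean, so no resampling loop is needed.
    """
    n = len(x)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(x)")
    # Prefix sums give every overlapping block mean in O(n) time.
    prefix = [0.0]
    for value in x:
        prefix.append(prefix[-1] + value)
    block_means = [
        (prefix[i + block_len] - prefix[i]) / block_len
        for i in range(n - block_len + 1)
    ]
    center = sum(block_means) / len(block_means)
    spread = sum((m - center) ** 2 for m in block_means) / len(block_means)
    return block_len * spread


if __name__ == "__main__":
    rng = random.Random(0)
    # Placeholder data: a short-memory AR(1) series, NOT a long-memory process.
    x, prev = [], 0.0
    for _ in range(10_000):
        prev = 0.5 * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    n = len(x)
    for label, l in [("n^(1/3)", round(n ** (1 / 3))), ("n^(1/2)", round(n ** 0.5))]:
        print(label, l, mbb_variance(x, l))
```

Setting `block_len=1` reduces the estimator to the ordinary sample variance, which ignores dependence entirely. Longer blocks retain more of the serial correlation.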
-
Two-sample Testing on Latent Distance Graphs With Unknown Link Functions
Authors:
Yiran Wang,
Minh Tang,
Soumendra Nath Lahiri
Abstract:
We propose a valid and consistent test for the hypothesis that two latent distance random graphs on the same vertex set have the same generating latent positions, up to some unidentifiable similarity transformations. Our test statistic is based on first estimating the edge probability matrices by truncating the singular value decompositions of the averaged adjacency matrices in each population and then computing a Spearman rank correlation coefficient between these estimates. Experimental results on simulated data indicate that the test procedure has power even when there is only one sample from each population, provided that the number of vertices is not too small. Application to a dataset of neural connectome graphs showed that we can distinguish between scans from different age groups, while application to a dataset of epileptogenic recordings showed that we can discriminate between seizure and non-seizure events.
Submitted 3 August, 2020;
originally announced August 2020.
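The statistic described above can be sketched as follows. This is a minimal version under stated assumptions. The latent positions are taken to be one-dimensional, so each averaged adjacency matrix is truncated to rank 1. That truncation is computed here by power iteration on $A^\top A$ rather than a library SVD. Only the upper-triangular entries ($i < j$) are compared. Ties in the Spearman coefficient are handled with average ranks. All function names are hypothetical, and the calibration that turns the correlation into an accept/reject decision is not reproduced.

```python
def rank_one_approx(a, iters=200):
    """Rank-1 truncated SVD of a square matrix via power iteration on A^T A.

    Returns (A v) v^T, which equals sigma_1 u_1 v_1^T once v has converged to
    the leading right singular vector v_1.
    """
    n = len(a)
    v = [1.0] * n
    for _ in range(iters):
        av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        atav = [sum(a[i][j] * av[i] for i in range(n)) for j in range(n)]
        norm = sum(t * t for t in atav) ** 0.5
        if norm == 0.0:
            return [[0.0] * n for _ in range(n)]
        v = [t / norm for t in atav]
    av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
    return [[av[i] * v[j] for j in range(n)] for i in range(n)]


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def rank_corr_statistic(avg_adj_1, avg_adj_2):
    """Spearman correlation between rank-1 estimates of two edge probability matrices."""
    p1, p2 = rank_one_approx(avg_adj_1), rank_one_approx(avg_adj_2)
    n = len(avg_adj_1)
    upper = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return spearman([p1[i][j] for i, j in upper], [p2[i][j] for i, j in upper])
```

Rank correlation depends only on the ordering of the estimated edge probabilities. This is consistent with the link function being unknown, since any monotone link preserves that ordering.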
-
Higher Order Refinements by Bootstrap in Lasso and other Penalized Regression Methods
Authors:
Debraj Das,
Arindam Chatterjee,
S. N. Lahiri
Abstract:
Selecting important covariates and dropping unimportant ones from a high-dimensional regression model is a long-standing problem and hence has received much attention in the last two decades. After selecting the correct model, it is also important to properly estimate the parameters corresponding to the important covariates. In this spirit, Fan and Li (2001) proposed the oracle property as a desired feature of a variable selection method. The oracle property has two parts: variable selection consistency (VSC) and asymptotic normality. Keeping VSC fixed and strengthening the other part, Fan and Lv (2008) introduced the strong oracle property. In this paper, we consider different penalized regression techniques that are VSC and classify them based on the oracle and strong oracle properties. We show that both the residual and the perturbation bootstrap methods are second-order correct for any penalized estimator, irrespective of its class. Most interesting of all is the Lasso, introduced by Tibshirani (1996). Although the Lasso is VSC, it is not asymptotically normal and hence fails to satisfy the oracle property.
Submitted 14 September, 2019;
originally announced September 2019.
-
On Statistical Properties of A Veracity Scoring Method for Spatial Data
Authors:
Arnab Chakraborty,
Soumendra N. Lahiri
Abstract:
Measuring the veracity or reliability of noisy data is of utmost importance, especially in scenarios where the information is gathered through automated systems. In a recent paper, Chakraborty et al. (2019) introduced a veracity scoring technique for geostatistical data. The authors used high-quality 'reference' data to measure the veracity of the varying-quality observations and incorporated the veracity scores in their analysis of noisy, mobile-sensor-generated weather data to produce efficient predictions of the ambient temperature process. In this paper, we consider the scenario where no reference data are available, and hence the veracity scores (referred to as VS) are defined based on 'local' summaries of the observations. We develop a VS-based estimation method for the parameters of a spatial regression model. Under a non-stationary noise structure and fairly general assumptions on the underlying spatial process, we show that the VS-based estimators of the regression parameters are consistent. Moreover, we establish the advantage of the VS-based estimators over the ordinary least squares (OLS) estimator by analyzing their asymptotic mean squared errors. We illustrate the merits of the VS-based technique through simulations and apply the methodology to a real data set on mass percentages of ash in coal seams in Pennsylvania.
Submitted 20 June, 2019;
originally announced June 2019.
-
A Statistical Analysis of Noisy Crowdsourced Weather Data
Authors:
Arnab Chakraborty,
Soumendra Nath Lahiri,
Alyson Wilson
Abstract:
Spatial prediction of weather elements like temperature, precipitation, and barometric pressure is generally based on satellite imagery or data collected at ground stations. Neither of these data sources provides information at a more granular or "hyper-local" resolution. On the other hand, crowdsourced weather data, which are captured by sensors installed on mobile devices and gathered by weather-related mobile apps like WeatherSignal and AccuWeather, can serve as potential data sources for analyzing environmental processes at a hyper-local resolution. However, due to the low quality of the sensors and the non-laboratory environment, the quality of the observations in crowdsourced data is compromised. This paper describes methods to improve hyper-local spatial prediction using this varying-quality, noisy crowdsourced information. We introduce a reliability metric, namely the Veracity Score (VS), to assess the quality of the crowdsourced observations using coarser, but high-quality, reference data. A VS-based methodology to analyze noisy spatial data is proposed and evaluated through extensive simulations. The merits of the proposed approach are illustrated through case studies analyzing crowdsourced daily average ambient temperature readings for one day in the contiguous United States.
Submitted 24 June, 2019; v1 submitted 16 February, 2019;
originally announced February 2019.
-
Simulating Markov random fields with a conclique-based Gibbs sampler
Authors:
Andee Kaplan,
Mark S. Kaiser,
Soumendra N. Lahiri,
Daniel J. Nordman
Abstract:
For spatial and network data, we consider models formed from a Markov random field (MRF) structure and the specification of a conditional distribution for each observation. Fast simulation from such MRF models is often an important consideration, particularly when repeated generation of large numbers of data sets is required. However, a standard Gibbs strategy for simulating from MRF models involves single-site updates, performed with the conditional univariate distribution of each observation in a sequential manner, whereby a complete Gibbs iteration may become computationally involved even for moderate samples. As an alternative, we describe a general way to simulate from MRF models using Gibbs sampling with "concliques" (i.e., groups of non-neighboring observations). Compared to standard Gibbs sampling, this simulation scheme can be much faster by reducing Gibbs steps and independently updating all observations per conclique at once. The speed improvement depends on the number of concliques relative to the sample size for simulation, and order-of-magnitude speed increases are possible with many MRF models (e.g., having appropriately bounded neighborhoods). We detail the simulation method, establish its validity, and assess its computational performance through numerical studies, where speed advantages are shown for several spatial and network examples.
Submitted 10 September, 2019; v1 submitted 14 August, 2018;
originally announced August 2018.
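To illustrate the conclique idea, the sketch below applies it to one simple case: a four-nearest-neighbour model on a grid. For that neighbourhood, the two checkerboard classes are concliques, so one Gibbs iteration is two block updates instead of one update per site. The autologistic conditional $P(Y_s = 1 \mid \text{rest}) = \mathrm{logistic}(\alpha + \eta \sum_{t \in N(s)} y_t)$, the free (non-wrapping) boundary, and the function names are assumptions made for illustration. This is not necessarily the parameterization used in the paper.

```python
import math
import random


def _sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def conclique_gibbs_autologistic(rows, cols, alpha, eta, n_iter, seed=0):
    """Conclique-based Gibbs sampler for an assumed four-nearest-neighbour
    autologistic model on a rows x cols grid with a free boundary:
        P(Y_s = 1 | rest) = sigmoid(alpha + eta * sum of neighbouring y_t).

    The checkerboard classes {(i, j): (i + j) % 2 == c}, c = 0, 1, are
    concliques: no two sites in a class are neighbours, so given the other
    class every site in a class is conditionally independent and the whole
    class can be updated in one step.
    """
    rng = random.Random(seed)
    grid = [[rng.randint(0, 1) for _ in range(cols)] for _ in range(rows)]
    concliques = [
        [(i, j) for i in range(rows) for j in range(cols) if (i + j) % 2 == c]
        for c in (0, 1)
    ]
    for _ in range(n_iter):  # one Gibbs iteration = one update per conclique
        for sites in concliques:
            # Every conditional probability reads only the other conclique,
            # so all of them are computed first and the draws applied together.
            probs = []
            for i, j in sites:
                s = sum(grid[a][b]
                        for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                        if 0 <= a < rows and 0 <= b < cols)
                probs.append(_sigmoid(alpha + eta * s))
            for (i, j), p in zip(sites, probs):
                grid[i][j] = 1 if rng.random() < p else 0
    return grid


if __name__ == "__main__":
    sample = conclique_gibbs_autologistic(30, 30, alpha=-1.0, eta=0.5, n_iter=100)
    print("proportion of ones:", sum(map(sum, sample)) / 900)
```

In pure Python the per-conclique loop is still sequential. The gain claimed in the abstract comes from the fact that each conclique update could be vectorized or parallelized, because no site in a conclique depends on another site in the same conclique.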
-
Distributional Consistency of Lasso by Perturbation Bootstrap
Authors:
Debraj Das,
S. N. Lahiri
Abstract:
The Least Absolute Shrinkage and Selection Operator, or Lasso, introduced by Tibshirani (1996), is a popular estimation procedure in multiple linear regression when the underlying design has a sparse structure, because it sets some regression coefficients exactly equal to 0. In this article, we develop a perturbation bootstrap method and establish its validity in approximating the distribution of the Lasso in heteroscedastic linear regression. We allow the underlying covariates to be either random or non-random. We show that the proposed bootstrap method works irrespective of the nature of the covariates, unlike the resample-based bootstrap of Freedman (1981), which must be tailored to the nature (random vs. non-random) of the covariates. A simulation study also supports the method's performance in finite samples.
Submitted 29 October, 2017;
originally announced October 2017.
-
Perturbation Bootstrap in Adaptive Lasso
Authors:
Debraj Das,
Karl Gregory,
S. N. Lahiri
Abstract:
The Adaptive Lasso (Alasso) was proposed by Zou [J. Amer. Statist. Assoc. 101 (2006) 1418-1429] as a modification of the Lasso for simultaneous variable selection and estimation of the parameters in a linear regression model. Zou (2006) established that the Alasso estimator is variable-selection consistent as well as asymptotically normal in the indices corresponding to the nonzero regression coefficients in certain fixed-dimensional settings. In an influential paper, Minnier, Tian and Cai [J. Amer. Statist. Assoc. 106 (2011) 1371-1382] proposed a perturbation bootstrap method and established its distributional consistency for the Alasso estimator in the fixed-dimensional setting. In this paper, however, we show that this (naive) perturbation bootstrap fails to achieve second-order correctness in approximating the distribution of the Alasso estimator. We propose a modification to the perturbation bootstrap objective function and show that a suitably studentized version of our modified perturbation bootstrap Alasso estimator achieves second-order correctness even when the dimension of the model is allowed to grow to infinity with the sample size. As a consequence, inferences based on the modified perturbation bootstrap will be more accurate than inferences based on the oracle normal approximation. We give simulation studies demonstrating good finite-sample properties of our modified perturbation bootstrap method, as well as an illustration of our method on a real data set.
Submitted 14 February, 2018; v1 submitted 9 March, 2017;
originally announced March 2017.
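The effect of the adaptive weights is easiest to see under one simplifying assumption: an orthonormal design ($X^\top X = I$). Then $z = X^\top y$ is the OLS estimate, and $\tfrac12\|y - Xb\|^2 + \lambda \sum_j w_j |b_j|$ separates into one soft-thresholding step per coordinate. In this sketch the weights are $w_j = 1/|z_j|^{\gamma}$ built from OLS as the initial estimator. The function names are hypothetical, and neither the paper's modified perturbation bootstrap nor its studentization is shown.

```python
def soft_threshold(z, t):
    """sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def adaptive_lasso_orthonormal(z, lam, gamma=1.0):
    """Adaptive Lasso under an orthonormal design (X^T X = I).

    z = X^T y is then the OLS estimate, and minimising
        0.5 * ||y - X b||^2 + lam * sum_j w_j * |b_j|,  w_j = 1 / |z_j|**gamma,
    separates into one soft-thresholding step per coordinate with threshold
    lam * w_j (OLS is used as the initial estimator for the weights).
    """
    estimate = []
    for zj in z:
        if zj == 0.0:
            estimate.append(0.0)  # infinite weight forces the coefficient to zero
        else:
            estimate.append(soft_threshold(zj, lam / abs(zj) ** gamma))
    return estimate


if __name__ == "__main__":
    ols = [3.0, 0.5, -2.0]
    print("Lasso:         ", [soft_threshold(zj, 1.0) for zj in ols])
    print("Adaptive Lasso:", adaptive_lasso_orthonormal(ols, lam=1.0))
```

With the same $\lambda$, the plain Lasso shrinks every surviving coefficient by the same amount. The adaptive weights shrink large coefficients less and small ones more. This is the mechanism behind the variable-selection consistency and asymptotic normality mentioned above.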
-
Two Stage Non-penalized Corrected Least Squares for High Dimensional Linear Models with Measurement error or Missing Covariates
Authors:
Abhishek Kaul,
Hira L. Koul,
Akshita Chawla,
Soumendra N. Lahiri
Abstract:
This paper provides an alternative to penalized estimators for estimation and variable selection in high-dimensional linear regression models with measurement error or missing covariates. We propose estimation via bias-corrected least squares after model selection. We show that by separating model selection and estimation, it is possible to achieve an improved rate of convergence of the $L_2$ estimation error compared to the rate $\sqrt{s \log p / n}$ achieved by simultaneous estimation and variable selection methods such as $L_1$-penalized corrected least squares. If the correct model is selected with high probability, then the $L_2$ rate of convergence for the proposed method is indeed the oracle rate $\sqrt{s/n}$. Here $s$ and $p$ are the number of nonzero parameters and the model dimension, respectively, and $n$ is the sample size. Under very general model selection criteria, the proposed method is computationally simpler and statistically at least as efficient as the $L_1$-penalized corrected least squares method, performs model selection without the availability of the bias correction matrix, and is able to provide estimates with only a small $s \times s$ sub-block of the bias correction covariance matrix, in comparison to the $p \times p$ correction matrix required for computation of the $L_1$-penalized version. Furthermore, we show that the model selection requirements are met by a correlation screening type method and by the $L_1$-penalized corrected least squares method. Also, the proposed methodology, when applied to the estimation of precision matrices with missing observations, is seen to perform at least as well as existing $L_1$-penalty-based methods. All results are supported empirically by a simulation study.
Submitted 10 May, 2016;
originally announced May 2016.
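The two stages can be sketched as follows, under several illustrative assumptions. Covariates are observed as $W = X + U$ with additive error of known covariance $\Sigma_u$. Stage 1 is a simple screening rule that keeps the $s$ columns with the largest $|W_j^\top y|$; it does not use $\Sigma_u$. Stage 2 solves the corrected normal equations $(W_S^\top W_S / n - \Sigma_{u,SS})\,\beta_S = W_S^\top y / n$. That stage reads only the $s \times s$ block of $\Sigma_u$. The helper names are hypothetical, and this is not claimed to be the exact screening rule analyzed in the paper.

```python
import random


def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def two_stage_corrected_ls(w, y, sigma_u, s):
    """Stage 1: keep the s columns of W with the largest |W_j^T y| (assumed rule).
    Stage 2: bias-corrected least squares on that support; only the s x s
    block sigma_u[S][S] of the error covariance is read."""
    n, p = len(w), len(w[0])
    score = [abs(sum(w[i][j] * y[i] for i in range(n))) for j in range(p)]
    support = sorted(sorted(range(p), key=lambda j: -score[j])[:s])
    # E[W^T W / n] = Sigma_x + Sigma_u, so subtracting Sigma_u removes the bias.
    gram = [[sum(w[i][j] * w[i][k] for i in range(n)) / n - sigma_u[j][k]
             for k in support] for j in support]
    rhs = [sum(w[i][j] * y[i] for i in range(n)) / n for j in support]
    beta = [0.0] * p
    for j, c in zip(support, solve(gram, rhs)):
        beta[j] = c
    return beta


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 2000, 8
    beta = [2.0, 0.0, 0.0, -1.0, 0.0, 0.0, 0.0, 0.0]
    x = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    # Observed covariates carry additive error with known variance 0.25.
    w = [[v + rng.gauss(0.0, 0.5) for v in row] for row in x]
    y = [sum(b * v for b, v in zip(beta, row)) + rng.gauss(0.0, 0.5) for row in x]
    sigma_u = [[0.25 if j == k else 0.0 for k in range(p)] for j in range(p)]
    print(two_stage_corrected_ls(w, y, sigma_u, s=2))
```

Without the subtraction of $\Sigma_{u,SS}$, ordinary least squares on $W$ would be attenuated toward zero. In this example that bias would be a factor of about $1/1.25$.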
-
Second Order Correctness of Perturbation Bootstrap M-Estimator of Multiple Linear Regression Parameter
Authors:
Debraj Das,
Soumendra Nath Lahiri
Abstract:
Consider the multiple linear regression model $y_i = \boldsymbol{x}_i' \boldsymbol{\beta} + \varepsilon_i$, where the $\varepsilon_i$'s are independent and identically distributed random variables, the $\boldsymbol{x}_i$'s are known design vectors and $\boldsymbol{\beta}$ is the $p \times 1$ vector of parameters. An effective way of approximating the distribution of the M-estimator $\bar{\boldsymbol{\beta}}_n$, after proper centering and scaling, is the perturbation bootstrap method. In this work, second-order results for this non-naive bootstrap method are investigated. Second-order correctness is important for reducing the approximation error uniformly to $o(n^{-1/2})$ and thereby obtaining better inferences. We show that the classical studentized version of the bootstrapped estimator fails to be second-order correct. We introduce an innovative modification of the studentized version of the bootstrapped statistic and show that the modified bootstrapped pivot is second-order correct (S.O.C.) for approximating the distribution of the studentized M-estimator. Additionally, we show that the perturbation bootstrap continues to be S.O.C. when the errors $\varepsilon_i$ are independent but not necessarily identically distributed. These findings establish the perturbation bootstrap approximation as a significant improvement over asymptotic normality in regression M-estimation.
Submitted 17 December, 2017; v1 submitted 4 May, 2016;
originally announced May 2016.
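The basic perturbation bootstrap can be sketched for the simplest M-estimator: least squares ($\rho(t) = t^2$) in a no-intercept model with one covariate. Each replicate re-minimizes $\sum_i G_i (y_i - x_i b)^2$ with i.i.d. weights $G_i \sim \mathrm{Exp}(1)$ (mean 1, variance 1), and the data are never resampled. The choice of exponential weights, the one-covariate model and the function names are assumptions for illustration. This shows the naive bootstrap only; the paper's modified studentized pivot, which is what attains second-order correctness, is not reproduced.

```python
import random
import statistics


def ls_slope(x, y, weights=None):
    """Weighted least-squares slope for the no-intercept model y = x * beta + e."""
    if weights is None:
        weights = [1.0] * len(x)
    num = sum(g * xi * yi for g, xi, yi in zip(weights, x, y))
    den = sum(g * xi * xi for g, xi in zip(weights, x))
    return num / den


def perturbation_bootstrap(x, y, n_boot=2000, seed=0):
    """Return beta_hat and n_boot replicates of beta* - beta_hat.

    Each replicate re-minimises sum_i G_i * (y_i - x_i * b)^2 with iid
    G_i ~ Exp(1) (mean 1, variance 1); the data themselves are not resampled.
    """
    rng = random.Random(seed)
    beta_hat = ls_slope(x, y)
    reps = []
    for _ in range(n_boot):
        g = [rng.expovariate(1.0) for _ in x]
        reps.append(ls_slope(x, y, g) - beta_hat)
    return beta_hat, reps


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.uniform(1.0, 3.0) for _ in range(200)]
    # Hypothetical heteroscedastic errors: independent, not identically distributed.
    y = [1.5 * xi + rng.gauss(0.0, 0.5 * xi) for xi in x]
    beta_hat, reps = perturbation_bootstrap(x, y)
    cuts = statistics.quantiles(reps, n=20)  # 5%, 10%, ..., 95% cut points
    lower, upper = beta_hat - cuts[-1], beta_hat - cuts[0]
    print(f"beta_hat={beta_hat:.3f}, 90% basic interval=({lower:.3f}, {upper:.3f})")
```

Because the weights multiply each observation's loss rather than resampling rows, the same code applies unchanged whether the covariates are treated as fixed or random.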
-
Central limit theorems for long range dependent spatial linear processes
Authors:
S. N. Lahiri,
Peter M. Robinson
Abstract:
Central limit theorems are established for the sum, over a spatial region, of observations from a linear process on a $d$-dimensional lattice. This region need not be rectangular, but can be irregularly-shaped. Separate results are established for the cases of positive strong dependence, short range dependence, and negative dependence. We provide approximations to asymptotic variances that reveal differential rates of convergence under the three types of dependence. Further, in contrast to the one dimensional (i.e., the time series) case, it is shown that the form of the asymptotic variance in dimensions $d>1$ critically depends on the geometry of the sampling region under positive strong dependence and under negative dependence and that there can be non-trivial edge-effects under negative dependence for $d>1$. Precise conditions for the presence of edge effects are also given.
Submitted 6 January, 2016;
originally announced January 2016.
-
A frequency domain empirical likelihood method for irregularly spaced spatial data
Authors:
Soutir Bandyopadhyay,
Soumendra N. Lahiri,
Daniel J. Nordman
Abstract:
This paper develops empirical likelihood methodology for irregularly spaced spatial data in the frequency domain. Unlike the frequency domain empirical likelihood (FDEL) methodology for time series (on a regular grid), the formulation of the spatial FDEL needs special care due to the lack of the usual orthogonality properties of the discrete Fourier transform for irregularly spaced data and due to the presence of nontrivial bias in the periodogram under different spatial asymptotic structures. A spatial FDEL is formulated in the paper taking into account the effects of these factors. The main results of the paper show that Wilks' phenomenon holds for a scaled version of the logarithm of the proposed empirical likelihood ratio statistic, in the sense that it is asymptotically distribution-free and has a chi-squared limit. As a result, the proposed spatial FDEL method can be used to build nonparametric, asymptotically correct confidence regions and tests for covariance parameters that are defined through spectral estimating equations, for irregularly spaced spatial data. In comparison to the more common studentization approach, a major advantage of our method is that it does not require explicit estimation of the standard error of an estimator, which is itself a very difficult problem, as the asymptotic variances of many common estimators depend on intricate interactions among several population quantities, including the spectral density of the spatial process, the spatial sampling density and the spatial asymptotic structure. Results from a numerical study are also reported to illustrate the methodology and its finite sample properties.
Submitted 17 March, 2015;
originally announced March 2015.
-
Convergence rates of empirical block length selectors for block bootstrap
Authors:
Daniel J. Nordman,
Soumendra N. Lahiri
Abstract:
We investigate the accuracy of two general non-parametric methods for estimating optimal block lengths for block bootstraps with time series: the first, proposed in the seminal paper of Hall, Horowitz and Jing (Biometrika 82 (1995) 561-574), and the second, from Lahiri et al. (Stat. Methodol. 4 (2007) 292-321). The relative performances of these general methods have been unknown and, to provide a comparison, we focus on rates of convergence for these block length selectors for the moving block bootstrap (MBB) with variance estimation problems under the smooth function model. It is shown that, with suitable choice of tuning parameters, the optimal convergence rate of the first method is $O_p(n^{-1/6})$ where $n$ denotes the sample size. The optimal convergence rate of the second method, with the same number of tuning parameters, is shown to be $O_p(n^{-2/7})$, suggesting that the second method may generally have better large-sample properties for block selection in block bootstrap applications beyond variance estimation. We also compare the two general methods with other plug-in methods specifically designed for block selection in variance estimation, where the best possible convergence rate is shown to be $O_p(n^{-1/3})$ and achieved by a method from Politis and White (Econometric Rev. 23 (2004) 53-70).
Submitted 13 March, 2014;
originally announced March 2014.
-
A nonstandard empirical likelihood for time series
Authors:
Daniel J. Nordman,
Helle Bunzel,
Soumendra N. Lahiri
Abstract:
Standard blockwise empirical likelihood (BEL) for stationary, weakly dependent time series requires specifying a fixed block length as a tuning parameter for setting confidence regions. This aspect can be difficult and impacts coverage accuracy. As an alternative, this paper proposes a new version of BEL based on a simple, though nonstandard, data-blocking rule which uses a data block of every possible length. Consequently, the method does not involve the usual block selection issues and is also anticipated to exhibit better coverage performance. Its nonstandard blocking scheme, however, induces nonstandard asymptotics and requires a significantly different development compared to standard BEL. We establish the large-sample distribution of log-ratio statistics from the new BEL method for calibrating confidence regions for mean or smooth function parameters of time series. This limit law is not the usual chi-square one, but is distribution-free and can be reproduced through straightforward simulations. Numerical studies indicate that the proposed method generally exhibits better coverage accuracy than standard BEL.
Submitted 6 January, 2014;
originally announced January 2014.
-
Rates of convergence of the Adaptive LASSO estimators to the Oracle distribution and higher order refinements by the bootstrap
Authors:
A. Chatterjee,
S. N. Lahiri
Abstract:
Zou [J. Amer. Statist. Assoc. 101 (2006) 1418-1429] proposed the Adaptive LASSO (ALASSO) method for simultaneous variable selection and estimation of the regression parameters, and established its oracle property. In this paper, we investigate the rate of convergence of the ALASSO estimator to the oracle distribution when the dimension of the regression parameters may grow to infinity with the sample size. It is shown that the rate critically depends on the choices of the penalty parameter and the initial estimator, among other factors, and that confidence intervals (CIs) based on the oracle limit law often have poor coverage accuracy. As an alternative, we consider the residual bootstrap method for the ALASSO estimators that has been recently shown to be consistent; cf. Chatterjee and Lahiri [J. Amer. Statist. Assoc. 106 (2011a) 608-625]. We show that the bootstrap applied to a suitable studentized version of the ALASSO estimator achieves second-order correctness, even when the dimension of the regression parameters is unbounded. Results from a moderately large simulation study show marked improvement in coverage accuracy for the bootstrap CIs over the oracle based CIs.
Submitted 8 July, 2013;
originally announced July 2013.
-
A penalized empirical likelihood method in high dimensions
Authors:
Soumendra N. Lahiri,
Subhodeep Mukhopadhyay
Abstract:
This paper formulates a penalized empirical likelihood (PEL) method for inference on the population mean when the dimension of the observations may grow faster than the sample size. Asymptotic distributions of the PEL ratio statistic are derived under different component-wise dependence structures of the observations, namely, (i) non-ergodic, (ii) long-range dependence and (iii) short-range dependence. It follows that the limit distribution of the proposed PEL ratio statistic can vary widely depending on the correlation structure, and it is typically different from the usual chi-squared limit of the empirical likelihood ratio statistic in the fixed and finite dimensional case. A unified subsampling-based calibration is proposed, and its validity is established in all three cases, (i)-(iii). Finite sample properties of the method are investigated through a simulation study.
Submitted 27 February, 2013; v1 submitted 13 February, 2013;
originally announced February 2013.
-
Gap bootstrap methods for massive data sets with an application to transportation engineering
Authors:
S. N. Lahiri,
C. Spiegelman,
J. Appiah,
L. Rilett
Abstract:
In this paper we describe two bootstrap methods for massive data sets. Naive applications of common resampling methodology are often impractical for massive data sets due to computational burden and due to complex patterns of inhomogeneity. In contrast, the proposed methods exploit certain structural properties of a large class of massive data sets to break up the original problem into a set of simpler subproblems, solve each subproblem separately where the data exhibit approximate uniformity and where computational complexity can be reduced to a manageable level, and then combine the results through certain analytical considerations. The validity of the proposed methods is proved and their finite sample properties are studied through a moderately large simulation study. The methodology is illustrated with a real data example from Transportation Engineering, which motivated the development of the proposed methods.
Submitted 11 January, 2013;
originally announced January 2013.
-
Goodness of fit tests for a class of Markov random field models
Authors:
Mark S. Kaiser,
Soumendra N. Lahiri,
Daniel J. Nordman
Abstract:
This paper develops goodness of fit statistics that can be used to formally assess Markov random field models for spatial data, when the model distributions are discrete or continuous and potentially parametric. Test statistics are formed from generalized spatial residuals which are collected over groups of nonneighboring spatial observations, called concliques. Under a hypothesized Markov model structure, spatial residuals within each conclique are shown to be independent and identically distributed as uniform variables. The information from a series of concliques can then be pooled into goodness of fit statistics. Under some conditions, large sample distributions of these statistics are explicitly derived for testing both simple and composite hypotheses, where the latter involves additional parametric estimation steps. The distributional results are verified through simulation, and a data example illustrates the method for model assessment.
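To make the notion of a conclique concrete, here is a minimal sketch (not the authors' code) for one common special case: a four-nearest-neighbour structure on a rectangular grid. We assume sites are `(row, col)` pairs and that two sites are neighbours exactly when they differ by one horizontal or vertical step; under that assumption the two "checkerboard" colour classes are concliques. The function names are hypothetical. Computing the generalized spatial residuals within each conclique (roughly, a probability-integral transform of each observation given its neighbours) depends on the hypothesized model and is not shown.

```python
# Sketch: conclique partition for a four-nearest-neighbour lattice.
# Assumption: sites are (row, col) pairs on an nrow x ncol grid, and two sites
# are neighbours iff they differ by one step horizontally or vertically.


def four_neighbours(site, nrow, ncol):
    """Return the in-grid four-nearest neighbours of `site`."""
    i, j = site
    candidates = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(a, b) for a, b in candidates if 0 <= a < nrow and 0 <= b < ncol]


def checkerboard_concliques(nrow, ncol):
    """Split the grid into the two checkerboard colour classes."""
    even = [(i, j) for i in range(nrow) for j in range(ncol) if (i + j) % 2 == 0]
    odd = [(i, j) for i in range(nrow) for j in range(ncol) if (i + j) % 2 == 1]
    return even, odd


def is_conclique(sites, nrow, ncol):
    """True if no two sites in `sites` are neighbours of each other."""
    members = set(sites)
    return all(nb not in members
               for s in sites
               for nb in four_neighbours(s, nrow, ncol))


if __name__ == "__main__":
    even, odd = checkerboard_concliques(4, 5)
    print(len(even), len(odd), is_conclique(even, 4, 5), is_conclique(odd, 4, 5))
```

For larger neighbourhoods (e.g. eight nearest neighbours) more than two concliques are needed; the `is_conclique` check applies unchanged once `four_neighbours` is swapped for the assumed neighbourhood.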
Submitted 28 May, 2012;
originally announced May 2012.
-
Quantile Based Variable Mining: Detection, FDR based Extraction and Interpretation
Authors:
S. Mukhopadhyay,
Emanuel Parzen,
S. N. Lahiri
Abstract:
This paper outlines a unified framework for high-dimensional variable selection for classification problems. Traditional approaches to finding interesting variables mostly use only partial information through moments (such as mean differences). In contrast, this paper addresses variable selection in full generality from a distributional point of view. If a variable is not important for classification, then it will have similar distributional aspects under different classes. This simple observation motivates us to quantify 'how and why' the distribution of a variable changes over classes through the CR-statistic. The second contribution of our paper is to develop and investigate FDR-based thresholding from a completely new point of view for adaptive thresholding, which leads to an elegant algorithm called CDfdr. This paper attempts to show how the problems of detection, extraction and interpretation of interesting variables can be treated in a unified way under one broad theme: comparison analysis. It is proposed that a key to accomplishing this unification is to think in terms of the quantile function and the comparison density. We illustrate and demonstrate the power of our methodology using three real data sets.
Submitted 14 December, 2011;
originally announced December 2011.
-
Edgeworth expansions for studentized statistics under weak dependence
Authors:
S. N. Lahiri
Abstract:
In this paper, we derive valid Edgeworth expansions for studentized versions of a large class of statistics when the data are generated by a strongly mixing process. Under dependence, the asymptotic variance of such a statistic is given by an infinite series of lag-covariances, and therefore, studentizing factors (i.e., estimators of the asymptotic standard error) typically involve an increasing number, say, $\ell$ of lag-covariance estimators, which are themselves quadratic functions of the observations. The unboundedness of the dimension $\ell$ of these quadratic functions makes the derivation and the form of the expansions nonstandard. It is shown that in contrast to the case of the studentized means under independence, the derived Edgeworth expansion is a superposition of three distinct series, respectively, given by one in powers of $n^{-1/2}$, one in powers of $[n/\ell]^{-1/2}$ (resulting from the standard error of the studentizing factor) and one in powers of the bias of the studentizing factor, where $n$ denotes the sample size.
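As an illustration of the kind of studentizing factor described above, the sketch below studentizes the sample mean using the first $\ell$ lag-covariance estimators, each a quadratic function of the data. It is an assumption-laden example, not the paper's construction: we take the statistic to be the mean and weight the lags with Bartlett weights $1 - k/(\ell+1)$, one common choice that keeps the variance estimate nonnegative. The function names are hypothetical.

```python
import math


def lag_cov(x, k):
    """Lag-k sample autocovariance with divisor n (a quadratic function of the data)."""
    n = len(x)
    m = sum(x) / n
    return sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / n


def studentizing_factor(x, ell):
    """Estimated asymptotic standard deviation of sqrt(n) * mean(x), built from
    the first `ell` lag covariances with Bartlett weights 1 - k / (ell + 1)."""
    s2 = lag_cov(x, 0) + 2 * sum((1 - k / (ell + 1)) * lag_cov(x, k)
                                 for k in range(1, ell + 1))
    return math.sqrt(s2)


def studentized_mean(x, mu, ell):
    """sqrt(n) * (mean(x) - mu) / studentizing_factor(x, ell).

    Raises ZeroDivisionError for constant data, where the factor is zero."""
    n = len(x)
    return math.sqrt(n) * (sum(x) / n - mu) / studentizing_factor(x, ell)
```

In the setting of the abstract, $\ell$ must grow with $n$, which is exactly why the expansion picks up a series in powers of $[n/\ell]^{-1/2}$ from the variability of `studentizing_factor` in addition to the usual $n^{-1/2}$ terms.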
Submitted 12 January, 2010;
originally announced January 2010.
-
A Berry--Esseen theorem for sample quantiles under weak dependence
Authors:
S. N. Lahiri,
S. Sun
Abstract:
This paper proves a Berry--Esseen theorem for sample quantiles of strongly-mixing random variables under a polynomial mixing rate. The rate of normal approximation is shown to be $O(n^{-1/2})$ as $n\to\infty$, where $n$ denotes the sample size. This result is in sharp contrast to the case of the sample mean of strongly-mixing random variables where the rate $O(n^{-1/2})$ is not known even under an exponential strong mixing rate. The main result of the paper has applications in finance and econometrics, since financial time series data are often heavy-tailed and quantile-based methods play an important role in various problems in finance, including hedging and risk management.
Submitted 27 February, 2009;
originally announced February 2009.
-
Estimation of distributions, moments and quantiles in deconvolution problems
Authors:
Peter Hall,
Soumendra N. Lahiri
Abstract:
When using the bootstrap in the presence of measurement error, we must first estimate the target distribution function; we cannot directly resample, since we do not have a sample from the target. These and other considerations motivate the development of estimators of distributions, and of related quantities such as moments and quantiles, in errors-in-variables settings. We show that such estimators have curious and unexpected properties. For example, if the distributions of the variable of interest, $W$, say, and of the observation error are both centered at zero, then the rate of convergence of an estimator of the distribution function of $W$ can be slower at the origin than away from the origin. This is an intrinsic characteristic of the problem, not a quirk of particular estimators; the property holds true for optimal estimators.
Submitted 27 October, 2008;
originally announced October 2008.
-
A frequency domain empirical likelihood for short- and long-range dependence
Authors:
Daniel J. Nordman,
Soumendra N. Lahiri
Abstract:
This paper introduces a version of empirical likelihood based on the periodogram and spectral estimating equations. This formulation handles dependent data through a data transformation (i.e., a Fourier transform) and is developed in terms of the spectral distribution rather than a time domain probability distribution. The asymptotic properties of frequency domain empirical likelihood are studied for linear time processes exhibiting both short- and long-range dependence. The method results in likelihood ratios which can be used to build nonparametric, asymptotically correct confidence regions for a class of normalized (or ratio) spectral parameters, including autocorrelations. Maximum empirical likelihood estimation is also possible, as are tests of spectral moment conditions. The methodology can be applied to several inference problems such as Whittle estimation and goodness-of-fit testing.
Submitted 1 August, 2007;
originally announced August 2007.
-
Resampling methods for spatial regression models under a class of stochastic designs
Authors:
S. N. Lahiri,
Jun Zhu
Abstract:
In this paper we consider the problem of bootstrapping a class of spatial regression models when the sampling sites are generated by a (possibly nonuniform) stochastic design and are irregularly spaced. It is shown that the natural extension of the existing block bootstrap methods for grid spatial data does not work for irregularly spaced spatial data under nonuniform stochastic designs. A variant of the blocking mechanism is proposed. It is shown that the proposed block bootstrap method provides a valid approximation to the distribution of a class of M-estimators of the spatial regression parameters. Finite sample properties of the method are investigated through a moderately large simulation study and a real data example is given to illustrate the methodology.
Submitted 9 November, 2006;
originally announced November 2006.
-
Asymptotic expansions for sums of block-variables under weak dependence
Authors:
S. N. Lahiri
Abstract:
Let $\{X_i\}_{i=-\infty}^{\infty}$ be a sequence of random vectors and $Y_{in}=f_{in}(\mathcal{X}_{i,\ell})$ be zero mean block-variables where $\mathcal{X}_{i,\ell}=(X_i,...,X_{i+\ell-1}),i\geq 1$, are overlapping blocks of length $\ell$ and where $f_{in}$ are Borel measurable functions. This paper establishes valid joint asymptotic expansions of general orders for the joint distribution of the sums $\sum_{i=1}^nX_i$ and $\sum_{i=1}^nY_{in}$ under weak dependence conditions on the sequence $\{X_i\}_{i=-\infty}^{\infty}$ when the block length $\ell$ grows to infinity. In contrast to the classical Edgeworth expansion results where the terms in the expansions are given by powers of $n^{-1/2}$, the expansions derived here are mixtures of two series, one in powers of $n^{-1/2}$ and the other in powers of $[\frac{n}{\ell}]^{-1/2}$. Applications of the main results to (i) expansions for Studentized statistics of time series data and (ii) second order correctness of the blocks of blocks bootstrap method are given.
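The notation above can be made concrete with a short sketch that builds the overlapping blocks $\mathcal{X}_{i,\ell}$ and returns the pair of sums whose joint distribution is expanded. This is only an illustration of the definitions, not part of the paper: `f` stands in for $f_{in}$ (taken not to depend on $i$ for simplicity), centring it to mean zero is left to the caller, and the function names are hypothetical. Python indexing is 0-based, so block `i` here corresponds to $\mathcal{X}_{i+1,\ell}$.

```python
def overlapping_blocks(x, ell):
    """All overlapping blocks (x_i, ..., x_{i+ell-1}) of length ell, in order."""
    return [tuple(x[i:i + ell]) for i in range(len(x) - ell + 1)]


def joint_sums(x, ell, f, n):
    """Return (sum_{i=1}^n X_i, sum_{i=1}^n f(block_i)).

    Needs len(x) >= n + ell - 1 so that n full blocks exist.  `f` plays the
    role of f_in; centring it to have mean zero is left to the caller.
    """
    if len(x) < n + ell - 1:
        raise ValueError("need at least n + ell - 1 observations")
    blocks = overlapping_blocks(x, ell)[:n]
    return sum(x[:n]), sum(f(b) for b in blocks)
```

For example, `f = lambda b: sum(b) ** 2 / len(b)` gives (uncentred) squared block sums of the kind that appear in block-based variance estimators, which is one route to the Studentized-statistic application mentioned above.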
Submitted 17 August, 2007; v1 submitted 28 June, 2006;
originally announced June 2006.
-
Resampling Based Empirical Prediction: An Application to Small Area Estimation
Authors:
Soumendra N. Lahiri,
Tapabrata Maiti,
Myron Katzoff,
Van Parsons
Abstract:
Best linear unbiased prediction is well known for its wide range of applications, including small area estimation. While the theory is well established for mixed linear models under normality of the error and mixing distributions, the literature is sparse for nonlinear mixed models under nonnormality of the error or of the mixing distributions. This article develops a resampling-based unified approach for predicting mixed effects under a generalized mixed model setup. Second-order accurate nonnegative estimators of mean squared prediction errors (MSPEs) are also developed. Given the parametric model, the proposed methodology automatically produces estimates of the small area parameters and their MSPEs, without requiring explicit analytical expressions for the MSPE.
Submitted 9 July, 2006; v1 submitted 24 April, 2006;
originally announced April 2006.
-
Nonnegative mean squared prediction error estimation in small area estimation
Authors:
Soumendra N. Lahiri,
Tapabrata Maiti
Abstract:
Small area estimation has received enormous attention in recent years due to its wide range of applications, particularly in policy making. Direct small area estimators, based only on the small samples available within each area, have unduly large variances, so there is a need to construct model-based estimators with low mean squared prediction error (MSPE). Estimation of the MSPE, and in particular its bias correction, is central to small area estimation research. In this article, a new technique of bias correction for the estimated MSPE is proposed. It is shown that the new MSPE estimator attains the same level of bias correction as the existing estimators based on straight Taylor expansion and jackknife methods. However, unlike the existing methods, the proposed estimate of MSPE is always nonnegative. Furthermore, the proposed method can be used for general two-level small area models where the variables at each level can be discrete or continuous and, in particular, nonnormal.
Submitted 4 April, 2006;
originally announced April 2006.
-
Consistency of the jackknife-after-bootstrap variance estimator for the bootstrap quantiles of a studentized statistic
Authors:
S. N. Lahiri
Abstract:
Efron [J. Roy. Statist. Soc. Ser. B 54 (1992) 83--111] proposed a computationally efficient method, called the jackknife-after-bootstrap, for estimating the variance of a bootstrap estimator for independent data. For dependent data, a version of the jackknife-after-bootstrap method has been recently proposed by Lahiri [Econometric Theory 18 (2002) 79--98]. In this paper it is shown that the jackknife-after-bootstrap estimators of the variance of a bootstrap quantile are consistent for both dependent and independent data. Results from a simulation study are also presented.
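For readers unfamiliar with the jackknife-after-bootstrap, here is a minimal sketch of the independent-data version in the spirit of Efron (1992), applied to a bootstrap quantile of the studentized mean. It is an illustration under stated assumptions, not the paper's procedure: the dependent-data version by Lahiri works with blocks rather than single observations and is not shown, and the function names, the quantile rule and the default `alpha`, `B`, `seed` values are our own choices. The key trick is that no new resampling is needed: for each $i$, the bootstrap quantile is recomputed from only those resamples that happen not to contain observation $i$, and the usual jackknife variance formula is applied to these leave-one-out quantiles.

```python
import math
import random
import statistics


def t_stat(sample, center):
    """Studentized mean sqrt(n) * (mean - center) / sd."""
    n = len(sample)
    return math.sqrt(n) * (statistics.fmean(sample) - center) / statistics.stdev(sample)


def empirical_quantile(values, alpha):
    """Smallest order statistic v_(k) with k / m >= alpha."""
    v = sorted(values)
    k = max(math.ceil(alpha * len(v)) - 1, 0)
    return v[k]


def jab_variance(x, alpha=0.9, B=2000, seed=0):
    """Return (bootstrap alpha-quantile of the studentized mean, its JAB variance).

    i.i.d. version: for each i, recompute the bootstrap quantile from only
    those resamples that do not contain x[i], then apply the jackknife formula.
    """
    rng = random.Random(seed)
    n = len(x)
    xbar = statistics.fmean(x)
    stats, members = [], []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        sample = [x[i] for i in idx]
        if statistics.stdev(sample) == 0:
            continue  # skip degenerate resamples with zero spread
        stats.append(t_stat(sample, xbar))
        members.append(set(idx))
    q_full = empirical_quantile(stats, alpha)
    q_leave_out = []
    for i in range(n):
        kept = [t for t, m in zip(stats, members) if i not in m]
        if not kept:
            raise ValueError("no resample omits observation %d; increase B" % i)
        q_leave_out.append(empirical_quantile(kept, alpha))
    q_bar = statistics.fmean(q_leave_out)
    var = (n - 1) / n * sum((q - q_bar) ** 2 for q in q_leave_out)
    return q_full, var
```

Since each resample omits a given observation with probability $(1 - 1/n)^n \approx e^{-1}$, roughly a third of the `B` resamples are available for each leave-one-out quantile, which is what makes the method computationally cheap.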
Submitted 15 February, 2006;
originally announced February 2006.
-
A Sub-Gaussian Berry-Esseen Theorem for the Hypergeometric Distribution
Authors:
Soumendra N. Lahiri,
A. Chatterjee,
T. Maiti
Abstract:
In this paper, we derive a necessary and sufficient condition on the parameters of the Hypergeometric distribution for weak convergence to a Normal limit. We establish a Berry-Esseen theorem for the Hypergeometric distribution solely under this necessary and sufficient condition. We further derive a nonuniform Berry-Esseen bound where the tails of the difference between the Hypergeometric and the Normal distribution functions are shown to decay at a sub-Gaussian rate.
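A Berry-Esseen theorem bounds the uniform (Kolmogorov) distance between the Hypergeometric distribution function and its Normal approximation. The sketch below simply computes that distance exactly for given parameters, which can help build intuition; it does not implement the paper's necessary and sufficient condition or its bounds. We assume the parametrization $N$ (population size), $K$ (number of successes) and $n$ (number of draws), standardize with the exact Hypergeometric mean and variance, and apply no continuity correction. The function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard Normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def hypergeom_kolmogorov(N, K, n):
    """sup_x |P(H <= x) - Phi((x - mu) / sigma)| for H ~ Hypergeometric(N, K, n),
    where N = population size, K = number of successes, n = number of draws."""
    if not (0 < K < N and 0 < n < N):
        raise ValueError("need 0 < K < N and 0 < n < N for a non-degenerate law")
    total = math.comb(N, n)
    mu = n * K / N
    sigma = math.sqrt(n * (K / N) * (1 - K / N) * (N - n) / (N - 1))
    cdf_prev, dist = 0.0, 0.0
    for k in range(max(0, n - (N - K)), min(n, K) + 1):
        cdf = cdf_prev + math.comb(K, k) * math.comb(N - K, n - k) / total
        phi = normal_cdf((k - mu) / sigma)
        # The sup is attained at a jump: compare Phi with both sides of the step.
        dist = max(dist, abs(cdf - phi), abs(cdf_prev - phi))
        cdf_prev = cdf
    return dist
```

Checking only the two sides of each jump is enough because the Hypergeometric distribution function is constant between integers while $\Phi$ is increasing, so the largest gap on each interval occurs at one of its endpoints.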
Submitted 13 February, 2006;
originally announced February 2006.
-
On optimal spatial subsample size for variance estimation
Authors:
Daniel J. Nordman,
Soumendra N. Lahiri
Abstract:
We consider the problem of determining the optimal block (or subsample) size for a spatial subsampling method for spatial processes observed on regular grids. We derive expansions for the mean square error of the subsampling variance estimator, which yields an expression for the theoretically optimal block size. The optimal block size is shown to depend in an intricate way on the geometry of the spatial sampling region as well as characteristics of the underlying random field. Final expressions for the optimal block size make use of some nontrivial estimates of lattice point counts in shifts of convex sets. Optimal block sizes are computed for sampling regions of a number of commonly encountered shapes. Numerical studies are performed to compare subsampling methods as well as procedures for estimating the theoretically best block size.
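To show what the block size controls, here is a minimal sketch of an overlapping spatial subsampling estimator of the variance of the normalized sample mean. We assume the simplest setting: a rectangular $n_1 \times n_2$ grid stored as a list of rows and square $b \times b$ subblocks. The paper treats general sampling-region shapes and derives the theoretically optimal $b$; here `b` is simply an input, and the function name is hypothetical.

```python
def subsampling_variance(grid, b):
    """Overlapping b x b block subsampling estimate of Var(sqrt(N) * mean) for
    data on an n1 x n2 rectangular grid (list of rows), N = n1 * n2."""
    n1, n2 = len(grid), len(grid[0])
    if not (1 <= b <= min(n1, n2)):
        raise ValueError("block side b must satisfy 1 <= b <= min(n1, n2)")
    mean = sum(map(sum, grid)) / (n1 * n2)
    vals = []
    for i in range(n1 - b + 1):
        for j in range(n2 - b + 1):
            block_mean = sum(grid[r][c] for r in range(i, i + b)
                             for c in range(j, j + b)) / (b * b)
            # Rescale by the block size b*b, mirroring sqrt(N) * mean for the full grid.
            vals.append(b * b * (block_mean - mean) ** 2)
    return sum(vals) / len(vals)
```

The trade-off behind an optimal block size is visible here: small `b` ignores spatial correlation beyond the block (bias), while large `b` leaves few nearly independent subblocks to average over (variance). At the extreme `b = min(n1, n2)` on a square grid there is a single block equal to the whole region, and the estimate collapses to zero.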
Submitted 29 March, 2005;
originally announced March 2005.