Statistical, Robustness, and Computational Guarantees for Sliced Wasserstein Distances

Authors: Sloan Nietert, Ritwik Sadhu, Ziv Goldfeld, Kengo Kato

Abstract: Sliced Wasserstein distances preserve properties of classic Wasserstein distances while being more scalable for computation and estimation in high dimensions. The goal of this work is to quantify this scalability from three key aspects: (i) empirical convergence rates; (ii) robustness to data contamination; and (iii) efficient computational methods. For empirical convergence, we derive fast rates with explicit dependence of constants on dimension, subject to log-concavity of the population distributions. For robustness, we characterize minimax optimal, dimension-free robust estimation risks, and show an equivalence between robust sliced 1-Wasserstein estimation and robust mean estimation. This enables lifting statistical and algorithmic guarantees available for the latter to the sliced 1-Wasserstein setting. Moving on to computational aspects, we analyze the Monte Carlo estimator for the average-sliced distance, demonstrating that larger dimension can result in faster convergence of the numerical integration error. For the max-sliced distance, we focus on a subgradient-based local optimization algorithm that is frequently used in practice, albeit without formal guarantees, and establish an $O(ε^{-4})$ computational complexity bound for it. Our theory is validated by numerical experiments, which altogether provide a comprehensive quantitative account of the scalability question.

Submitted 17 October, 2022; originally announced October 2022.
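
As a rough illustration of the Monte Carlo estimator for the average-sliced distance mentioned in the abstract above, the following minimal sketch averages one-dimensional 1-Wasserstein distances over random projection directions. It is not the authors' code: the function name `sliced_w1_monte_carlo`, the parameters `num_directions` and `seed`, and the restriction to two samples of equal size are all assumptions made for illustration.

```python
import numpy as np


def sliced_w1_monte_carlo(x, y, num_directions=200, seed=0):
    """Monte Carlo estimate of the average-sliced 1-Wasserstein distance
    between two empirical samples of equal size (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 2 or x.shape != y.shape:
        raise ValueError("this sketch assumes two (n, d) arrays of equal shape")
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Uniform directions on the unit sphere: normalise Gaussian vectors.
    theta = rng.standard_normal((num_directions, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both samples onto every direction, then sort each projection.
    px = np.sort(x @ theta.T, axis=0)
    py = np.sort(y @ theta.T, axis=0)
    # In one dimension with equal sample sizes, W1 between the empirical
    # measures is the mean absolute gap between matching order statistics.
    w1_per_direction = np.mean(np.abs(px - py), axis=0)
    # Average over the sampled directions (the Monte Carlo step).
    return float(np.mean(w1_per_direction))
```

In one dimension the only directions are +1 and -1, so shifting a sample by a constant `c` should return `|c|` up to floating-point error, which gives a quick sanity check of the sketch.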

arXiv:2107.13494 [pdf, ps, other]

Limit Distribution Theory for the Smooth 1-Wasserstein Distance with Applications

Authors: Ritwik Sadhu, Ziv Goldfeld, Kengo Kato

Abstract: The smooth 1-Wasserstein distance (SWD) $W_1^σ$ was recently proposed as a means to mitigate the curse of dimensionality in empirical approximation while preserving the Wasserstein structure. Indeed, SWD exhibits parametric convergence rates and inherits the metric and topological structure of the classic Wasserstein distance. Motivated by the above, this work conducts a thorough statistical study of the SWD, including a high-dimensional limit distribution result for empirical $W_1^σ$, bootstrap consistency, concentration inequalities, and Berry-Esseen type bounds. The derived nondegenerate limit stands in sharp contrast with the classic empirical $W_1$, for which a similar result is known only in the one-dimensional case. We also explore asymptotics and characterize the limit distribution when the smoothing parameter $σ$ is scaled with $n$, converging to $0$ at a sufficiently slow rate. The dimensionality of the sampled distribution enters empirical SWD convergence bounds only through the prefactor (i.e., the constant). We provide a sharp characterization of this prefactor's dependence on the smoothing parameter and the intrinsic dimension. This result is then used to derive new empirical convergence rates for classic $W_1$ in terms of the intrinsic dimension. As applications of the limit distribution theory, we study two-sample testing and minimum distance estimation (MDE) under $W_1^σ$. We establish asymptotic validity of SWD testing, while for MDE, we prove measurability, almost sure convergence, and limit distributions for optimal estimators and their corresponding $W_1^σ$ error. Our results suggest that the SWD is well suited for high-dimensional statistical learning and inference.

Submitted 24 February, 2022; v1 submitted 28 July, 2021; originally announced July 2021.

MSC Class: 62E17; 60F05; 60F17; 62G10; 62F12; 62F40

arXiv:2102.06586 [pdf, ps, other]

Linear programming approach to nonparametric inference under shape restrictions: with an application to regression kink designs

Authors: Harold D. Chiang, Kengo Kato, Yuya Sasaki, Takuya Ura

Abstract: We develop a novel method of constructing confidence bands for nonparametric regression functions under shape constraints. This method can be implemented via linear programming, and it is thus computationally appealing. We illustrate the use of our proposed method with an application to the regression kink design (RKD). Econometric analyses based on the RKD often suffer from wide confidence intervals due to slow convergence rates of nonparametric derivative estimators. We demonstrate that economic models and structures motivate shape restrictions, which in turn contribute to shrinking the confidence interval for an analysis of the causal effects of unemployment insurance benefits on unemployment durations.

Submitted 12 February, 2021; originally announced February 2021.

arXiv:2101.04039 [pdf, other]

Smooth $p$-Wasserstein Distance: Structure, Empirical Approximation, and Statistical Applications

Authors: Sloan Nietert, Ziv Goldfeld, Kengo Kato

Abstract: Discrepancy measures between probability distributions, often termed statistical distances, are ubiquitous in probability theory, statistics and machine learning. To combat the curse of dimensionality when estimating these distances from data, recent work has proposed smoothing out local irregularities in the measured distributions via convolution with a Gaussian kernel. Motivated by the scalability of this framework to high dimensions, we investigate the structural and statistical behavior of the Gaussian-smoothed $p$-Wasserstein distance $\mathsf{W}_p^{(σ)}$, for arbitrary $p\geq 1$. After establishing basic metric and topological properties of $\mathsf{W}_p^{(σ)}$, we explore the asymptotic statistical behavior of $\mathsf{W}_p^{(σ)}(\hatμ_n,μ)$, where $\hatμ_n$ is the empirical distribution of $n$ independent observations from $μ$. We prove that $\mathsf{W}_p^{(σ)}$ enjoys a parametric empirical convergence rate of $n^{-1/2}$, which contrasts the $n^{-1/d}$ rate for unsmoothed $\mathsf{W}_p$ when $d \geq 3$. Our proof relies on controlling $\mathsf{W}_p^{(σ)}$ by a $p$th-order smooth Sobolev distance $\mathsf{d}_p^{(σ)}$ and deriving the limit distribution of $\sqrt{n}\,\mathsf{d}_p^{(σ)}(\hatμ_n,μ)$, for all dimensions $d$. As applications, we provide asymptotic guarantees for two-sample testing and minimum distance estimation using $\mathsf{W}_p^{(σ)}$, with experiments for $p=2$ using a maximum mean discrepancy formulation of $\mathsf{d}_2^{(σ)}$.

Submitted 17 December, 2021; v1 submitted 11 January, 2021; originally announced January 2021.

Comments: updated to match ICML 2021 paper

arXiv:2007.15190 [pdf, other]

Quantitative Understanding of VAE as a Non-linearly Scaled Isometric Embedding

Authors: Akira Nakagawa, Keizo Kato, Taiji Suzuki

Abstract: Variational autoencoder (VAE) estimates the posterior parameters (mean and variance) of latent variables corresponding to each input data. While it is used for many tasks, the transparency of the model is still an underlying issue. This paper provides a quantitative understanding of VAE property through the differential geometric and information-theoretic interpretations of VAE. According to the Rate-distortion theory, the optimal transform coding is achieved by using an orthonormal transform with PCA basis where the transform space is isometric to the input. Considering the analogy of transform coding to VAE, we clarify theoretically and experimentally that VAE can be mapped to an implicit isometric embedding with a scale factor derived from the posterior parameter. As a result, we can estimate the data probabilities in the input space from the prior, loss metrics, and corresponding posterior parameters, and further, the quantitative importance of each latent variable can be evaluated like the eigenvalue of PCA.

Submitted 22 February, 2023; v1 submitted 29 July, 2020; originally announced July 2020.

Comments: Accepted to the International Conference on Machine Learning (ICML) 2021. 40 pages, 29 figures

ACM Class: I.2.4

arXiv:2006.00952 [pdf, other]

Bootstrap inference for quantile-based modal regression

Authors: Tao Zhang, Kengo Kato, David Ruppert

Abstract: In this paper, we develop uniform inference methods for the conditional mode based on quantile regression. Specifically, we propose to estimate the conditional mode by minimizing the derivative of the estimated conditional quantile function defined by smoothing the linear quantile regression estimator, and develop two bootstrap methods, a novel pivotal bootstrap and the nonparametric bootstrap, for our conditional mode estimator. Building on high-dimensional Gaussian approximation techniques, we establish the validity of simultaneous confidence rectangles constructed from the two bootstrap methods for the conditional mode. We also extend the preceding analysis to the case where the dimension of the covariate vector is increasing with the sample size. Finally, we conduct simulation experiments and a real data analysis using U.S. wage data to demonstrate the finite sample performance of our inference method.

Submitted 12 April, 2021; v1 submitted 1 June, 2020; originally announced June 2020.

Comments: 78 pages

arXiv:1910.04329 [pdf, other]

Rate-Distortion Optimization Guided Autoencoder for Isometric Embedding in Euclidean Latent Space

Authors: Keizo Kato, Jing Zhou, Tomotake Sasaki, Akira Nakagawa

Abstract: To analyze high-dimensional and complex data in the real world, deep generative models, such as variational autoencoder (VAE), embed data in a low-dimensional space (latent space) and learn a probabilistic model in the latent space. However, they struggle to accurately reproduce the probability distribution function (PDF) in the input space from that in the latent space. If the embedding were isometric, this issue could be solved, because the relation of PDFs can become tractable. To achieve isometric property, we propose Rate-Distortion Optimization guided autoencoder inspired by orthonormal transform coding. We show our method has the following properties: (i) the Jacobian matrix between the input space and a Euclidean latent space forms a constantly scaled orthonormal system and enables isometric data embedding; (ii) the relation of PDFs in both spaces can become a tractable one, such as a proportional relation. Furthermore, our method outperforms state-of-the-art methods in unsupervised anomaly detection with four public datasets.

Submitted 30 August, 2020; v1 submitted 9 October, 2019; originally announced October 2019.

Comments: Accepted to the International Conference on Machine Learning (ICML) 2020

MSC Class: 68T01

arXiv:1908.03152 [pdf, other]

Analysis of Networks via the Sparse $β$-Model

Authors: Mingli Chen, Kengo Kato, Chenlei Leng

Abstract: Data in the form of networks are increasingly available in a variety of areas, yet statistical models allowing for parameter estimates with desirable statistical properties for sparse networks remain scarce. To address this, we propose the Sparse $β$-Model (S$β$M), a new network model that interpolates the celebrated Erdős-Rényi model and the $β$-model that assigns one different parameter to each node. By a novel reparameterization of the $β$-model to distinguish global and local parameters, our S$β$M can drastically reduce the dimensionality of the $β$-model by requiring some of the local parameters to be zero. We derive the asymptotic distribution of the maximum likelihood estimator of the S$β$M when the support of the parameter vector is known. When the support is unknown, we formulate a penalized likelihood approach with the $\ell_0$-penalty. Remarkably, we show via a monotonicity lemma that the seemingly combinatorial computational problem due to the $\ell_0$-penalty can be overcome by assigning nonzero parameters to those nodes with the largest degrees. We further show that a $β$-min condition guarantees our method to identify the true model and provide excess risk bounds for the estimated parameters. The estimation procedure enjoys good finite sample properties as shown by simulation studies. The usefulness of the S$β$M is further illustrated via the analysis of a microfinance take-up example.

Submitted 17 December, 2020; v1 submitted 8 August, 2019; originally announced August 2019.

Comments: 36 pages

arXiv:1901.01163 [pdf, ps, other]

doi 10.1214/19-EJS1643

Approximating high-dimensional infinite-order $U$-statistics: statistical and computational guarantees

Authors: Yanglei Song, Xiaohui Chen, Kengo Kato

Abstract: We study the problem of distributional approximations to high-dimensional non-degenerate $U$-statistics with random kernels of diverging orders. Infinite-order $U$-statistics (IOUS) are a useful tool for constructing simultaneous prediction intervals that quantify the uncertainty of ensemble methods such as subbagging and random forests. A major obstacle in using the IOUS is their computational intractability when the sample size and/or order are large. In this article, we derive non-asymptotic Gaussian approximation error bounds for an incomplete version of the IOUS with a random kernel. We also study data-driven inferential methods for the incomplete IOUS via bootstraps and develop their statistical and computational guarantees.

Submitted 15 November, 2019; v1 submitted 4 January, 2019; originally announced January 2019.

Journal ref: Electronic Journal of Statistics 2019, Vol. 13, No. 2, 4794-4848

arXiv:1811.05379 [pdf, other]

Quantile regression approach to conditional mode estimation

Authors: Hirofumi Ota, Kengo Kato, Satoshi Hara

Abstract: In this paper, we consider estimation of the conditional mode of an outcome variable given regressors. To this end, we propose and analyze a computationally scalable estimator derived from a linear quantile regression model and develop asymptotic distributional theory for the estimator. Specifically, we find that the pointwise limiting distribution is a scale transformation of Chernoff's distribution despite the presence of regressors. In addition, we consider analytical and subsampling-based confidence intervals for the proposed estimator. We also conduct Monte Carlo simulations to assess the finite sample performance of the proposed estimator together with the analytical and subsampling confidence intervals. Finally, we apply the proposed estimator to predicting the net hourly electrical energy output using Combined Cycle Power Plant Data.

Submitted 29 July, 2019; v1 submitted 13 November, 2018; originally announced November 2018.

Comments: This paper supersedes "On estimation of conditional modes using multiple quantile regressions" (Hirofumi Ohta and Satoshi Hara, arXiv:1712.08754)

arXiv:1712.00771 [pdf, other]

Randomized incomplete $U$-statistics in high dimensions

Authors: Xiaohui Chen, Kengo Kato

Abstract: This paper studies inference for the mean vector of a high-dimensional $U$-statistic. In the era of Big Data, the dimension $d$ of the $U$-statistic and the sample size $n$ of the observations tend to be both large, and the computation of the $U$-statistic is prohibitively demanding. Data-dependent inferential procedures such as the empirical bootstrap for $U$-statistics are even more computationally expensive. To overcome such a computational bottleneck, incomplete $U$-statistics obtained by sampling fewer terms of the $U$-statistic are attractive alternatives. In this paper, we introduce randomized incomplete $U$-statistics with sparse weights whose computational cost can be made independent of the order of the $U$-statistic. We derive non-asymptotic Gaussian approximation error bounds for the randomized incomplete $U$-statistics in high dimensions, namely in cases where the dimension $d$ is possibly much larger than the sample size $n$, for both non-degenerate and degenerate kernels. In addition, we propose generic bootstrap methods for the incomplete $U$-statistics that are computationally much less demanding than existing bootstrap methods, and establish finite sample validity of the proposed bootstrap methods. Our methods are illustrated on the application to nonparametric testing for the pairwise independence of a high-dimensional random vector under weaker assumptions than those appearing in the literature.

Submitted 27 January, 2019; v1 submitted 3 December, 2017; originally announced December 2017.

MSC Class: 62E17; 62F40; 62H15
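
The idea of an incomplete $U$-statistic obtained by sampling fewer terms, as described in the abstract above, can be sketched for a scalar order-2 kernel as follows. This is only an illustrative toy, not the paper's procedure: the function name `incomplete_u_statistic`, the `budget` parameter, the choice of independent Bernoulli selection of pairs, and the normalisation by the realised number of selected pairs are assumptions. The sketch still loops over all pairs to draw the selection indicators and only saves kernel evaluations.

```python
import itertools

import numpy as np


def incomplete_u_statistic(data, kernel, budget, seed=0):
    """Order-2 incomplete U-statistic with Bernoulli-sampled pairs
    (hypothetical helper; `budget` is the expected number of pairs used)."""
    rng = np.random.default_rng(seed)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    total_pairs = n * (n - 1) // 2
    # Each pair is kept independently with probability p.
    p = min(1.0, budget / total_pairs)
    total, count = 0.0, 0
    for i, j in itertools.combinations(range(n), 2):
        if rng.random() < p:
            total += kernel(data[i], data[j])
            count += 1
    if count == 0:
        raise ValueError("no pairs were sampled; increase the budget")
    # Average the kernel over the selected pairs only.
    return total / count
```

With the kernel `0.5 * (a - b) ** 2` and a budget at least as large as the number of pairs, every pair is selected and the result coincides with the unbiased sample variance, which is a convenient check of the sketch.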

arXiv:1708.02705 [pdf, other]

Jackknife multiplier bootstrap: finite sample approximations to the $U$-process supremum with applications

Authors: Xiaohui Chen, Kengo Kato

Abstract: This paper is concerned with finite sample approximations to the supremum of a non-degenerate $U$-process of a general order indexed by a function class. We are primarily interested in situations where the function class as well as the underlying distribution change with the sample size, and the $U$-process itself is not weakly convergent as a process. Such situations arise in a variety of modern statistical problems. We first consider Gaussian approximations, namely, approximate the $U$-process supremum by the supremum of a Gaussian process, and derive coupling and Kolmogorov distance bounds. Such Gaussian approximations are, however, not often directly applicable in statistical problems since the covariance function of the approximating Gaussian process is unknown. This motivates us to study bootstrap-type approximations to the $U$-process supremum. We propose a novel jackknife multiplier bootstrap (JMB) tailored to the $U$-process, and derive coupling and Kolmogorov distance bounds for the proposed JMB method. All these results are non-asymptotic, and established under fairly general conditions on function classes and underlying distributions. Key technical tools in the proofs are new local maximal inequalities for $U$-processes, which may be useful in other problems. We also discuss applications of the general approximation results to testing for qualitative features of nonparametric functions based on generalized local $U$-processes.

Submitted 13 February, 2019; v1 submitted 8 August, 2017; originally announced August 2017.

MSC Class: 60F17; 62E17; 62F40; 62G10

arXiv:1312.7614 [pdf, ps, other]

Inference on causal and structural parameters using many moment inequalities

Authors: Victor Chernozhukov, Denis Chetverikov, Kengo Kato

Abstract: This paper considers the problem of testing many moment inequalities where the number of moment inequalities, denoted by $p$, is possibly much larger than the sample size $n$. There is a variety of economic applications where solving this problem allows one to carry out inference on causal and structural parameters; a notable example is the market structure model of Ciliberto and Tamer (2009) where $p=2^{m+1}$ with $m$ being the number of firms that could possibly enter the market. We consider the test statistic given by the maximum of $p$ Studentized (or $t$-type) inequality-specific statistics, and analyze various ways to compute critical values for the test statistic. Specifically, we consider critical values based upon (i) the union bound combined with a moderate deviation inequality for self-normalized sums, (ii) the multiplier and empirical bootstraps, and (iii) two-step and three-step variants of (i) and (ii) by incorporating the selection of uninformative inequalities that are far from being binding and a novel selection of weakly informative inequalities that are potentially binding but do not provide first order information. We prove validity of these methods, showing that under mild conditions, they lead to tests with the error in size decreasing polynomially in $n$ while allowing for $p$ being much larger than $n$; indeed, $p$ can be of order $\exp (n^{c})$ for some $c > 0$. Importantly, all these results hold without any restriction on the correlation structure between $p$ Studentized statistics, and also hold uniformly with respect to suitably large classes of underlying distributions. Moreover, in the online supplement, we show validity of a test based on the block multiplier bootstrap in the case of dependent data under some general mixing conditions.

Submitted 18 October, 2018; v1 submitted 29 December, 2013; originally announced December 2013.

Comments: This paper was previously circulated under the title "Testing many moment inequalities"

arXiv:1304.0282 [pdf, ps, other]

doi 10.1093/biomet/asu056

Uniform Post Selection Inference for LAD Regression and Other Z-estimation problems

Authors: Alexandre Belloni, Victor Chernozhukov, Kengo Kato

Abstract: We develop uniformly valid confidence regions for regression coefficients in a high-dimensional sparse median regression model with homoscedastic errors. Our methods are based on a moment equation that is immunized against non-regular estimation of the nuisance part of the median regression function by using Neyman's orthogonalization. We establish that the resulting instrumental median regression estimator of a target regression coefficient is asymptotically normally distributed uniformly with respect to the underlying sparse model and is semi-parametrically efficient. We also generalize our method to a general non-smooth Z-estimation framework with the number of target parameters $p_1$ being possibly much larger than the sample size $n$. We extend Huber's results on asymptotic normality to this setting, demonstrating uniform asymptotic normality of the proposed estimators over $p_1$-dimensional rectangles, constructing simultaneous confidence bands on all of the $p_1$ target parameters, and establishing asymptotic validity of the bands uniformly over underlying approximately sparse models. Keywords: Instrument; Post-selection inference; Sparsity; Neyman's Orthogonal Score test; Uniformly valid inference; Z-estimation.

Submitted 18 October, 2020; v1 submitted 31 March, 2013; originally announced April 2013.

Comments: includes supplementary material; 2 figures

MSC Class: 62F03; 62F12; 62F40

arXiv:1212.0442 [pdf, ps, other]

Some New Asymptotic Theory for Least Squares Series: Pointwise and Uniform Results

Authors: Alexandre Belloni, Victor Chernozhukov, Denis Chetverikov, Kengo Kato

Abstract: In applications it is common that the exact form of a conditional expectation is unknown and having flexible functional forms can lead to improvements. Series method offers that by approximating the unknown function based on $k$ basis functions, where $k$ is allowed to grow with the sample size $n$. We consider series estimators for the conditional mean in light of: (i) sharp LLNs for matrices derived from the noncommutative Khinchin inequalities, (ii) bounds on the Lebesgue factor that controls the ratio between the $L^\infty$ and $L_2$-norms of approximation errors, (iii) maximal inequalities for processes whose entropy integrals diverge, and (iv) strong approximations to series-type processes. These technical tools allow us to contribute to the series literature, specifically the seminal work of Newey (1997), as follows. First, we weaken the condition on the number $k$ of approximating functions used in series estimation from the typical $k^2/n \to 0$ to $k/n \to 0$, up to log factors, which was available only for spline series before. Second, we derive $L_2$ rates and pointwise central limit theorem results when the approximation error vanishes. Under an incorrectly specified model, i.e. when the approximation error does not vanish, analogous results are also shown. Third, under stronger conditions we derive uniform rates and functional central limit theorems that hold if the approximation error vanishes or not. That is, we derive the strong approximation for the entire estimate of the nonparametric function. We derive uniform rates, Gaussian approximations, and uniform confidence bands for a wide collection of linear functionals of the conditional expectation function.

Submitted 17 June, 2015; v1 submitted 3 December, 2012; originally announced December 2012.

Journal ref: Journal of Econometrics 186 (2015) 345-366

arXiv:1207.5313 [pdf, ps, other]

Two-step estimation of high dimensional additive models

Authors: Kengo Kato

Abstract: This paper investigates the two-step estimation of a high dimensional additive regression model, in which the number of nonparametric additive components is potentially larger than the sample size but the number of significant additive components is sufficiently small. The approach investigated consists of two steps. The first step implements the variable selection, typically by the group Lasso, and the second step applies the penalized least squares estimation with Sobolev penalties to the selected additive components. Such a procedure is computationally simple to implement and, in our numerical experiments, works reasonably well. Despite its intuitive nature, the theoretical properties of this two-step procedure have to be carefully analyzed, since the effect of the first step variable selection is random, and generally it may contain redundant additive components and at the same time miss significant additive components. This paper derives a generic performance bound on the two-step estimation procedure allowing for these situations, and studies in detail the overall performance when the first step variable selection is implemented by the group Lasso.

Submitted 29 January, 2013; v1 submitted 23 July, 2012; originally announced July 2012.

Comments: 49 pages, 3 tables; minor errors corrected

arXiv:1204.2108 [pdf, ps, other]

doi 10.1214/13-AOS1150

Quasi-Bayesian analysis of nonparametric instrumental variables models

Authors: Kengo Kato

Abstract: This paper aims at developing a quasi-Bayesian analysis of the nonparametric instrumental variables model, with a focus on the asymptotic properties of quasi-posterior distributions. In this paper, instead of imposing a distributional assumption on the data generating process, we consider a quasi-likelihood induced from the conditional moment restriction, and put priors on the function-valued parameter. We call the resulting posterior the quasi-posterior, which corresponds to the "Gibbs posterior" in the literature. Here we focus on priors constructed on slowly growing finite-dimensional sieves. We derive rates of contraction and a nonparametric Bernstein-von Mises type result for the quasi-posterior distribution, and rates of convergence for the quasi-Bayes estimator defined by the posterior expectation. We show that, with priors suitably chosen, the quasi-posterior distribution (the quasi-Bayes estimator) attains the minimax optimal rate of contraction (convergence, resp.). These results greatly sharpen the previous related work.

Submitted 20 November, 2013; v1 submitted 10 April, 2012; originally announced April 2012.

Comments: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/13-AOS1150

Report number: IMS-AOS-AOS1150

Journal ref: Annals of Statistics 2013, Vol. 41, No. 5, 2359-2390

arXiv:1202.4850 [pdf, ps, other]

doi 10.1214/12-AOS1066

Estimation in functional linear quantile regression

Authors: Kengo Kato

Abstract: This paper studies estimation in functional linear quantile regression in which the dependent variable is scalar while the covariate is a function, and the conditional quantile for each fixed quantile index is modeled as a linear functional of the covariate. Here we suppose that covariates are discretely observed and sampling points may differ across subjects, where the number of measurements per subject increases with the sample size. Also, we allow the quantile index to vary over a given subset of the open unit interval, so the slope function is a function of two variables: (typically) time and quantile index. Likewise, the conditional quantile function is a function of the quantile index and the covariate. We consider an estimator for the slope function based on the principal component basis. An estimator for the conditional quantile function is obtained by a plug-in method. Since the so-constructed plug-in estimator does not necessarily satisfy the monotonicity constraint with respect to the quantile index, we also consider a class of monotonized estimators for the conditional quantile function. We establish rates of convergence for these estimators under suitable norms, showing that these rates are optimal in a minimax sense under some smoothness assumptions on the covariance kernel of the covariate and the slope function. Empirical choice of the cutoff level is studied by using simulations.

Submitted 27 February, 2013; v1 submitted 22 February, 2012; originally announced February 2012.

Comments: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/12-AOS1066

Report number: IMS-AOS-AOS1066

Journal ref: Annals of Statistics 2012, Vol. 40, No. 6, 3108-3136

arXiv:1103.1458 [pdf, ps, other]

Group Lasso for high dimensional sparse quantile regression models

Authors: Kengo Kato

Abstract: This paper studies the statistical properties of the group Lasso estimator for high dimensional sparse quantile regression models where the number of explanatory variables (or the number of groups of explanatory variables) is possibly much larger than the sample size while the number of variables in "active" groups is sufficiently small. We establish a non-asymptotic bound on the $\ell_{2}$-estimation error of the estimator. This bound explains situations under which the group Lasso estimator is potentially superior/inferior to the $\ell_{1}$-penalized quantile regression estimator in terms of the estimation error. We also propose a data-dependent choice of the tuning parameter to make the method more practical, by extending the original proposal of Belloni and Chernozhukov (2011) for the $\ell_{1}$-penalized quantile regression estimator. As an application, we analyze high dimensional additive quantile regression models. We show that under a set of suitable regularity conditions, the group Lasso estimator can attain a convergence rate arbitrarily close to the oracle rate. Finally, we conduct simulation experiments to examine our theoretical results.

Submitted 25 March, 2011; v1 submitted 8 March, 2011; originally announced March 2011.

Comments: 37 pages. Some errors are corrected

MSC Class: 62G05; 62J99
