-
General Frameworks for Conditional Two-Sample Testing
Authors:
Seongchan Lee,
Suman Cha,
Ilmun Kim
Abstract:
We study the problem of conditional two-sample testing, which aims to determine whether two populations have the same distribution after accounting for confounding factors. This problem commonly arises in various applications, such as domain adaptation and algorithmic fairness, where comparing two groups is essential while controlling for confounding variables. We begin by establishing a hardness result for conditional two-sample testing, demonstrating that no valid test can have significant power against any single alternative without proper assumptions. We then introduce two general frameworks that implicitly or explicitly target specific classes of distributions for their validity and power. Our first framework allows us to convert any conditional independence test into a conditional two-sample test in a black-box manner, while preserving the asymptotic properties of the original conditional independence test. The second framework transforms the problem into comparing marginal distributions with estimated density ratios, which allows us to leverage existing methods for marginal two-sample testing. We demonstrate this idea in a concrete manner with classification and kernel-based methods. Finally, simulation studies are conducted to illustrate the proposed frameworks in finite-sample scenarios.
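Below is a minimal sketch of the basic reduction behind the first framework, assuming scalar covariates. The two samples are pooled, each observation gets a group label Z, and the test asks whether Y is independent of Z given X. The partial-correlation (Fisher z) test used here is a swapped-in stand-in that is valid only under a linear-Gaussian model, and all function names are hypothetical. The paper's black-box construction adds safeguards that this naive pooling omits, so read this as an illustration rather than the authors' method.

```python
import math
import random

def _residuals(v, x):
    """Residuals of v after least-squares regression on x (with intercept)."""
    n = len(x)
    mx, mv = sum(x) / n, sum(v) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    beta = sum((xi - mx) * (vi - mv) for xi, vi in zip(x, v)) / sxx
    return [vi - mv - beta * (xi - mx) for xi, vi in zip(x, v)]

def partial_corr_ci_test(x, y, z):
    """p-value for H0: Y independent of Z given X (Fisher z-test; linear-Gaussian assumption)."""
    ry, rz = _residuals(y, x), _residuals(z, x)
    num = sum(a * b for a, b in zip(ry, rz))
    den = math.sqrt(sum(a * a for a in ry) * sum(b * b for b in rz))
    r = num / den
    # One conditioning variable, so the z statistic is scaled by sqrt(n - 1 - 3).
    stat = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(len(x) - 4)
    return math.erfc(abs(stat) / math.sqrt(2))  # two-sided normal p-value

def conditional_two_sample_pvalue(sample1, sample2, ci_test=partial_corr_ci_test):
    """Naive reduction: pool (x, y) pairs, label group membership z, test Y indep. Z | X."""
    pooled = [(x, y, 0.0) for x, y in sample1] + [(x, y, 1.0) for x, y in sample2]
    xs, ys, zs = (list(col) for col in zip(*pooled))
    return ci_test(xs, ys, zs)
```

Any other conditional independence test with the same `(x, y, z) -> p-value` shape could be passed as `ci_test`, which is the sense in which the conversion is black-box.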
Submitted 21 October, 2024;
originally announced October 2024.
-
Parameter-Free Algorithms for Performative Regret Minimization under Decision-Dependent Distributions
Authors:
Sungwoo Park,
Junyeop Kwon,
Byeongnoh Kim,
Suhyun Chae,
Jeeyong Lee,
Dabeen Lee
Abstract:
This paper studies performative risk minimization, a formulation of stochastic optimization under decision-dependent distributions. We consider the general case where the performative risk can be non-convex, for which we develop efficient parameter-free optimistic optimization-based methods. Our algorithms significantly improve upon the existing Lipschitz bandit-based method in many aspects. In particular, our framework does not require knowledge of the sensitivity parameter of the distribution map or the Lipschitz constant of the loss function. This, together with the efficient optimistic optimization-based tree-search mechanism, makes our framework practically favorable. We provide experimental results that demonstrate the numerical superiority of our algorithms over the existing method and other black-box optimistic optimization methods.
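To illustrate what "parameter-free" optimistic optimization means, the sketch below implements Simultaneous Optimistic Optimization (SOO, Munos 2011) on [0, 1] for a deterministic objective. It is a generic stand-in, not the authors' algorithm, which must also cope with noisy, decision-dependent risk evaluations. The function name and the ternary split (k = 3) are assumptions. Note that no Lipschitz constant or smoothness parameter is supplied anywhere.

```python
import math

def soo_maximize(f, budget, k=3):
    """Simultaneous Optimistic Optimization of f on [0, 1].

    No Lipschitz constant is needed: each sweep expands, at every depth,
    the best leaf, provided it is at least as good as every leaf expanded
    at a shallower depth in the same sweep.
    """
    leaves = {0: [(0.0, 1.0, f(0.5))]}  # depth -> list of (lo, hi, f(center))
    evals = 1
    best_x, best_v = 0.5, leaves[0][0][2]
    while evals < budget:
        v_max = -math.inf
        # Munos's analysis also caps the depth at h_max(n); omitted for brevity.
        for h in range(max(leaves) + 1):
            cells = leaves.get(h)
            if not cells:
                continue
            i = max(range(len(cells)), key=lambda j: cells[j][2])
            lo, hi, v = cells[i]
            if v < v_max:
                continue
            v_max = v
            cells.pop(i)
            width = (hi - lo) / k
            for c in range(k):  # split the cell into k equal children
                clo = lo + c * width
                x = clo + width / 2
                fx = f(x)
                evals += 1
                leaves.setdefault(h + 1, []).append((clo, clo + width, fx))
                if fx > best_v:
                    best_x, best_v = x, fx
                if evals >= budget:
                    return best_x, best_v
    return best_x, best_v
```

To minimize a risk `R`, call `soo_maximize(lambda t: -R(t), budget)`.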
Submitted 23 February, 2024;
originally announced February 2024.
-
Distributed Bootstrap for Simultaneous Inference Under High Dimensionality
Authors:
Yang Yu,
Shih-Kang Chao,
Guang Cheng
Abstract:
We propose a distributed bootstrap method for simultaneous inference on high-dimensional massive data that are stored and processed with many machines. The method produces an $\ell_\infty$-norm confidence region based on a communication-efficient de-biased lasso, and we propose an efficient cross-validation approach to tune the method at every iteration. We theoretically prove a lower bound on the number of communication rounds $\tau_{\min}$ that warrants the statistical accuracy and efficiency. Furthermore, $\tau_{\min}$ only increases logarithmically with the number of workers and the intrinsic dimensionality, while remaining nearly invariant to the nominal dimensionality. We test our theory through extensive simulation studies and a variable screening task on a semi-synthetic dataset based on the US Airline On-Time Performance dataset. The code to reproduce the numerical results is available at GitHub: https://github.com/skchao74/Distributed-bootstrap.
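A minimal sketch of a worker-level multiplier bootstrap for an $\ell_\infty$ confidence region, simplified to a mean vector instead of the de-biased lasso. Assumptions: k workers with equal local sample size n, a single communication round, and Gaussian multipliers. The function names are hypothetical and this is not the code in the repository linked above. The critical value c is the bootstrap (1 - alpha) quantile of max_j |k^{-1/2} sum_i eps_i sqrt(n) (mu_ij - mu_bar_j)|, and the region is the box ||theta - mu_bar||_inf <= c / sqrt(nk). Because resampling happens over workers, k should not be too small.

```python
import math
import random

def local_means(worker_data):
    """Each worker sends only its d-dimensional local mean (one communication round)."""
    return [[sum(col) / len(rows) for col in zip(*rows)] for rows in worker_data]

def distributed_linf_region(worker_data, alpha=0.05, n_boot=500, seed=0):
    """Simultaneous (1 - alpha) l_inf confidence box for a mean vector.

    Assumes k workers with equal local sample size n; returns (center, radius).
    """
    rng = random.Random(seed)
    k, n = len(worker_data), len(worker_data[0])
    mus = local_means(worker_data)
    d = len(mus[0])
    center = [sum(mu[j] for mu in mus) / k for j in range(d)]
    stats = []
    for _ in range(n_boot):
        eps = [rng.gauss(0.0, 1.0) for _ in range(k)]  # one multiplier per worker
        stats.append(max(
            abs(sum(e * math.sqrt(n) * (mu[j] - center[j]) for e, mu in zip(eps, mus)))
            / math.sqrt(k)
            for j in range(d)))
    stats.sort()
    c = stats[min(n_boot - 1, math.ceil((1 - alpha) * n_boot) - 1)]
    return center, c / math.sqrt(n * k)
```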
Submitted 14 June, 2022; v1 submitted 19 February, 2021;
originally announced February 2021.
-
Toric heaps, cyclic reducibility, and conjugacy in Coxeter groups
Authors:
Shih-Wei Chao,
Matthew Macauley
Abstract:
As a visualization of Cartier and Foata's "partially commutative monoid" theory, G.X. Viennot introduced "heaps of pieces" in 1986. These are essentially labeled posets satisfying a few additional properties. They naturally arise as models of reduced words in Coxeter groups. In this paper, we introduce a cyclic version, motivated by the idea of taking a heap and wrapping it into a cylinder. We call this object a "toric heap", as we formalize it as a labeled toric poset, which is a cyclic version of an ordinary poset. To define the concept of a toric extension, we develop a morphism in the category of toric heaps. We study toric heaps in Coxeter theory, in view of the fact that a cyclic shift of a reduced word is simply a conjugate by an initial or terminal generator. This allows us to formalize and study a framework of "cyclic reducibility" in Coxeter theory, and apply it to model conjugacy. We introduce the notion of "torically reduced", which is stronger than being cyclically reduced for group elements. This gives rise to a new class of elements called "torically fully commutative" (TFC), which are those that have a unique cyclic commutativity class, and which form a strictly larger class than the "cyclically fully commutative" (CFC) elements. We prove several cyclic analogues of results on fully commutative (FC) elements due to Stembridge. We conclude by discussing how this framework fits into recent work on Coxeter groups, and we correct a minor flaw in a few recently published theorems.
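A small check of the fact that motivates toric heaps, using the symmetric group $S_n$, the Coxeter group of type $A_{n-1}$ generated by adjacent transpositions. Moving the first letter $s$ of a word to the end yields $s^{-1} w s = s w s$, since every generator is an involution. The function names are illustrative. The identity holds for any word, reduced or not; the snippet does not model heaps or commutation classes.

```python
from itertools import product

def simple_reflection(i, n):
    """Generator s_i of S_n: the transposition swapping i and i+1 (0-indexed)."""
    p = list(range(n))
    p[i], p[i + 1] = p[i + 1], p[i]
    return tuple(p)

def compose(p, q):
    """Product p*q of permutations as functions: (p*q)(x) = p(q(x))."""
    return tuple(p[q[x]] for x in range(len(q)))

def word_to_perm(word, n):
    """Group element s_{i1} s_{i2} ... s_{ik} for the word [i1, ..., ik]."""
    w = tuple(range(n))
    for i in word:
        w = compose(w, simple_reflection(i, n))
    return w

def cyclic_shift_is_conjugation(word, n):
    """Moving the first letter s to the end equals conjugating w by s (s = s^{-1})."""
    s = simple_reflection(word[0], n)
    shifted = list(word[1:]) + list(word[:1])
    return word_to_perm(shifted, n) == compose(compose(s, word_to_perm(word, n)), s)

# Exhaustive check over all words of length 4 in the generators of S_4.
all_ok = all(cyclic_shift_is_conjugation(w, 4) for w in product(range(3), repeat=4))
```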
Submitted 18 December, 2019;
originally announced December 2019.
-
Uncovering Hierarchical Structure in Social Networks using Isospectral Reductions
Authors:
Chi-Jen Wang,
Seokjoo Chae,
Leonid A. Bunimovich,
Benjamin Z. Webb
Abstract:
We employ the recently developed theory of isospectral network reductions to analyze multi-mode social networks. This procedure allows us to uncover the hierarchical structure of the networks we consider, as well as the hierarchical structure of each mode of the network. Additionally, by performing a dynamical analysis of these networks, we are able to analyze the evolution of their structure, which allows us to identify a number of other network features. We apply both of these approaches to the Southern Women Data Set, one of the most studied social networks, and demonstrate that these techniques provide new information that complements previous findings.
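For readers new to isospectral reductions, here is a minimal numerical sketch. Reducing a matrix $M$ onto an index set $S$ (with complement $\bar S$) gives the $\lambda$-dependent matrix $R_S(M) = M_{SS} - M_{S\bar S}(M_{\bar S\bar S} - \lambda I)^{-1} M_{\bar S S}$. By the Schur complement identity, $\det(M - \lambda I) = \det(M_{\bar S\bar S} - \lambda I)\det(R_S(M) - \lambda I)$, so eigenvalues of $M$ that are not eigenvalues of $M_{\bar S\bar S}$ survive the reduction. The code evaluates $R_S(M)$ at a numeric $\lambda$ instead of working with rational functions. The helper names are hypothetical, and the paper's network-specific procedure is not reproduced.

```python
import math

def solve(A, B):
    """Solve A X = B (A square) by Gauss-Jordan elimination with partial pivoting."""
    n, m = len(A), len(B[0])
    aug = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + m):
                    aug[r][c] -= f * aug[col][c]
    return [[aug[i][n + j] / aug[i][i] for j in range(m)] for i in range(n)]

def det(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [list(row) for row in A]
    n, d = len(A), 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        if A[piv][col] == 0.0:
            return 0.0
        if piv != col:
            A[col], A[piv] = A[piv], A[col]
            d = -d
        d *= A[col][col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
    return d

def isospectral_reduction(M, S, lam):
    """Evaluate R_S(M) = M_SS - M_{S,Sb} (M_{Sb,Sb} - lam I)^{-1} M_{Sb,S} at lam."""
    Sb = [i for i in range(len(M)) if i not in S]
    shifted = [[M[i][j] - (lam if i == j else 0.0) for j in Sb] for i in Sb]
    X = solve(shifted, [[M[i][j] for j in S] for i in Sb])  # |Sb| x |S|
    return [[M[a][b] - sum(M[a][Sb[t]] * X[t][c] for t in range(len(Sb)))
             for c, b in enumerate(S)] for a in S]

def reduced_char(M, S, lam):
    """det(R_S(M)(lam) - lam I), which vanishes at the surviving eigenvalues of M."""
    R = isospectral_reduction(M, S, lam)
    k = len(S)
    return det([[R[i][j] - (lam if i == j else 0.0) for j in range(k)] for i in range(k)])

if __name__ == "__main__":
    M = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]  # eigenvalues 2, 2 +/- sqrt(2)
    print(reduced_char(M, [0, 1], 2.0 + math.sqrt(2.0)))  # approximately 0
```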
Submitted 5 December, 2017;
originally announced January 2018.
-
Transformations of Asymptotically AdS Hyperbolic Initial Data and Associated Geometric Inequalities
Authors:
Ye Sle Cha,
Marcus A. Khuri
Abstract:
We construct transformations which take asymptotically AdS hyperbolic initial data into asymptotically flat initial data, and which preserve relevant physical quantities. This is used to derive geometric inequalities in the asymptotically AdS hyperbolic setting from counterparts in the asymptotically flat realm, whenever a geometrically motivated system of elliptic equations admits a solution. The inequalities treated here relate mass, angular momentum, charge, and horizon area.
Submitted 28 July, 2017;
originally announced July 2017.
-
Distributed inference for quantile regression processes
Authors:
Stanislav Volgushev,
Shih-Kang Chao,
Guang Cheng
Abstract:
The increased availability of massive data sets provides a unique opportunity to discover subtle patterns in their distributions, but also imposes overwhelming computational challenges. To fully utilize the information contained in big data, we propose a two-step procedure: (i) estimate conditional quantile functions at different levels in a parallel computing environment; (ii) construct a conditional quantile regression process through projection based on these estimated quantile curves. Our general quantile regression framework covers both linear models with fixed or growing dimension and series approximation models. We prove that the proposed procedure does not sacrifice any statistical inferential accuracy provided that the number of distributed computing units and quantile levels are chosen properly. In particular, a sharp upper bound for the former and a sharp lower bound for the latter are derived to capture the minimal computational cost from a statistical perspective. As an important application, the statistical inference on conditional distribution functions is considered. Moreover, we propose computationally efficient approaches to conducting inference in the distributed estimation setting described above. Those approaches directly utilize the availability of estimators from sub-samples and can be carried out at almost no additional computational cost. Simulations confirm our statistical inferential theory.
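A deliberately simplified sketch of the two-step structure. It assumes an intercept-only model, so conditional quantiles reduce to ordinary quantiles, and equal-size data splits. Step (i) averages per-machine quantile estimates over a grid of levels. Step (ii) uses linear interpolation across the grid as a crude stand-in for the projection step described in the abstract. Function names are hypothetical.

```python
def empirical_quantile(values, tau):
    """Sample quantile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = tau * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

def distributed_quantile_curve(machines, taus):
    """Step (i): average the per-machine quantile estimates on the tau grid.
    Step (ii), simplified: interpolate linearly between grid levels."""
    avg = [sum(empirical_quantile(m, t) for m in machines) / len(machines) for t in taus]

    def curve(tau):
        if tau <= taus[0]:
            return avg[0]
        for t0, t1, q0, q1 in zip(taus, taus[1:], avg, avg[1:]):
            if tau <= t1:
                return q0 + (tau - t0) / (t1 - t0) * (q1 - q0)
        return avg[-1]

    return curve

# Usage: 1000 points split round-robin across 4 "machines".
data = [i / 1000 for i in range(1000)]
machines = [data[s::4] for s in range(4)]
curve = distributed_quantile_curve(machines, [i / 10 for i in range(1, 10)])
```

Only the per-machine estimates on the grid need to be communicated, which is the source of the computational savings. The abstract's result that accuracy is preserved when the number of machines and quantile levels are chosen properly is not something this toy code demonstrates.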
Submitted 10 April, 2018; v1 submitted 21 January, 2017;
originally announced January 2017.
-
Quantile Processes for Semi and Nonparametric Regression
Authors:
Shih-Kang Chao,
Stanislav Volgushev,
Guang Cheng
Abstract:
A collection of quantile curves provides a complete picture of conditional distributions. Properly centered and scaled versions of estimated curves at various quantile levels give rise to the so-called quantile regression process (QRP). In this paper, we establish weak convergence of QRP in a general series approximation framework, which includes linear models with increasing dimension, nonparametric models and partial linear models. An interesting consequence is obtained in the last class of models, where parametric and non-parametric estimators are shown to be asymptotically independent. Applications of our general process convergence results include the construction of non-crossing quantile curves and the estimation of conditional distribution functions. As a result of independent interest, we obtain a series of Bahadur representations with exponential bounds for tail probabilities of all remainder terms. Bounds of this kind are potentially useful in analyzing statistical inference procedures under a divide-and-conquer setup.
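One standard way to obtain non-crossing curves from estimated quantiles is monotone rearrangement (Chernozhukov, Fernández-Val and Galichon): at each covariate value, sort the estimates across the grid of levels. Inverting the sorted curve then gives a conditional distribution estimate, $\hat F(y \mid x) = \int_0^1 1\{\hat Q(\tau \mid x) \le y\}\, d\tau$, approximated below by a Riemann sum over a uniform grid. The abstract does not say which construction the paper uses, so treat this as a hedged illustration. The function names are hypothetical.

```python
def rearrange(quantile_estimates):
    """Monotone rearrangement: sorting estimates over the tau grid (at fixed x) removes crossings."""
    return sorted(quantile_estimates)

def conditional_cdf(quantile_estimates, y):
    """F(y | x) ~ fraction of grid levels tau with Q(tau | x) <= y (uniform tau grid)."""
    q = rearrange(quantile_estimates)
    return sum(1 for v in q if v <= y) / len(q)

# Usage: estimates on a uniform tau grid at one x; 0.3 and 0.2 cross.
estimates = [0.1, 0.3, 0.2, 0.5, 0.4]
```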
Submitted 21 July, 2017; v1 submitted 7 April, 2016;
originally announced April 2016.
-
Reduction Arguments for Geometric Inequalities Associated With Asymptotically Hyperboloidal Slices
Authors:
Ye Sle Cha,
Marcus Khuri,
Anna Sakovich
Abstract:
We consider several geometric inequalities in general relativity involving mass, area, charge, and angular momentum for asymptotically hyperboloidal initial data. We show how to reduce each one to the known maximal (or time symmetric) case in the asymptotically flat setting, whenever a geometrically motivated system of elliptic equations admits a solution.
Submitted 13 January, 2016; v1 submitted 21 September, 2015;
originally announced September 2015.
-
Deformations of Charged Axially Symmetric Initial Data and the Mass-Angular Momentum-Charge Inequality
Authors:
Ye Sle Cha,
Marcus A. Khuri
Abstract:
We show how to reduce the general formulation of the mass-angular momentum-charge inequality, for axisymmetric initial data of the Einstein-Maxwell equations, to the known maximal case whenever a geometrically motivated system of equations admits a solution. It is also shown that the same reduction argument applies to the basic inequality yielding a lower bound for the area of black holes in terms of mass, angular momentum, and charge. This extends previous work by the authors [4] (arXiv:1401.3384), in which the role of charge was omitted. Lastly, we improve upon the hypotheses required for the mass-angular momentum-charge inequality in the maximal case.
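For reference, in geometrized units and under the usual hypotheses (for instance complete, asymptotically flat, axisymmetric maximal data), the mass-angular momentum-charge inequality is commonly written as below. It is the extremal Kerr-Newman bound $m^2 \ge a^2 + q^2$ with $a = J/m$, solved for $m^2$ using $m^2 > 0$, and it reduces to $m \ge \sqrt{|J|}$ when $q = 0$. This standard form is quoted from the general literature, not from the abstract above.

```latex
m^2 \;\ge\; \frac{J^2}{m^2} + q^2
\quad\Longleftrightarrow\quad
m^4 - q^2 m^2 - J^2 \;\ge\; 0
\quad\Longleftrightarrow\quad
m^2 \;\ge\; \frac{q^2 + \sqrt{q^4 + 4J^2}}{2}.
```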
Submitted 2 December, 2015; v1 submitted 14 July, 2014;
originally announced July 2014.
-
Confidence Corridors for Multivariate Generalized Quantile Regression
Authors:
Shih-Kang Chao,
Katharina Proksch,
Holger Dette,
Wolfgang Härdle
Abstract:
We focus on the construction of confidence corridors for multivariate nonparametric generalized quantile regression functions. This construction is based on asymptotic results for the maximal deviation between a suitable nonparametric estimator and the true function of interest, which follow after a series of approximation steps including a Bahadur representation, a new strong approximation theorem and exponential tail inequalities for Gaussian random fields. As a byproduct, we also obtain confidence corridors for the regression function in the classical mean regression. In order to deal with the problem of slowly decreasing error in coverage probability of the asymptotic confidence corridors, which results in meager coverage for small sample sizes, a simple bootstrap procedure is designed based on the leading term of the Bahadur representation. The finite sample properties of both procedures are investigated by means of a simulation study, and it is demonstrated that the bootstrap procedure considerably outperforms the asymptotic bands in terms of coverage accuracy. Finally, the bootstrap confidence corridors are used to study the efficacy of the National Supported Work Demonstration, which is a randomized employment enhancement program launched in the 1970s. This article has supplementary materials.
Submitted 2 February, 2015; v1 submitted 17 June, 2014;
originally announced June 2014.
-
Graph Invariants Based on the Divides Relation and Ordered by Prime Signatures
Authors:
Sung-Hyuk Cha,
Edgar G. DuCasse,
Louis V. Quintas
Abstract:
We consider directed acyclic graphs whose nodes are the divisors of a positive integer $n$ and whose arcs $(a,b)$ are defined by $a$ dividing $b$. Fourteen graph invariants, such as order, size, and the number of paths, are investigated for two classic graphs derived from the divides-relation partial order: the Hasse diagram $G^H(n)$ and its transitive closure $G^T(n)$. Concise formulae and algorithms are devised for these graph invariants, and several important properties of these graphs are formally proven. Integer sequences of these invariants in natural order by $n$ are computed, and several new sequences are identified by comparing them to existing sequences in the On-Line Encyclopedia of Integer Sequences. These new and existing integer sequences are interpreted from the graph theory point of view. Both $G^H(n)$ and $G^T(n)$ are characterized by the prime signature of $n$. Hence, two conventional orders of prime signatures, namely the graded colexicographic and canonical orders, are considered, and additional new integer sequences are discovered.
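The following sketch shows why order and size depend only on the prime signature $(e_1, \dots, e_r)$ of $n$. It compares brute-force counts with closed forms derived here from standard divisor-lattice counting; they are not quoted from the paper. The order is $\prod_i (e_i+1)$. $G^H(n)$ has one arc $a \to b$ for each prime quotient $b/a$, which gives $\sum_i e_i \prod_{j \ne i}(e_j+1)$ arcs. $G^T(n)$ has $\prod_i \binom{e_i+2}{2} - \prod_i (e_i+1)$ arcs. The function names are hypothetical.

```python
from math import comb

def is_prime(m):
    return m >= 2 and all(m % p for p in range(2, int(m ** 0.5) + 1))

def prime_signature(n):
    """Exponents of the prime factorization of n, in decreasing order."""
    exps, p = [], 2
    while p * p <= n:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        if e:
            exps.append(e)
        p += 1
    if n > 1:
        exps.append(1)
    return sorted(exps, reverse=True)

def invariants_from_signature(sig):
    """(order, size of G^H(n), size of G^T(n)) computed from the prime signature alone."""
    order = 1
    for e in sig:
        order *= e + 1
    hasse_size = sum(e * order // (e + 1) for e in sig)  # covers: b / a is prime
    pairs = 1
    for e in sig:
        pairs *= comb(e + 2, 2)  # pairs a | b | n, including a == b
    return order, hasse_size, pairs - order

def invariants_brute_force(n):
    """Same three invariants by enumerating divisors and arcs directly."""
    divs = [a for a in range(1, n + 1) if n % a == 0]
    hasse = sum(1 for a in divs for b in divs if b % a == 0 and is_prime(b // a))
    closure = sum(1 for a in divs for b in divs if a != b and b % a == 0)
    return len(divs), hasse, closure
```

For example, $n = 12 = 2^2 \cdot 3$ has signature $(2, 1)$, giving 6 divisors, 7 Hasse arcs and 12 arcs in the transitive closure.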
Submitted 20 May, 2014;
originally announced May 2014.
-
Deformations of Axially Symmetric Initial Data and the Mass-Angular Momentum Inequality
Authors:
Ye Sle Cha,
Marcus A. Khuri
Abstract:
We show how to reduce the general formulation of the mass-angular momentum inequality, for axisymmetric initial data of the Einstein equations, to the known maximal case whenever a geometrically motivated system of equations admits a solution. This procedure is based on a certain deformation of the initial data which preserves the relevant geometry, while achieving the maximal condition and its implied inequality (in a weak sense) for the scalar curvature; this answers a question posed by R. Schoen. The primary equation involved bears a strong resemblance to the Jang-type equations studied in the context of the positive mass theorem and the Penrose inequality. Each equation in the system is analyzed in detail individually, and it is shown that appropriate existence/uniqueness results hold with the solution satisfying desired asymptotics. Lastly, it is shown that the same reduction argument applies to the basic inequality yielding a lower bound for the area of black holes in terms of mass and angular momentum.
Submitted 13 February, 2015; v1 submitted 14 January, 2014;
originally announced January 2014.