-
Adaptive Robustness of Hypergrid Johnson-Lindenstrauss
Authors:
Andrej Bogdanov,
Alon Rosen,
Neekon Vafa,
Vinod Vaikuntanathan
Abstract:
Johnson and Lindenstrauss (Contemporary Mathematics, 1984) showed that for $n > m$, a scaled random projection $\mathbf{A}$ from $\mathbb{R}^n$ to $\mathbb{R}^m$ is an approximate isometry on any set $S$ of size at most exponential in $m$. If $S$ is larger, however, its points can contract arbitrarily under $\mathbf{A}$. In particular, the hypergrid $([-B, B] \cap \mathbb{Z})^n$ is expected to contain a point that is contracted by a factor of $\kappa_{\mathsf{stat}} = \Theta(B)^{-1/\alpha}$, where $\alpha = m/n$.
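To make "contracted by a factor" concrete, the following stdlib-only sketch measures the ratio $\|\mathbf{A}x\| / \|x\|$ for a point $x$ of the hypergrid. It assumes $\mathbf{A}$ has i.i.d. Gaussian entries scaled by $1/\sqrt{m}$, one common normalization; the exact scaling used in the paper may differ, and the helper names are hypothetical. A uniformly random grid point is typically contracted very little; the abstract's claim concerns the rare points among the $(2B+1)^n$ grid points that are contracted much more.

```python
import math
import random


def random_projection(m, n, rng):
    """m x n matrix with i.i.d. N(0, 1/m) entries (an assumed JL scaling)."""
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]


def apply(A, x):
    """Matrix-vector product A x using plain lists."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def norm(v):
    return math.sqrt(sum(t * t for t in v))


def contraction_ratio(A, x):
    """||A x|| / ||x||; a value well below 1 means x is contracted by A."""
    return norm(apply(A, x)) / norm(x)


def hypergrid_point(n, B, rng):
    """Uniformly random nonzero point of ([-B, B] ∩ Z)^n."""
    while True:
        x = [rng.randint(-B, B) for _ in range(n)]
        if any(x):
            return x


if __name__ == "__main__":
    rng = random.Random(0)
    n, m, B = 200, 100, 5  # alpha = m / n = 0.5
    A = random_projection(m, n, rng)
    ratios = [contraction_ratio(A, hypergrid_point(n, B, rng)) for _ in range(20)]
    print(min(ratios), max(ratios))  # random points stay close to ratio 1
```

Sampling random points like this will essentially never find the strongly contracted points; locating them is exactly the search problem the abstract studies.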
We give evidence that finding such a point exhibits a statistical-computational gap precisely up to $\kappa_{\mathsf{comp}} = \widetilde{\Theta}(\sqrt{\alpha}/B)$. On the algorithmic side, we design an online algorithm achieving $\kappa_{\mathsf{comp}}$, inspired by a discrepancy minimization algorithm of Bansal and Spencer (Random Structures & Algorithms, 2020). On the hardness side, we give evidence via a multiple overlap gap property (mOGP), which in particular captures online algorithms, and via a reduction-based lower bound, which shows hardness under standard worst-case lattice assumptions.
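The size of the gap can be illustrated numerically by comparing $\kappa_{\mathsf{stat}} = \Theta(B)^{-1/\alpha}$ with $\kappa_{\mathsf{comp}} = \widetilde{\Theta}(\sqrt{\alpha}/B)$. The sketch below assumes all hidden constants equal 1 and drops the logarithmic factors hidden by $\widetilde{\Theta}$, so the numbers show only the scaling, not actual thresholds. A smaller $\kappa$ means a stronger contraction; for $\alpha < 1$ the statistically achievable contraction is far stronger than the computational one.

```python
import math


def kappa_stat(B, alpha):
    """Theta(B)^(-1/alpha) with the hidden constant set to 1 (an assumption)."""
    return B ** (-1.0 / alpha)


def kappa_comp(B, alpha):
    """sqrt(alpha) / B with constants and log factors dropped (an assumption)."""
    return math.sqrt(alpha) / B


if __name__ == "__main__":
    for alpha in (0.25, 0.5, 0.75):
        for B in (4, 16, 64):
            s, c = kappa_stat(B, alpha), kappa_comp(B, alpha)
            print(f"alpha={alpha:<5} B={B:<3} stat={s:.2e} comp={c:.2e} ratio={c / s:.1e}")
```

The ratio column grows quickly as $\alpha$ shrinks, since the exponent $-1/\alpha$ in $\kappa_{\mathsf{stat}}$ becomes large while $\kappa_{\mathsf{comp}}$ stays polynomial in $1/B$.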
As a cryptographic application, we show that the rounded Johnson-Lindenstrauss embedding is a robust property-preserving hash function (Boyle, Lavigne and Vaikuntanathan, TCC 2019) on the hypergrid for the Euclidean metric in the computationally hard regime. Such hash functions compress data while preserving $\ell_2$ distances between inputs up to some distortion factor, with the guarantee that even knowing the hash function, no computationally bounded adversary can find any pair of points that violates the distortion bound.
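A rounded Johnson-Lindenstrauss embedding can be sketched as $h(x) = \mathrm{round}(\mathbf{A}x)$, applied coordinate-wise, with $\mathbf{A}$ derived from a public seed. The Gaussian entries, the $1/\sqrt{m}$ scaling, and the names below are assumptions for illustration, not the paper's exact construction. Rounding changes each coordinate of $h(x) - h(y)$ by at most 1 relative to $\mathbf{A}(x - y)$, so hash distances track projected distances up to an additive $\sqrt{m}$.

```python
import math
import random


def project(A, x):
    """Matrix-vector product A x using plain lists."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def make_hash(m, n, seed):
    """Hypothetical rounded-JL hash from Z^n to Z^m, keyed by a public seed."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) / math.sqrt(m) for _ in range(n)] for _ in range(m)]

    def h(x):
        # Round each projected coordinate to the nearest integer.
        return tuple(round(v) for v in project(A, x))

    return A, h


def hash_distance(hx, hy):
    """Euclidean distance between two hash values."""
    return math.dist(hx, hy)


if __name__ == "__main__":
    A, h = make_hash(m=16, n=64, seed=7)
    rng = random.Random(3)
    x = [rng.randint(-4, 4) for _ in range(64)]
    y = [rng.randint(-4, 4) for _ in range(64)]
    print(hash_distance(h(x), h(y)), math.dist(x, y))
```

Because the additive rounding error is $\sqrt{m}$ regardless of input size, the scaling of $\mathbf{A}$ relative to the grid determines how meaningful the hash distances are; robustness then asks that no efficient adversary can find a pair whose distance is badly distorted.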
Submitted 12 April, 2025;
originally announced April 2025.
-
Policy Learning with Confidence
Authors:
Victor Chernozhukov,
Sokbae Lee,
Adam M. Rosen,
Liyang Sun
Abstract:
This paper introduces a framework for selecting policies that maximize expected welfare under estimation uncertainty. The proposed method explicitly balances the size of the estimated welfare against the uncertainty inherent in its estimation, ensuring that chosen policies meet a reporting guarantee, namely, that actual welfare is guaranteed not to fall below the reported estimate with a pre-specified confidence level. We produce the efficient decision frontier, describing policies that offer maximum estimated welfare for a given acceptable level of estimation risk. We apply this approach to a variety of settings, including the selection of policy rules that allocate individuals to treatments and the allocation of limited budgets among competing social programs.
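The trade-off between estimated welfare and estimation risk can be illustrated with a simple rule: report a one-sided normal lower confidence bound $\hat{W} - z_{1-\alpha}\,\widehat{\mathrm{se}}$ and choose the policy with the largest bound. This sketch is not the paper's procedure; in particular, it ignores the effect of selecting among several candidates, which a guarantee for the selected policy would need to account for. The frontier function keeps policies that no other candidate beats on both estimated welfare and standard error. Policy names and numbers are hypothetical.

```python
from statistics import NormalDist


def lower_bound(estimate, se, level=0.95):
    """One-sided normal lower confidence bound for a single policy's welfare."""
    z = NormalDist().inv_cdf(level)
    return estimate - z * se


def choose_policy(candidates, level=0.95):
    """candidates: dict name -> (estimated welfare, standard error)."""
    return max(candidates, key=lambda k: lower_bound(*candidates[k], level))


def frontier(candidates):
    """Non-dominated policies, sorted by standard error (estimation risk)."""
    result = []
    for k, (w, s) in candidates.items():
        dominated = any(
            w2 >= w and s2 <= s and (w2 > w or s2 < s)
            for k2, (w2, s2) in candidates.items()
            if k2 != k
        )
        if not dominated:
            result.append(k)
    return sorted(result, key=lambda k: candidates[k][1])


if __name__ == "__main__":
    cands = {
        "treat_all": (10.0, 4.0),
        "targeted": (8.0, 1.0),
        "none": (0.0, 0.0),
        "dominated": (5.0, 2.0),
    }
    print(choose_policy(cands), frontier(cands))
```

Here the policy with the highest point estimate is not chosen, because its large standard error pulls its reported lower bound below that of a more precisely estimated alternative.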
Submitted 9 July, 2025; v1 submitted 14 February, 2025;
originally announced February 2025.
-
Learning and Predicting from Dynamic Models for COVID-19 Patient Monitoring
Authors:
Zitong Wang,
Mary Grace Bowring,
Antony Rosen,
Brian T. Garibaldi,
Akihiko Nishimura,
Scott L. Zeger
Abstract:
COVID-19 has challenged health systems to learn how to learn. This paper describes the context, methods and challenges for learning to improve COVID-19 care at one academic health center. Challenges to learning include: (1) choosing the right clinical target; (2) designing methods for accurate predictions by borrowing strength from prior patients' experiences; (3) communicating the methodology to clinicians so they understand and trust it; (4) communicating the predictions to the patient at the moment of clinical decision; and (5) continuously evaluating and revising the methods so they adapt to changing patients and clinical demands. To illustrate these challenges, this paper contrasts two statistical modeling approaches (prospective longitudinal models in common use, and retrospective analogues, which are complementary in the COVID-19 context) for predicting future biomarker trajectories and major clinical events. The methods are applied to and validated on a cohort of 1,678 patients who were hospitalized with COVID-19 during the early months of the pandemic. We emphasize graphical tools to promote physician learning and inform clinical decision making.
Submitted 21 March, 2022; v1 submitted 2 November, 2021;
originally announced November 2021.
-
A Bayesian Approach to Restricted Latent Class Models for Scientifically-Structured Clustering of Multivariate Binary Outcomes
Authors:
Zhenke Wu,
Livia Casciola-Rosen,
Antony Rosen,
Scott L. Zeger
Abstract:
In this paper, we propose a general framework for combining evidence of varying quality to estimate underlying binary latent variables in the presence of restrictions imposed to respect the scientific context. The resulting algorithms cluster the multivariate binary data in a manner partly guided by prior knowledge. The primary model assumptions are that 1) subjects belong to classes defined by unobserved binary states, such as the true presence or absence of pathogens in epidemiology, or of antibodies in medicine, or the "ability" to correctly answer test questions in psychology, 2) a binary design matrix $\Gamma$ specifies relevant features in each class, and 3) measurements are independent given the latent class but can have different error rates. Conditions ensuring parameter identifiability from the likelihood function are discussed and inform the design of a novel posterior inference algorithm that simultaneously estimates the number of clusters, design matrix $\Gamma$, and model parameters. In finite samples and dimensions, we propose prior assumptions so that the posterior distribution of the number of clusters and the patterns of latent states tends to concentrate on smaller values and sparser patterns, respectively. The model readily extends to studies where some subjects' latent classes are known or important prior knowledge about differential measurement accuracy is available from external sources. The methods are illustrated with an analysis of protein data to detect clusters representing auto-antibody classes among scleroderma patients.
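Assumptions 2) and 3) can be illustrated with a simplified likelihood: if $\Gamma_{cj} = 1$, measurement $j$ is relevant in class $c$ and is positive with a true-positive rate $t_j$; otherwise it is positive with a false-positive rate $f_j$, and measurements are independent given the class. Indexing $\Gamma$ directly by class, the per-measurement rates, and all names below are assumptions for this sketch; the paper's parameterization, priors, and posterior inference algorithm are richer than this.

```python
import math


def class_likelihood(y, gamma_row, tpr, fpr):
    """P(y | class) for a binary vector y, assuming conditional independence."""
    p = 1.0
    for yj, g, t, f in zip(y, gamma_row, tpr, fpr):
        q = t if g else f  # probability that measurement j is positive
        p *= q if yj else (1.0 - q)
    return p


def posterior_classes(y, gamma, weights, tpr, fpr):
    """Posterior class probabilities for one subject given mixture weights."""
    joint = [w * class_likelihood(y, row, tpr, fpr) for w, row in zip(weights, gamma)]
    total = sum(joint)
    return [j / total for j in joint]


def log_marginal_likelihood(Y, gamma, weights, tpr, fpr):
    """Sum over subjects of log sum_c weight_c * P(y | class c)."""
    return sum(
        math.log(sum(w * class_likelihood(y, row, tpr, fpr) for w, row in zip(weights, gamma)))
        for y in Y
    )


if __name__ == "__main__":
    gamma = [[1, 0, 0], [0, 1, 1]]  # hypothetical design: class 0 uses feature 0
    tpr, fpr = [0.9, 0.8, 0.95], [0.1, 0.05, 0.2]  # measurement-specific error rates
    print(posterior_classes([1, 0, 0], gamma, [0.5, 0.5], tpr, fpr))
```

Allowing each measurement its own $t_j$ and $f_j$ mirrors the abstract's point that measurements can have different error rates.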
Submitted 24 August, 2018;
originally announced August 2018.
-
Faster Family-wise Error Control for Neuroimaging with a Parametric Bootstrap
Authors:
Simon N. Vandekar,
Theodore D. Satterthwaite,
Adon Rosen,
Rastko Ciric,
David R. Roalf,
Kosha Ruparel,
Ruben C. Gur,
Raquel E. Gur,
Russell T. Shinohara
Abstract:
In neuroimaging, hundreds to hundreds of thousands of tests are performed across a set of brain regions or all locations in an image. Recent studies have shown that the most common family-wise error (FWE) controlling procedures in imaging, which rely on classical mathematical inequalities or Gaussian random field theory, yield FWE rates (FWER) that are far from the nominal level. Depending on the approach used, the FWER can be exceedingly small or grossly inflated. Given the widespread use of neuroimaging as a tool for understanding neurological and psychiatric disorders, it is imperative that reliable multiple testing procedures are available. To our knowledge, only permutation joint testing procedures have been shown to reliably control the FWER at the nominal level. However, these procedures are computationally intensive because of the increasingly large sample sizes and image dimensionality now available, and analyses can take days to complete. Here, we develop a parametric bootstrap joint testing procedure. The parametric bootstrap procedure works directly with the test statistics, which leads to much faster estimation of adjusted p-values than resampling-based procedures while reliably controlling the FWER in sample sizes available in many neuroimaging studies. We demonstrate using simulations that the procedure controls the FWER in finite samples, and we present region- and voxel-wise analyses to test for sex differences in developmental trajectories of cerebral blood flow.
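Working "directly with the test statistics" can be sketched with the generic max-statistic approach: assume the null statistics are jointly normal with an estimated correlation matrix, simulate that distribution via a Cholesky factor, and compare each observed $|z_j|$ with the simulated maxima of $|Z^*|$. This is a standard textbook construction offered under those assumptions; how the paper estimates the covariance and forms its statistics may differ, and the function names are hypothetical.

```python
import math
import random


def cholesky(S):
    """Lower-triangular L with L L^T = S for a positive definite matrix S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L


def maxT_adjusted_pvalues(z, corr, n_boot=2000, seed=0):
    """FWER-adjusted p-values from simulated maxima of |Z*|, Z* ~ N(0, corr)."""
    rng = random.Random(seed)
    L = cholesky(corr)
    n = len(z)
    maxima = []
    for _ in range(n_boot):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]
        zs = [sum(L[i][k] * e[k] for k in range(i + 1)) for i in range(n)]
        maxima.append(max(abs(v) for v in zs))
    # The +1 terms count the observed statistic itself and keep p-values positive.
    return [(1 + sum(mx >= abs(zj) for mx in maxima)) / (1 + n_boot) for zj in z]


if __name__ == "__main__":
    corr = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
    print(maxT_adjusted_pvalues([4.0, 0.5, -2.5], corr))
```

Because only the statistics are resampled, each draw costs one small matrix-vector product instead of refitting the model to permuted data, which is the source of the speedup over permutation procedures.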
Submitted 18 August, 2017; v1 submitted 16 August, 2017;
originally announced August 2017.
-
Estimating AutoAntibody Signatures to Detect Autoimmune Disease Patient Subsets
Authors:
Zhenke Wu,
Livia Casciola-Rosen,
Ami A. Shah,
Antony Rosen,
Scott Zeger
Abstract:
Autoimmune diseases are characterized by highly specific immune responses against molecules in self-tissues. Different autoimmune diseases are characterized by distinct immune responses, making autoantibodies useful for diagnosis and prediction. In many diseases, the targets of autoantibodies are incompletely defined. Although the technologies for autoantibody discovery have advanced dramatically over the past decade, each of these techniques generates hundreds of possibilities, which are onerous and expensive to validate. We set out to establish a method to greatly simplify autoantibody discovery, using a pre-filtering step to define subgroups with similar specificities based on migration of radiolabeled, immunoprecipitated proteins on sodium dodecyl sulfate (SDS) gels and autoradiography [Gel Electrophoresis and band detection on Autoradiograms (GEA)]. Human recognition of patterns is not optimal when the patterns are complex or scattered across many samples. Multiple sources of error, including irrelevant intensity differences and warping of gels, have challenged automation of pattern discovery from autoradiograms.
In this paper, we address these limitations using a Bayesian hierarchical model with shrinkage priors for pattern alignment and spatial dewarping. The Bayesian model combines information from multiple gel sets and corrects spatial warping for coherent estimation of autoantibody signatures defined by presence or absence of a grid of landmark proteins. We show the pre-processing creates more clearly separated clusters and improves the accuracy of autoantibody subset detection via hierarchical clustering. Finally, we demonstrate the utility of the proposed methods with GEA data from scleroderma patients.
Submitted 4 August, 2017; v1 submitted 18 April, 2017;
originally announced April 2017.