Showing 1–28 of 28 results for author: Janson, L

  1. arXiv:2504.18657  [pdf, other]

    stat.ML cs.LG eess.SY

    Foundations of Safe Online Reinforcement Learning in the Linear Quadratic Regulator: $\sqrt{T}$-Regret

    Authors: Benjamin Schiffer, Lucas Janson

    Abstract: Understanding how to efficiently learn while adhering to safety constraints is essential for using online reinforcement learning in practical applications. However, proving rigorous regret bounds for safety-constrained reinforcement learning is difficult due to the complex interaction between safety, exploration, and exploitation. In this work, we seek to establish foundations for safety-constrain…

    Submitted 25 April, 2025; originally announced April 2025.

  2. arXiv:2502.05319  [pdf, other]

    stat.ME

    Semiparametric Inference for Partially Identifiable Data Fusion Estimands via Double Machine Learning

    Authors: Yicong Jiang, Lucas Janson

    Abstract: Many statistical estimands of interest (e.g., in regression or causality) are functions of the joint distribution of multiple random variables. But in some applications, data is not available that measures all random variables on each subject, and instead the only possible approach is one of data fusion, where multiple independent data sets, each measuring a subset of the random variables of inter…

    Submitted 7 February, 2025; originally announced February 2025.

    Comments: 45 pages, 6 figures

  3. arXiv:2501.00566  [pdf, other]

    stat.ME math.ST

    Compositional Covariate Importance Testing via Partial Conjunction of Bivariate Hypotheses

    Authors: Ritwik Bhaduri, Siyuan Ma, Lucas Janson

    Abstract: Compositional data (i.e., data comprising random variables that sum up to a constant) arises in many applications including microbiome studies, chemical ecology, political science, and experimental designs. Yet when compositional data serve as covariates in a regression, the sum constraint renders every covariate automatically conditionally independent of the response given the other covariates, s…

    Submitted 31 December, 2024; originally announced January 2025.

  4. arXiv:2410.21081  [pdf, other]

    stat.ML cs.LG eess.SY

    Foundations of Safe Online Reinforcement Learning in the Linear Quadratic Regulator: Generalized Baselines

    Authors: Benjamin Schiffer, Lucas Janson

    Abstract: Many practical applications of online reinforcement learning require the satisfaction of safety constraints while learning about the unknown environment. In this work, we establish theoretical foundations for reinforcement learning with safety constraints by studying the canonical problem of Linear Quadratic Regulator learning with unknown dynamics, but with the additional constraint that the posi…

    Submitted 29 April, 2025; v1 submitted 28 October, 2024; originally announced October 2024.

  5. arXiv:2406.18390  [pdf, other]

    stat.ME

    The $\ell$-test: leveraging sparsity in the Gaussian linear model for improved inference

    Authors: Souhardya Sengupta, Lucas Janson

    Abstract: We develop novel LASSO-based methods for coefficient testing and confidence interval construction in the Gaussian linear model with $n\ge d$. Our methods' finite-sample guarantees are identical to those of their ubiquitous ordinary-least-squares-$t$-test-based analogues, yet have substantially higher power when the true coefficient vector is sparse. In particular, our coefficient test, which we ca…

    Submitted 26 June, 2024; originally announced June 2024.

  6. arXiv:2406.17748  [pdf, other]

    cs.LG math.OC stat.ML

    A New Perspective on Shampoo's Preconditioner

    Authors: Depen Morwani, Itai Shapira, Nikhil Vyas, Eran Malach, Sham Kakade, Lucas Janson

    Abstract: Shampoo, a second-order optimization algorithm which uses a Kronecker product preconditioner, has recently garnered increasing attention from the machine learning community. The preconditioner used by Shampoo can be viewed either as an approximation of the Gauss--Newton component of the Hessian or the covariance matrix of the gradients maintained by Adagrad. We provide an explicit and novel connec…

    Submitted 25 June, 2024; originally announced June 2024.

  7. arXiv:2402.11771  [pdf, other]

    cs.LG cs.AI stat.ME stat.ML

    Evaluating the Effectiveness of Index-Based Treatment Allocation

    Authors: Niclas Boehmer, Yash Nair, Sanket Shah, Lucas Janson, Aparna Taneja, Milind Tambe

    Abstract: When resources are scarce, an allocation policy is needed to decide who receives a resource. This problem occurs, for instance, when allocating scarce medical resources and is often solved using modern ML methods. This paper introduces methods to evaluate index-based allocation policies -- that allocate a fixed number of resources to those who need them the most -- by using data from a randomized…

    Submitted 18 February, 2024; originally announced February 2024.

  8. arXiv:2402.04933  [pdf, other]

    cs.LG stat.AP

    Context in Public Health for Underserved Communities: A Bayesian Approach to Online Restless Bandits

    Authors: Biyonka Liang, Lily Xu, Aparna Taneja, Milind Tambe, Lucas Janson

    Abstract: Public health programs often provide interventions to encourage program adherence, and effectively allocating interventions is vital for producing the greatest overall health outcomes, especially in underserved communities where resources are limited. Such resource allocation problems are often modeled as restless multi-armed bandits (RMABs) with unknown underlying transition dynamics, hence requi…

    Submitted 5 February, 2025; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: 29 pages, 18 figures

  9. arXiv:2309.04002  [pdf, other]

    stat.ME

    Total Variation Floodgate for Variable Importance Inference in Classification

    Authors: Wenshuo Wang, Lucas Janson, Lihua Lei, Aaditya Ramdas

    Abstract: Inferring variable importance is the key problem of many scientific studies, where researchers seek to learn the effect of a feature $X$ on the outcome $Y$ in the presence of confounding variables $Z$. Focusing on classification problems, we define the expected total variation (ETV), which is an intuitive and deterministic measure of variable importance that does not rely on any model context. We…

    Submitted 7 September, 2023; originally announced September 2023.

  10. arXiv:2301.05365  [pdf, other]

    stat.ME

    Randomization Tests for Adaptively Collected Data

    Authors: Yash Nair, Lucas Janson

    Abstract: Randomization testing is a fundamental method in statistics, enabling inferential tasks such as testing for (conditional) independence of random variables, constructing confidence intervals in semiparametric location models, and constructing (by inverting a permutation test) model-free prediction intervals via conformal inference. Randomization tests are exactly valid for any sample size, but thei…

    Submitted 19 March, 2023; v1 submitted 12 January, 2023; originally announced January 2023.

  11. arXiv:2212.11304  [pdf, other]

    stat.ME

    Powerful Partial Conjunction Hypothesis Testing via Conditioning

    Authors: Biyonka Liang, Lu Zhang, Lucas Janson

    Abstract: A Partial Conjunction Hypothesis (PCH) test combines information across a set of base hypotheses to determine whether some subset is non-null. PCH tests arise in a diverse array of fields, but standard PCH testing methods can be highly conservative, leading to low power especially in low signal settings commonly encountered in applications. In this paper, we introduce the conditional PCH (cPCH) te…

    Submitted 15 May, 2024; v1 submitted 21 December, 2022; originally announced December 2022.

  12. arXiv:2208.05885  [pdf, other]

    stat.ME stat.CO

    Surrogate-based global sensitivity analysis with statistical guarantees via floodgate

    Authors: Massimo Aufiero, Lucas Janson

    Abstract: Computational models are utilized in many scientific domains to simulate complex systems. Sensitivity analysis is an important practice to aid our understanding of the mechanics of these models and the processes they describe, but performing a sufficient number of model evaluations to obtain accurate sensitivity estimates can often be prohibitively expensive. In order to reduce the computational b…

    Submitted 11 August, 2022; originally announced August 2022.

  13. arXiv:2203.17208  [pdf, other]

    stat.ME

    Controlled Discovery and Localization of Signals via Bayesian Linear Programming

    Authors: Asher Spector, Lucas Janson

    Abstract: Scientists often must simultaneously localize and discover signals. For instance, in genetic fine-mapping, high correlations between nearby genetic variants make it hard to identify the exact locations of causal variants. So the statistical task is to output as many disjoint regions containing a signal as possible, each as small as possible, while controlling false positives. Similar problems aris…

    Submitted 28 January, 2023; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: 62 pages, 22 figures. Changes v4: slightly shortened introduction/modified abstract

    MSC Class: 62F15 (Primary); 60G35; 62F25; 62F15; 62J12; 85A35; 92D10 (Secondary)

  14. arXiv:2202.07098  [pdf, ps, other]

    cs.LG stat.ME

    Statistical Inference After Adaptive Sampling for Longitudinal Data

    Authors: Kelly W. Zhang, Lucas Janson, Susan A. Murphy

    Abstract: Online reinforcement learning and other adaptive sampling algorithms are increasingly used in digital intervention experiments to optimize treatment delivery for users over time. In this work, we focus on longitudinal user data collected by a large class of adaptive sampling algorithms that are designed to optimize treatment decisions online using accruing data from multiple users. Combining or "p…

    Submitted 19 April, 2023; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: Fixing typos

  15. arXiv:2201.08343  [pdf, other]

    stat.ME stat.ML

    Using Machine Learning to Test Causal Hypotheses in Conjoint Analysis

    Authors: Dae Woong Ham, Kosuke Imai, Lucas Janson

    Abstract: Conjoint analysis is a popular experimental design used to measure multidimensional preferences. Researchers examine how varying a factor of interest, while controlling for other relevant factors, influences decision-making. Currently, there exist two methodological approaches to analyzing data from a conjoint experiment. The first focuses on estimating the average marginal effects of each factor…

    Submitted 17 August, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

    Journal ref: Polit. Anal. 32 (2024) 329-344

  16. arXiv:2011.14625  [pdf, other]

    stat.ME

    Powerful Knockoffs via Minimizing Reconstructability

    Authors: Asher Spector, Lucas Janson

    Abstract: Model-X knockoffs allows analysts to perform feature selection using almost any machine learning algorithm while still provably controlling the expected proportion of false discoveries. To apply model-X knockoffs, one must construct synthetic variables, called knockoffs, which effectively act as controls during feature selection. The gold standard for constructing knockoffs has been to minimize th…

    Submitted 28 June, 2021; v1 submitted 30 November, 2020; originally announced November 2020.

    Comments: 72 pages, 25 figures

  17. arXiv:2007.12671  [pdf, other]

    stat.ML cs.LG math.ST

    Cross-validation Confidence Intervals for Test Error

    Authors: Pierre Bayle, Alexandre Bayle, Lucas Janson, Lester Mackey

    Abstract: This work develops central limit theorems for cross-validation and consistent estimators of its asymptotic variance under weak stability conditions on the learning algorithm. Together, these results provide practical, asymptotically-exact confidence intervals for $k$-fold test error and valid, powerful hypothesis tests of whether one learning algorithm has smaller $k$-fold test error than another.…

    Submitted 31 October, 2020; v1 submitted 24 July, 2020; originally announced July 2020.

    Comments: 34th Conference on Neural Information Processing Systems (NeurIPS 2020); 40 pages, 15 figures

  18. arXiv:2007.09851  [pdf, other]

    stat.ME

    Testing goodness-of-fit and conditional independence with approximate co-sufficient sampling

    Authors: Rina Foygel Barber, Lucas Janson

    Abstract: Goodness-of-fit (GoF) testing is ubiquitous in statistics, with direct ties to model selection, confidence interval construction, conditional independence testing, and multiple testing, just to name a few applications. While testing the GoF of a simple (point) null hypothesis provides an analyst great flexibility in the choice of test statistic while still ensuring validity, most GoF tests for com…

    Submitted 14 September, 2021; v1 submitted 19 July, 2020; originally announced July 2020.

  19. arXiv:2007.01283  [pdf, other]

    stat.ME

    Floodgate: inference for model-free variable importance

    Authors: Lu Zhang, Lucas Janson

    Abstract: Many modern applications seek to understand the relationship between an outcome variable $Y$ and a covariate $X$ in the presence of a (possibly high-dimensional) confounding variable $Z$. Although much attention has been paid to testing \emph{whether} $Y$ depends on $X$ given $Z$, in this paper we seek to go beyond testing by inferring the \emph{strength} of that dependence. We first define our es…

    Submitted 12 September, 2022; v1 submitted 2 July, 2020; originally announced July 2020.

  20. arXiv:2006.03980  [pdf, other]

    stat.ME

    Fast and Powerful Conditional Randomization Testing via Distillation

    Authors: Molei Liu, Eugene Katsevich, Lucas Janson, Aaditya Ramdas

    Abstract: We consider the problem of conditional independence testing: given a response Y and covariates (X,Z), we test the null hypothesis that Y is independent of X given Z. The conditional randomization test (CRT) was recently proposed as a way to use distributional information about X|Z to exactly (non-asymptotically) control Type-I error using any test statistic in any dimensionality without assuming a…

    Submitted 4 June, 2021; v1 submitted 6 June, 2020; originally announced June 2020.

    Comments: This paper has been merged with a parallel work arXiv:2006.08482 by Eugene Katsevich and Aaditya Ramdas

  21. arXiv:2002.03217  [pdf, other]

    cs.LG stat.ML

    Inference for Batched Bandits

    Authors: Kelly W. Zhang, Lucas Janson, Susan A. Murphy

    Abstract: As bandit algorithms are increasingly utilized in scientific studies and industrial applications, there is an associated increasing need for reliable inference methods based on the resulting adaptively-collected data. In this work, we develop methods for inference on data collected in batches using a bandit algorithm. We first prove that the ordinary least squares estimator (OLS), which is asympto…

    Submitted 8 January, 2021; v1 submitted 8 February, 2020; originally announced February 2020.

    Journal ref: NeurIPS 2020

  22. arXiv:1903.02806  [pdf, ps, other]

    stat.ME

    Relaxing the Assumptions of Knockoffs by Conditioning

    Authors: Dongming Huang, Lucas Janson

    Abstract: The recent paper Candès et al. (2018) introduced model-X knockoffs, a method for variable selection that provably and non-asymptotically controls the false discovery rate with no restrictions or assumptions on the dimensionality of the data or the conditional distribution of the response given the covariates. The one requirement for the procedure is that the covariate samples are drawn independent…

    Submitted 12 June, 2020; v1 submitted 7 March, 2019; originally announced March 2019.

    MSC Class: 62G10; 62B05; 62J02

  23. Metropolized Knockoff Sampling

    Authors: Stephen Bates, Emmanuel Candès, Lucas Janson, Wenshuo Wang

    Abstract: Model-X knockoffs is a wrapper that transforms essentially any feature importance measure into a variable selection algorithm, which discovers true effects while rigorously controlling the expected fraction of false positives. A frequently discussed challenge to apply this method is to construct knockoff variables, which are synthetic variables obeying a crucial exchangeability property with the e…

    Submitted 1 March, 2019; originally announced March 2019.

    Journal ref: Journal of the American Statistical Association, 116:535, 1413-1427, 2021

  24. arXiv:1610.02351  [pdf, other]

    stat.ME math.ST stat.AP

    Panning for Gold: Model-X Knockoffs for High-dimensional Controlled Variable Selection

    Authors: Emmanuel Candes, Yingying Fan, Lucas Janson, Jinchi Lv

    Abstract: Many contemporary large-scale applications involve building interpretable models linking a large set of potential covariates to a response in a nonlinear fashion, such as when the response is binary. Although this modeling problem has been extensively studied, it remains unclear how to effectively control the fraction of false discoveries even in high-dimensional logistic regression, not to mentio…

    Submitted 12 December, 2017; v1 submitted 7 October, 2016; originally announced October 2016.

    Comments: 39 pages, 10 figures, 2 tables

  25. arXiv:1505.06549  [pdf, other]

    stat.ME

    Familywise Error Rate Control via Knockoffs

    Authors: Lucas Janson, Weijie Su

    Abstract: We present a novel method for controlling the $k$-familywise error rate ($k$-FWER) in the linear regression setting using the knockoffs framework first introduced by Barber and Candès. Our procedure, which we also refer to as knockoffs, can be applied with any design matrix with at least as many observations as variables, and does not require knowing the noise variance. Unlike other multiple testi…

    Submitted 9 November, 2015; v1 submitted 25 May, 2015; originally announced May 2015.

    Comments: 15 pages, 3 figures. Updated references

  26. arXiv:1505.02097  [pdf, other]

    stat.ME

    EigenPrism: Inference for High-Dimensional Signal-to-Noise Ratios

    Authors: Lucas Janson, Rina Foygel Barber, Emmanuel Candès

    Abstract: Consider the following three important problems in statistical inference, namely, constructing confidence intervals for (1) the error of a high-dimensional ($p>n$) regression estimator, (2) the linear regression noise level, and (3) the genetic signal-to-noise ratio of a continuous-valued trait (related to the heritability). All three problems turn out to be closely related to the little-studied p…

    Submitted 28 June, 2016; v1 submitted 8 May, 2015; originally announced May 2015.

  27. arXiv:1312.7851  [pdf, other]

    stat.OT stat.ME

    Effective Degrees of Freedom: A Flawed Metaphor

    Authors: Lucas Janson, William Fithian, Trevor Hastie

    Abstract: To most applied statisticians, a fitting procedure's degrees of freedom is synonymous with its model complexity, or its capacity for overfitting to data. In particular, it is often used to parameterize the bias-variance tradeoff in model selection. We argue that, contrary to folk intuition, model complexity and degrees of freedom are not synonymous and may correspond very poorly. We exhibit and th…

    Submitted 13 July, 2014; v1 submitted 30 December, 2013; originally announced December 2013.

  28. arXiv:1308.5736  [pdf, other]

    stat.ME

    A Methodology for Robust Multiproxy Paleoclimate Reconstructions and Modeling of Temperature Conditional Quantiles

    Authors: Lucas Janson, Bala Rajaratnam

    Abstract: Great strides have been made in the field of reconstructing past temperatures based on models relating temperature to temperature-sensitive paleoclimate proxies. One of the goals of such reconstructions is to assess if current climate is anomalous in a millennial context. These regression based approaches model the conditional mean of the temperature distribution as a function of paleoclimate prox…

    Submitted 26 August, 2013; originally announced August 2013.

    MSC Class: 62J05

    Journal ref: Journal of the American Statistical Association 109 (2014) 63-77