Showing 1–50 of 521 results for author: Nathan

Searching in archive stat.
  1. arXiv:2506.09887  [pdf, ps, other]

    cs.LG math.ST stat.ML

    Learning single-index models via harmonic decomposition

    Authors: Nirmit Joshi, Hugo Koubbi, Theodor Misiakiewicz, Nathan Srebro

    Abstract: We study the problem of learning single-index models, where the label $y \in \mathbb{R}$ depends on the input $\boldsymbol{x} \in \mathbb{R}^d$ only through an unknown one-dimensional projection $\langle \boldsymbol{w}_*,\boldsymbol{x}\rangle$. Prior work has shown that under Gaussian inputs, the statistical and computational complexity of recovering $\boldsymbol{w}_*$ is governed by the Hermite e…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: 80 pages

  2. arXiv:2506.05391  [pdf, ps, other]

    eess.IV cs.CV cs.LG stat.AP

    Enhancing Neural Autoregressive Distribution Estimators for Image Reconstruction

    Authors: Ambrose Emmett-Iwaniw, Nathan Kirk

    Abstract: Autoregressive models are often employed to learn distributions of image data by decomposing the $D$-dimensional density function into a product of one-dimensional conditional distributions. Each conditional depends on preceding variables (pixels, in the case of image data), making the order in which variables are processed fundamental to the model performance. In this paper, we study the problem…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: Accepted for publication in conference proceedings, MCQMC 2024

  3. arXiv:2506.02881  [pdf, ps, other]

    stat.ME cs.LG stat.ML

    Simulation-Based Inference for Adaptive Experiments

    Authors: Brian M Cho, Aurélien Bibaut, Nathan Kallus

    Abstract: Multi-arm bandit experimental designs are increasingly being adopted over standard randomized trials due to their potential to improve outcomes for study participants, enable faster identification of the best-performing options, and/or enhance the precision of estimating key parameters. Current approaches for inference after adaptive sampling either rely on asymptotic normality under restricted ex…

    Submitted 3 June, 2025; originally announced June 2025.

  4. arXiv:2505.19087  [pdf, ps, other]

    cs.LG stat.ML

    Temperature is All You Need for Generalization in Langevin Dynamics and other Markov Processes

    Authors: Itamar Harel, Yonathan Wolanowsky, Gal Vardi, Nathan Srebro, Daniel Soudry

    Abstract: We analyze the generalization gap (gap between the training and test errors) when training a potentially over-parametrized model using a Markovian stochastic training algorithm, initialized from some distribution $θ_0 \sim p_0$. We focus on Langevin dynamics with a positive temperature $β^{-1}$, i.e. gradient descent on a training loss $L$ with infinitesimal step size, perturbed with $β^{-1}$-vari…

    Submitted 25 May, 2025; originally announced May 2025.

  5. arXiv:2505.17468  [pdf, ps, other]

    stat.ME cs.LG stat.ML

    Efficient Adaptive Experimentation with Non-Compliance

    Authors: Miruna Oprescu, Brian M Cho, Nathan Kallus

    Abstract: We study the problem of estimating the average treatment effect (ATE) in adaptive experiments where treatment can only be encouraged--rather than directly assigned--via a binary instrumental variable. Building on semiparametric efficiency theory, we derive the efficiency bound for ATE estimation under arbitrary, history-dependent instrument-assignment policies, and show it is minimized by a varian…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: 26 pages, 3 figures

  6. arXiv:2505.09805  [pdf, ps, other]

    q-bio.QM cs.AI cs.LG stat.AP

    Contextual Phenotyping of Pediatric Sepsis Cohort Using Large Language Models

    Authors: Aditya Nagori, Ayush Gautam, Matthew O. Wiens, Vuong Nguyen, Nathan Kenya Mugisha, Jerome Kabakyenga, Niranjan Kissoon, John Mark Ansermino, Rishikesan Kamaleswaran

    Abstract: Clustering patient subgroups is essential for personalized care and efficient resource use. Traditional clustering methods struggle with high-dimensional, heterogeneous healthcare data and lack contextual understanding. This study evaluates Large Language Model (LLM) based clustering against classical methods using a pediatric sepsis dataset from a low-income country (LIC), containing 2,686 record…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 11 pages, 2 Figures, 1 Table

  7. arXiv:2505.09803  [pdf, ps, other]

    stat.ML cs.LG

    LatticeVision: Image to Image Networks for Modeling Non-Stationary Spatial Data

    Authors: Antony Sikorski, Michael Ivanitskiy, Nathan Lenssen, Douglas Nychka, Daniel McKenzie

    Abstract: In many scientific and industrial applications, we are given a handful of instances (a 'small ensemble') of a spatially distributed quantity (a 'field') but would like to acquire many more. For example, a large ensemble of global temperature sensitivity fields from a climate model can help farmers, insurers, and governments plan appropriately. When acquiring more data is prohibitively expensive --…

    Submitted 14 May, 2025; originally announced May 2025.

  8. arXiv:2505.09004  [pdf, ps, other]

    stat.ML cs.LG

    Lower Bounds on the MMSE of Adversarially Inferring Sensitive Features

    Authors: Monica Welfert, Nathan Stromberg, Mario Diaz, Lalitha Sankar

    Abstract: We propose an adversarial evaluation framework for sensitive feature inference based on minimum mean-squared error (MMSE) estimation with a finite sample size and linear predictive models. Our approach establishes theoretical lower bounds on the true MMSE of inferring sensitive features from noisy observations of other correlated features. These bounds are expressed in terms of the empirical MMSE…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: submitted to IEEE Transactions on Information Theory

  9. arXiv:2505.07729  [pdf, ps, other]

    stat.ME math.ST stat.ML

    Nonparametric Instrumental Variable Inference with Many Weak Instruments

    Authors: Lars van der Laan, Nathan Kallus, Aurélien Bibaut

    Abstract: We study inference on linear functionals in the nonparametric instrumental variable (NPIV) problem with a discretely-valued instrument under a many-weak-instruments asymptotic regime, where the number of instrument values grows with the sample size. A key motivating example is estimating long-term causal effects in a new experiment with only short-term outcomes, using past experiments to instrumen…

    Submitted 12 May, 2025; originally announced May 2025.

  10. arXiv:2505.06806  [pdf, ps, other]

    math.DS stat.ML

    Kernel Dynamic Mode Decomposition For Sparse Reconstruction of Closable Koopman Operators

    Authors: Nishant Panda, Himanshu Singh, J. Nathan Kutz

    Abstract: Spatial temporal reconstruction of dynamical system is indeed a crucial problem with diverse applications ranging from climate modeling to numerous chaotic and physical processes. These reconstructions are based on the harmonious relationship between the Koopman operators and the choice of dictionary, determined implicitly by a kernel function. This leads to the approximation of the Koopman operat…

    Submitted 10 May, 2025; originally announced May 2025.

    MSC Class: 86A08; 47N60; 47N70; 46E22; 47B32; 46E20; 46E22; 70G60; 76F20

  11. arXiv:2505.05857  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Mixed-Integer Optimization for Responsible Machine Learning

    Authors: Nathan Justin, Qingshi Sun, Andrés Gómez, Phebe Vayanos

    Abstract: In the last few decades, Machine Learning (ML) has achieved significant success across domains ranging from healthcare, sustainability, and the social sciences, to criminal justice and finance. But its deployment in increasingly sophisticated, critical, and sensitive areas affecting individuals, the groups they belong to, and society as a whole raises critical concerns around fairness, transparenc…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: 56 pages, 10 figures

  12. arXiv:2505.04062  [pdf, other]

    stat.CO math.NA

    Multilevel Sampling in Algebraic Statistics

    Authors: Nathan Kirk, Ivan Gvozdanović, Sonja Petrović

    Abstract: This paper proposes a multilevel sampling algorithm for fiber sampling problems in algebraic statistics, inspired by Henry Wynn's suggestion to adapt multilevel Monte Carlo (MLMC) ideas to discrete models. Focusing on log-linear models, we sample from high-dimensional lattice fibers defined by algebraic constraints. Building on Markov basis methods and results from Diaconis and Sturmfels, our algo…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 21 pages, 7 figures

    MSC Class: 62R01 (Primary) 62-08; 52B20 (Secondary)

  13. arXiv:2505.00835  [pdf, other]

    stat.AP cs.LG stat.ML

    Multi-site modelling and reconstruction of past extreme skew surges along the French Atlantic coast

    Authors: Nathan Huet, Philippe Naveau, Anne Sabourin

    Abstract: Appropriate modelling of extreme skew surges is crucial, particularly for coastal risk management. Our study focuses on modelling extreme skew surges along the French Atlantic coast, with a particular emphasis on investigating the extremal dependence structure between stations. We employ the peak-over-threshold framework, where a multivariate extreme event is defined whenever at least one location…

    Submitted 1 May, 2025; originally announced May 2025.

  14. arXiv:2504.04579  [pdf, other]

    cs.LG stat.ML

    From Continual Learning to SGD and Back: Better Rates for Continual Linear Models

    Authors: Itay Evron, Ran Levinstein, Matan Schliserman, Uri Sherman, Tomer Koren, Daniel Soudry, Nathan Srebro

    Abstract: We theoretically study the common continual learning setup where an overparameterized model is sequentially fitted to a set of jointly realizable tasks. We analyze the forgetting, i.e., loss on previously seen tasks, after $k$ iterations. For continual linear models, we prove that fitting a task is equivalent to a single stochastic gradient descent (SGD) step on a modified objective. We develop no…

    Submitted 27 May, 2025; v1 submitted 6 April, 2025; originally announced April 2025.

  15. arXiv:2503.19220  [pdf, other]

    physics.geo-ph stat.AP

    Stochastic ecohydrological perspective on semi-distributed rainfall-runoff dynamics

    Authors: Mark S. Bartlett, Elizabeth Cultra, Nathan Geldner, Amilcare Porporato

    Abstract: Quantifying watershed process variability consistently with climate change and ecohydrological dynamics remains a central challenge in hydrology. Stochastic ecohydrology characterizes hydrologic variability through probability distributions that link climate, hydrology, and ecology. However, these approaches are often limited to small spatial scales (e.g., point or plot level) or focus on specific…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: Keypoints: 1) The model unifies stochastic ecohydrology, semi-distributed hydrology, and the SCS-CN method; 2) By merging the SCS-CN method with stochastic ecohydrology, antecedent conditions in SCS-CN are linked to hydroclimate; 3) Calibration to 81 USGS gages shows the model accurately captures runoff variability and water balance aligned with Budyko-type curves. 44 pages, 16 figures

  16. arXiv:2503.12760  [pdf, other]

    stat.ML cs.LG econ.EM

    SNPL: Simultaneous Policy Learning and Evaluation for Safe Multi-Objective Policy Improvement

    Authors: Brian Cho, Ana-Roxana Pop, Ariel Evnine, Nathan Kallus

    Abstract: To design effective digital interventions, experimenters face the challenge of learning decision policies that balance multiple objectives using offline data. Often, they aim to develop policies that maximize goal outcomes, while ensuring there are no undesirable changes in guardrail outcomes. To provide credible recommendations, experimenters must not only identify policies that satisfy the desir…

    Submitted 21 March, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

  17. arXiv:2503.12012  [pdf, other]

    cs.LG math.OC stat.ML

    Mixed-feature Logistic Regression Robust to Distribution Shifts

    Authors: Qingshi Sun, Nathan Justin, Andres Gomez, Phebe Vayanos

    Abstract: Logistic regression models are widely used in the social and behavioral sciences and in high-stakes domains, due to their simplicity and interpretability properties. At the same time, such domains are permeated by distribution shifts, where the distribution generating the data changes between training and deployment. In this paper, we study a distributionally robust logistic regression problem tha…

    Submitted 15 March, 2025; originally announced March 2025.

    Comments: The 28th International Conference on Artificial Intelligence and Statistics (AISTATS), 2025

  18. arXiv:2503.07932  [pdf, other]

    stat.ML cs.AI cs.CC cs.LG

    A Theory of Learning with Autoregressive Chain of Thought

    Authors: Nirmit Joshi, Gal Vardi, Adam Block, Surbhi Goel, Zhiyuan Li, Theodor Misiakiewicz, Nathan Srebro

    Abstract: For a given base class of sequence-to-next-token generators, we consider learning prompt-to-answer mappings obtained by iterating a fixed, time-invariant generator for multiple steps, thus generating a chain-of-thought, and then taking the final token as the answer. We formalize the learning problems both when the chain-of-thought is observed and when training only on prompt-answer pairs, with the…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: Comments are welcome

  19. arXiv:2503.05910  [pdf, other]

    stat.AP

    Interactive Visualization Framework for Forensic Bullet Comparisons

    Authors: Nathan Rethwisch, Heike Hofmann

    Abstract: The current method for forensic analysis of bullet comparison relies on manual examination by forensic examiners to determine if bullets were discharged from the same firearm. This process is highly subjective, prompting the development of algorithmic methods to provide objective statistical support for comparisons. However, a gap exists between the technical understanding of these algorithms and…

    Submitted 7 March, 2025; originally announced March 2025.

    Comments: 6 pages, 14 figures, awarded the Student Paper Award in the Statistical Computing and Statistical Graphics Sections of the American Statistical Association (ASA) in 2025

  20. arXiv:2503.02877  [pdf, other]

    cs.LG stat.ML

    Weak-to-Strong Generalization Even in Random Feature Networks, Provably

    Authors: Marko Medvedev, Kaifeng Lyu, Dingli Yu, Sanjeev Arora, Zhiyuan Li, Nathan Srebro

    Abstract: Weak-to-Strong Generalization (Burns et al., 2024) is the phenomenon whereby a strong student, say GPT-4, learns a task from a weak teacher, say GPT-2, and ends up significantly outperforming the teacher. We show that this phenomenon does not require a strong learner like GPT-4. We consider student and teacher that are random feature models, described by two-layer networks with a random and fixed…

    Submitted 4 March, 2025; originally announced March 2025.

  21. arXiv:2503.02110  [pdf, other]

    stat.ML cs.LG

    Quantifying Overfitting along the Regularization Path for Two-Part-Code MDL in Supervised Classification

    Authors: Xiaohan Zhu, Nathan Srebro

    Abstract: We provide a complete characterization of the entire regularization curve of a modified two-part-code Minimum Description Length (MDL) learning rule for binary classification, based on an arbitrary prior or description language. Grunwald and Langford [2004] previously established the lack of asymptotic consistency, from an agnostic PAC (frequentist worst case) perspective, of the MDL rule with a p…

    Submitted 10 March, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

  22. arXiv:2502.18611  [pdf, other]

    math.PR cs.LG stat.ML

    Tight Bounds on the Binomial CDF, and the Minimum of i.i.d Binomials, in terms of KL-Divergence

    Authors: Xiaohan Zhu, Mesrob I. Ohannessian, Nathan Srebro

    Abstract: We provide finite sample upper and lower bounds on the Binomial tail probability which are a direct application of Sanov's theorem. We then use these to obtain high probability upper and lower bounds on the minimum of i.i.d. Binomial random variables. Both bounds are finite sample, asymptotically tight, and expressed in terms of the KL-divergence.

    Submitted 25 February, 2025; originally announced February 2025.

  23. arXiv:2502.10485  [pdf, other]

    stat.ML cs.AI cs.LG math.ST stat.AP stat.ME

    Forecasting time series with constraints

    Authors: Nathan Doumèche, Francis Bach, Éloi Bedek, Gérard Biau, Claire Boyer, Yannig Goude

    Abstract: Time series forecasting presents unique challenges that limit the effectiveness of traditional machine learning algorithms. To address these limitations, various approaches have incorporated linear constraints into learning algorithms, such as generalized additive models and hierarchical forecasting. In this paper, we propose a unified framework for integrating and combining linear constraints in…

    Submitted 14 February, 2025; originally announced February 2025.

  24. Evaluating Decision Rules Across Many Weak Experiments

    Authors: Winston Chou, Colin Gray, Nathan Kallus, Aurélien Bibaut, Simon Ejdemyr

    Abstract: Technology firms conduct randomized controlled experiments ("A/B tests") to learn which actions to take to improve business outcomes. In firms with mature experimentation platforms, experimentation programs can consist of many thousands of tests. To effectively scale experimentation, firms rely on decision rules: standard operating procedures for mapping the results of an experiment to a choice of…

    Submitted 29 May, 2025; v1 submitted 12 February, 2025; originally announced February 2025.

    Comments: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD '25), August 3--7, 2025, Toronto, ON, Canada

  25. arXiv:2502.06173  [pdf, other]

    cs.LG cs.AI cs.CL stat.AP stat.ML

    Uncertainty-Aware Adaptation of Large Language Models for Protein-Protein Interaction Analysis

    Authors: Sanket Jantre, Tianle Wang, Gilchan Park, Kriti Chopra, Nicholas Jeon, Xiaoning Qian, Nathan M. Urban, Byung-Jun Yoon

    Abstract: Identification of protein-protein interactions (PPIs) helps derive cellular mechanistic understanding, particularly in the context of complex conditions such as neurodegenerative disorders, metabolic syndromes, and cancer. Large Language Models (LLMs) have demonstrated remarkable potential in predicting protein structures and interactions via automated mining of vast biomedical literature; yet the…

    Submitted 10 February, 2025; originally announced February 2025.

  26. arXiv:2502.05295  [pdf, other]

    cs.LG stat.ME

    GST-UNet: Spatiotemporal Causal Inference with Time-Varying Confounders

    Authors: Miruna Oprescu, David K. Park, Xihaier Luo, Shinjae Yoo, Nathan Kallus

    Abstract: Estimating causal effects from spatiotemporal data is a key challenge in fields such as public health, social policy, and environmental science, where controlled experiments are often infeasible. However, existing causal inference methods relying on observational data face significant limitations: they depend on strong structural assumptions to address spatiotemporal challenges – s…

    Submitted 7 February, 2025; originally announced February 2025.

    Comments: 17 pages, 6 figures, 2 tables

  27. arXiv:2502.03644  [pdf, other]

    math.NA stat.CO

    Quasi-Monte Carlo Methods: What, Why, and How?

    Authors: Fred J. Hickernell, Nathan Kirk, Aleksei G. Sorokin

    Abstract: Many questions in quantitative finance, uncertainty quantification, and other disciplines are answered by computing the population mean, $μ:= \mathbb{E}(Y)$, where instances of $Y:=f(\boldsymbol{X})$ may be generated by numerical simulation and $\boldsymbol{X}$ has a simple probability distribution. The population mean can be approximated by the sample mean,…

    Submitted 5 February, 2025; originally announced February 2025.

    MSC Class: 65C05

  28. arXiv:2501.16489   

    stat.ML cs.LG eess.SY

    Nonparametric Sparse Online Learning of the Koopman Operator

    Authors: Boya Hou, Sina Sanjari, Nathan Dahlin, Alec Koppel, Subhonmesh Bose

    Abstract: The Koopman operator provides a powerful framework for representing the dynamics of general nonlinear dynamical systems. Data-driven techniques to learn the Koopman operator typically assume that the chosen function space is closed under system dynamics. In this paper, we study the Koopman operator via its action on the reproducing kernel Hilbert space (RKHS), and explore the mis-specified scenari…

    Submitted 4 February, 2025; v1 submitted 27 January, 2025; originally announced January 2025.

    Comments: This work was intended as a replacement of arXiv:2405.07432 and any subsequent updates will appear there

  29. arXiv:2501.11868  [pdf, other]

    stat.ME math.ST stat.ML

    Automatic Debiased Machine Learning for Smooth Functionals of Nonparametric M-Estimands

    Authors: Lars van der Laan, Aurelien Bibaut, Nathan Kallus, Alex Luedtke

    Abstract: We propose a unified framework for automatic debiased machine learning (autoDML) to perform inference on smooth functionals of infinite-dimensional M-estimands, defined as population risk minimizers over Hilbert spaces. By automating debiased estimation and inference procedures in causal inference and semiparametric statistics, our framework enables practitioners to construct valid estimators for…

    Submitted 20 January, 2025; originally announced January 2025.

  30. arXiv:2501.06926  [pdf, other]

    stat.ML cs.LG stat.ME

    Automatic Double Reinforcement Learning in Semiparametric Markov Decision Processes with Applications to Long-Term Causal Inference

    Authors: Lars van der Laan, David Hubbard, Allen Tran, Nathan Kallus, Aurélien Bibaut

    Abstract: Estimating long-term causal effects from short-term data is essential for decision-making in healthcare, economics, and industry, where long-term follow-up is often infeasible. Markov Decision Processes (MDPs) offer a principled framework for modeling outcomes as sequences of states, actions, and rewards over time. We introduce a semiparametric extension of Double Reinforcement Learning (DRL) for…

    Submitted 27 April, 2025; v1 submitted 12 January, 2025; originally announced January 2025.

  31. arXiv:2501.04903  [pdf]

    stat.ML cs.LG

    Towards understanding the bias in decision trees

    Authors: Nathan Phelps, Daniel J. Lizotte, Douglas G. Woolford

    Abstract: There is a widespread and longstanding belief that machine learning models are biased towards the majority (or negative) class when learning from imbalanced data, leading them to neglect or ignore the minority (or positive) class. In this study, we show that this belief is not necessarily correct for decision trees, and that their bias can actually be in the opposite direction. Motivated by a rece…

    Submitted 28 February, 2025; v1 submitted 8 January, 2025; originally announced January 2025.

  32. arXiv:2412.16209  [pdf]

    cs.LG stat.ML

    Challenges learning from imbalanced data using tree-based models: Prevalence estimates systematically depend on hyperparameters and can be upwardly biased

    Authors: Nathan Phelps, Daniel J. Lizotte, Douglas G. Woolford

    Abstract: Imbalanced binary classification problems arise in many fields of study. When using machine learning models for these problems, it is common to subsample the majority class (i.e., undersampling) to create a (more) balanced dataset for model training. This biases the model's predictions because the model learns from a dataset that does not follow the same data generating process as new data. One wa…

    Submitted 17 December, 2024; originally announced December 2024.

  33. arXiv:2412.14318  [pdf, other]

    math.DS math.NA stat.ML

    Long-time accuracy of ensemble Kalman filters for chaotic and machine-learned dynamical systems

    Authors: Daniel Sanz-Alonso, Nathan Waniorek

    Abstract: Filtering is concerned with online estimation of the state of a dynamical system from partial and noisy observations. In applications where the state is high dimensional, ensemble Kalman filters are often the method of choice. This paper establishes long-time accuracy of ensemble Kalman filters. We introduce conditions on the dynamics and the observations under which the estimation error remains s…

    Submitted 18 December, 2024; originally announced December 2024.

    Comments: 40 pages, 4 figures

    MSC Class: 62F15; 68Q25; 60G35; 62M05

  34. arXiv:2412.05726  [pdf, other]

    stat.ML cs.LG

    Proximal Iteration for Nonlinear Adaptive Lasso

    Authors: Nathan Wycoff, Lisa O. Singh, Ali Arab, Katharine M. Donato

    Abstract: Augmenting a smooth cost function with an $\ell_1$ penalty allows analysts to efficiently conduct estimation and variable selection simultaneously in sophisticated models and can be efficiently implemented using proximal gradient methods. However, one drawback of the $\ell_1$ penalty is bias: nonzero parameters are underestimated in magnitude, motivating techniques such as the Adaptive Lasso which…

    Submitted 7 December, 2024; originally announced December 2024.

    Comments: Some of these results were previously presented in the Technical Report at arXiv:2211.05089

  35. arXiv:2410.19092  [pdf, other]

    cs.LG stat.ML

    Provable Tempered Overfitting of Minimal Nets and Typical Nets

    Authors: Itamar Harel, William M. Hoza, Gal Vardi, Itay Evron, Nathan Srebro, Daniel Soudry

    Abstract: We study the overfitting behavior of fully connected deep Neural Networks (NNs) with binary weights fitted to perfectly classify a noisy training set. We consider interpolation using both the smallest NN (having the minimal number of weights) and a random interpolating NN. For both learning rules, we prove overfitting is tempered. Our analysis rests on a new bound on the size of a threshold circui…

    Submitted 24 October, 2024; originally announced October 2024.

    Comments: 60 pages, 4 figures

  36. arXiv:2410.18144  [pdf]

    stat.ME cs.LG

    Using Platt's scaling for calibration after undersampling -- limitations and how to address them

    Authors: Nathan Phelps, Daniel J. Lizotte, Douglas G. Woolford

    Abstract: When modelling data where the response is dichotomous and highly imbalanced, response-based sampling where a subset of the majority class is retained (i.e., undersampling) is often used to create more balanced training datasets prior to modelling. However, the models fit to this undersampled data, which we refer to as base models, generate predictions that are severely biased. There are several ca…

    Submitted 4 December, 2024; v1 submitted 22 October, 2024; originally announced October 2024.

  37. arXiv:2410.17398  [pdf, other]

    stat.CO math.PR math.ST stat.ME

    Sacred and Profane: from the Involutive Theory of MCMC to Helpful Hamiltonian Hacks

    Authors: Nathan E. Glatt-Holtz, Andrew J. Holbrook, Justin A. Krometis, Cecilia F. Mondaini, Ami Sheth

    Abstract: In the first edition of this Handbook, two remarkable chapters consider seemingly distinct yet deeply connected subjects ...

    Submitted 29 October, 2024; v1 submitted 22 October, 2024; originally announced October 2024.

    Comments: To appear in the Handbook of MCMC, 2nd Edition

  38. arXiv:2410.15564  [pdf, other]

    cs.LG stat.ME stat.ML

    Reward Maximization for Pure Exploration: Minimax Optimal Good Arm Identification for Nonparametric Multi-Armed Bandits

    Authors: Brian Cho, Dominik Meier, Kyra Gan, Nathan Kallus

    Abstract: In multi-armed bandits, the tasks of reward maximization and pure exploration are often at odds with each other. The former focuses on exploiting arms with the highest means, while the latter may require constant exploration across all arms. In this work, we focus on good arm identification (GAI), a practical bandit inference objective that aims to label arms with means above a threshold as quickl…

    Submitted 20 October, 2024; originally announced October 2024.

  39. arXiv:2410.09282  [pdf, other]

    stat.ME math.ST

    Anytime-Valid Continuous-Time Confidence Processes for Inhomogeneous Poisson Processes

    Authors: Michael Lindon, Nathan Kallus

    Abstract: Motivated by monitoring the arrival of incoming adverse events such as customer support calls or crash reports from users exposed to an experimental product change, we consider sequential hypothesis testing of continuous-time inhomogeneous Poisson point processes. Specifically, we provide an interval-valued confidence process $C^α(t)$ over continuous time $t$ for the cumulative arrival rate…

    Submitted 11 October, 2024; originally announced October 2024.

  40. arXiv:2409.17466  [pdf, other]

    stat.ML cs.AI cs.LG

    Adjusting Regression Models for Conditional Uncertainty Calibration

    Authors: Ruijiang Gao, Mingzhang Yin, James McInerney, Nathan Kallus

    Abstract: Conformal Prediction methods have finite-sample distribution-free marginal coverage guarantees. However, they generally do not offer conditional coverage guarantees, which can be important for high-stakes decisions. In this paper, we propose a novel algorithm to train a regression function to improve the conditional coverage after applying the split conformal prediction procedure. We establish an…

    Submitted 25 September, 2024; originally announced September 2024.

    Comments: Machine Learning Special Issue on Uncertainty Quantification

  41. arXiv:2409.13786  [pdf, other]

    stat.ML cs.LG math.ST

    Physics-informed kernel learning

    Authors: Nathan Doumèche, Francis Bach, Gérard Biau, Claire Boyer

    Abstract: Physics-informed machine learning typically integrates physical priors into the learning process by minimizing a loss function that includes both a data-driven term and a partial differential equation (PDE) regularization. Building on the formulation of the problem as a kernel regression task, we use Fourier methods to approximate the associated kernel, and propose a tractable estimator that minim…

    Submitted 20 September, 2024; originally announced September 2024.

  42. arXiv:2409.12799  [pdf, ps, other]

    stat.ML cs.LG math.ST

    The Central Role of the Loss Function in Reinforcement Learning

    Authors: Kaiwen Wang, Nathan Kallus, Wen Sun

    Abstract: This paper illustrates the central role of loss functions in data-driven decision making, providing a comprehensive survey on their influence in cost-sensitive classification (CSC) and reinforcement learning (RL). We demonstrate how different regression loss functions affect the sample efficiency and adaptivity of value-based decision making algorithms. Across multiple settings, we prove that algo…

    Submitted 4 April, 2025; v1 submitted 19 September, 2024; originally announced September 2024.

    Comments: Accepted to Statistical Science

  43. arXiv:2409.03891  [pdf, other]

    cs.LG stat.ML

    Overfitting Behaviour of Gaussian Kernel Ridgeless Regression: Varying Bandwidth or Dimensionality

    Authors: Marko Medvedev, Gal Vardi, Nathan Srebro

    Abstract: We consider the overfitting behavior of minimum norm interpolating solutions of Gaussian kernel ridge regression (i.e. kernel ridgeless regression), when the bandwidth or input dimension varies with the sample size. For fixed dimensions, we show that even with varying or tuned bandwidth, the ridgeless solution is never consistent and, at least with large enough noise, always worse than the null pr…

    Submitted 5 September, 2024; originally announced September 2024.

  44. arXiv:2409.00582  [pdf, other]

    cs.SE stat.ME

    CRUD-Capable Mobile Apps with R and shinyMobile: a Case Study in Rapid Prototyping

    Authors: Nathan Henry

    Abstract: "Harden" is a Progressive Web Application (PWA) for Ecological Momentary Assessment (EMA) developed mostly in R, which runs on all platforms with an internet connection, including iOS and Android. It leverages the shinyMobile package for creating a reactive mobile user interface (UI), PostgreSQL for the database backend, and Google Cloud Run for scalable hosting in the cloud, with serverless execu…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: 10 pages, 2 figures

    MSC Class: 92Cxx (primary), 90-04 (secondary)

    ACM Class: D.2; J.3; J.4

  45. arXiv:2408.12004  [pdf, other]

    cs.LG stat.ME stat.ML

    CSPI-MT: Calibrated Safe Policy Improvement with Multiple Testing for Threshold Policies

    Authors: Brian M Cho, Ana-Roxana Pop, Kyra Gan, Sam Corbett-Davies, Israel Nir, Ariel Evnine, Nathan Kallus

    Abstract: When modifying existing policies in high-risk settings, it is often necessary to ensure with high certainty that the newly proposed policy improves upon a baseline, such as the status quo. In this work, we consider the problem of safe policy improvement, where one only adopts a new policy if it is deemed to be better than the specified baseline with at least pre-specified probability. We focus on…

    Submitted 21 August, 2024; originally announced August 2024.

  46. arXiv:2407.11927  [pdf, other]

    stat.ML cs.LG stat.AP

    Bayesian Causal Forests for Longitudinal Data: Assessing the Impact of Part-Time Work on Growth in High School Mathematics Achievement

    Authors: Nathan McJames, Ann O'Shea, Andrew Parnell

    Abstract: Modelling growth in student achievement is a significant challenge in the field of education. Understanding how interventions or experiences such as part-time work can influence this growth is also important. Traditional methods like difference-in-differences are effective for estimating causal effects from longitudinal data. Meanwhile, Bayesian non-parametric methods have recently become popular…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 25 pages, 7 figures, 3 tables

  47. arXiv:2406.06452  [pdf, other]

    stat.ME cs.LG stat.ML

    Estimating Heterogeneous Treatment Effects by Combining Weak Instruments and Observational Data

    Authors: Miruna Oprescu, Nathan Kallus

    Abstract: Accurately predicting conditional average treatment effects (CATEs) is crucial in personalized medicine and digital platform analytics. Since the treatments of interest often cannot be directly randomized, observational data is leveraged to learn CATEs, but this approach can incur significant bias from unobserved confounding. One strategy to overcome these limitations is to leverage instrumental v…

    Submitted 1 November, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: 30 pages, 4 figures, NeurIPS 2024

  48. arXiv:2406.04981  [pdf, other]

    cs.LG stat.ML

    The Price of Implicit Bias in Adversarially Robust Generalization

    Authors: Nikolaos Tsilivis, Natalie Frank, Nathan Srebro, Julia Kempe

    Abstract: We study the implicit bias of optimization in robust empirical risk minimization (robust ERM) and its connection with robust generalization. In classification settings under adversarial perturbations with linear models, we study what type of regularization should ideally be applied for a given perturbation set to improve (robust) generalization. We then show that the implicit bias of optimization…

    Submitted 7 June, 2024; originally announced June 2024.

  49. arXiv:2405.20954  [pdf, ps, other]

    cs.LG stat.ML

    Aligning Multiclass Neural Network Classifier Criterion with Task Performance Metrics

    Authors: Deyuan Li, Taesoo Daniel Lee, Marynel Vázquez, Nathan Tsoi

    Abstract: Multiclass neural network classifiers are typically trained using cross-entropy loss but evaluated using metrics derived from the confusion matrix, such as Accuracy, $F_β$-Score, and Matthews Correlation Coefficient. This mismatch between the training objective and evaluation metric can lead to suboptimal performance, particularly when the user's priorities differ from what cross-entropy implicitl…

    Submitted 26 May, 2025; v1 submitted 31 May, 2024; originally announced May 2024.

  50. arXiv:2405.20573  [pdf, other]

    cs.LG q-bio.BM q-bio.QM stat.ML

    Enhancing Generative Molecular Design via Uncertainty-guided Fine-tuning of Variational Autoencoders

    Authors: A N M Nafiz Abeer, Sanket Jantre, Nathan M Urban, Byung-Jun Yoon

    Abstract: In recent years, deep generative models have been successfully adopted for various molecular design tasks, particularly in the life and material sciences. A critical challenge for pre-trained generative molecular design (GMD) models is to fine-tune them to be better suited for downstream design tasks aimed at optimizing specific molecular properties. However, redesigning and training an existing e…

    Submitted 30 May, 2024; originally announced May 2024.