Showing 1–19 of 19 results for author: Schwab, J

Searching in archive stat.
  1. arXiv:2409.15582  [pdf, other]

    stat.ML cond-mat.dis-nn cond-mat.stat-mech cs.LG

    Generalization vs. Specialization under Concept Shift

    Authors: Alex Nguyen, David J. Schwab, Vudtiwat Ngampruetikorn

    Abstract: Machine learning models are often brittle under distribution shift, i.e., when data distributions at test time differ from those during training. Understanding this failure mode is central to identifying and mitigating safety risks of mass adoption of machine learning. Here we analyze ridge regression under concept shift -- a form of distribution shift in which the input-label relationship changes…

    Submitted 23 September, 2024; originally announced September 2024.

    Comments: 8 pages, 3 figures

  2. arXiv:2210.03316  [pdf, other]

    stat.ME stat.AP

    Efficient and Robust Approaches for Analysis of SMARTs: Illustration using the ADAPT-R Trial

    Authors: Lina M. Montoya, Michael R. Kosorok, Elvin H. Geng, Joshua Schwab, Thomas A. Odeny, Maya L. Petersen

    Abstract: Personalized intervention strategies, in particular those that modify treatment based on a participant's own response, are a core component of precision medicine approaches. Sequential Multiple Assignment Randomized Trials (SMARTs) are growing in popularity and are specifically designed to facilitate the evaluation of sequential adaptive strategies, in particular those embedded within the SMART. A…

    Submitted 7 October, 2022; originally announced October 2022.

  3. arXiv:2208.03848  [pdf, other]

    cs.IT cond-mat.stat-mech cs.LG physics.data-an stat.ML

    Information bottleneck theory of high-dimensional regression: relevancy, efficiency and optimality

    Authors: Vudtiwat Ngampruetikorn, David J. Schwab

    Abstract: Avoiding overfitting is a central challenge in machine learning, yet many large neural networks readily achieve zero training loss. This puzzling contradiction necessitates new approaches to the study of overfitting. Here we quantify overfitting via residual information, defined as the bits in fitted models that encode noise in training data. Information efficient learning algorithms minimize resi…

    Submitted 11 October, 2022; v1 submitted 7 August, 2022; originally announced August 2022.

    Comments: NeurIPS 2022

    ACM Class: H.1.1; I.2.6

  4. arXiv:2106.15737  [pdf, other]

    stat.ME stat.AP stat.ML

    Two-Stage TMLE to Reduce Bias and Improve Efficiency in Cluster Randomized Trials

    Authors: Laura B. Balzer, Mark van der Laan, James Ayieko, Moses Kamya, Gabriel Chamie, Joshua Schwab, Diane V. Havlir, Maya L. Petersen

    Abstract: Cluster randomized trials (CRTs) randomly assign an intervention to groups of individuals (e.g., clinics or communities) and measure outcomes on individuals in those groups. While offering many advantages, this experimental design introduces challenges that are only partially addressed by existing analytic approaches. First, outcomes are often missing for some individuals within clusters. Failing…

    Submitted 20 October, 2021; v1 submitted 29 June, 2021; originally announced June 2021.

    Comments: 37 pages total; main text is 17 pgs with 2 figures and 3 tables; supp material is 14 pgs with 1 figure and 5 tables

    Journal ref: Biostatistics, kxab043, December 24, 2021

  5. arXiv:2009.12789  [pdf, other]

    cs.LG cs.IT stat.ML

    Learning Optimal Representations with the Decodable Information Bottleneck

    Authors: Yann Dubois, Douwe Kiela, David J. Schwab, Ramakrishna Vedantam

    Abstract: We address the question of characterizing and finding optimal representations for supervised learning. Traditionally, this question has been tackled using the Information Bottleneck, which compresses the inputs while retaining information about the targets, in a decoder-agnostic fashion. In machine learning, however, our goal is not compression but rather generalization, which is intimately linked…

    Submitted 16 July, 2021; v1 submitted 27 September, 2020; originally announced September 2020.

    Comments: Accepted at NeurIPS 2020

  6. arXiv:2003.00152  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs

    Authors: Jonathan Frankle, David J. Schwab, Ari S. Morcos

    Abstract: A wide variety of deep learning techniques from style transfer to multitask learning rely on training affine transformations of features. Most prominent among these is the popular feature normalization technique BatchNorm, which normalizes activations and then subsequently applies a learned affine transform. In this paper, we aim to understand the role and expressive power of affine parameters use…

    Submitted 21 March, 2021; v1 submitted 28 February, 2020; originally announced March 2020.

    Comments: Published in ICLR 2021

  7. arXiv:2002.10365  [pdf, other]

    cs.LG cs.NE stat.ML

    The Early Phase of Neural Network Training

    Authors: Jonathan Frankle, David J. Schwab, Ari S. Morcos

    Abstract: Recent studies have shown that many important aspects of neural network learning take place within the very earliest iterations or epochs of training. For example, sparse, trainable sub-networks emerge (Frankle et al., 2019), gradient descent moves into a small subspace (Gur-Ari et al., 2018), and the network undergoes a critical period (Achille et al., 2019). Here, we examine the changes that dee…

    Submitted 24 February, 2020; originally announced February 2020.

    Comments: ICLR 2020 Camera Ready. Available on OpenReview at https://openreview.net/forum?id=Hkl1iRNFwS

  8. arXiv:2002.00025  [pdf, other]

    cs.LG cond-mat.dis-nn cond-mat.stat-mech stat.ML

    Gating creates slow modes and controls phase-space complexity in GRUs and LSTMs

    Authors: Tankut Can, Kamesh Krishnamurthy, David J. Schwab

    Abstract: Recurrent neural networks (RNNs) are powerful dynamical models for data with complex temporal structure. However, training RNNs has traditionally proved challenging due to exploding or vanishing of gradients. RNN models such as LSTMs and GRUs (and their variants) significantly mitigate these issues associated with training by introducing various types of gating units into the architecture. While t…

    Submitted 15 June, 2020; v1 submitted 31 January, 2020; originally announced February 2020.

    Comments: 18+18 pages, 4 figures, to appear in Proceedings of Machine Learning Research Vol. 107, 2020, 1st Annual Conference on Mathematical and Scientific Machine Learning

  9. arXiv:1910.00195  [pdf, other]

    cs.LG stat.ML

    How noise affects the Hessian spectrum in overparameterized neural networks

    Authors: Mingwei Wei, David J Schwab

    Abstract: Stochastic gradient descent (SGD) forms the core optimization method for deep neural networks. While some theoretical progress has been made, it still remains unclear why SGD leads the learning dynamics in overparameterized networks to solutions that generalize well. Here we show that for overparameterized networks with a degenerate valley in their loss landscape, SGD on average decreases the trac…

    Submitted 29 October, 2019; v1 submitted 1 October, 2019; originally announced October 2019.

  10. arXiv:1903.02606  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Mean-field Analysis of Batch Normalization

    Authors: Mingwei Wei, James Stokes, David J Schwab

    Abstract: Batch Normalization (BatchNorm) is an extremely useful component of modern neural network architectures, enabling optimization using higher learning rates and achieving faster convergence. In this paper, we use mean-field theory to analytically quantify the impact of BatchNorm on the geometry of the loss landscape for multi-layer networks consisting of fully-connected and convolutional layers. We…

    Submitted 6 March, 2019; originally announced March 2019.

  11. arXiv:1810.03030  [pdf, other]

    math.ST stat.ME

    Robust variance estimation and inference for causal effect estimation

    Authors: Linh Tran, Maya Petersen, Joshua Schwab, Mark J van der Laan

    Abstract: We consider a longitudinal data structure consisting of baseline covariates, time-varying treatment variables, intermediate time-dependent covariates, and a possibly time dependent outcome. Previous studies have shown that estimating the variance of asymptotically linear estimators using empirical influence functions in this setting result in anti-conservative estimates with increasing magnitudes…

    Submitted 6 October, 2018; originally announced October 2018.

    Comments: 20 pages, 8 figures

  12. arXiv:1808.03231  [pdf, other]

    stat.AP

    Statistical Analysis Plan for SEARCH Phase I: Health Outcomes among Adults

    Authors: Laura B. Balzer, Diane V. Havlir, Joshua Schwab, Mark J. Van Der Laan, Maya L. Petersen

    Abstract: This document provides the analytic plan for evaluating adult HIV incidence, health, and implementation outcomes for the first phase of the SEARCH Study. Locked: November 27, 2017. Embargoed until July 25, 2018.

    Submitted 25 July, 2018; originally announced August 2018.

    Comments: 40 pgs

  13. arXiv:1803.08823  [pdf, other]

    physics.comp-ph cond-mat.stat-mech cs.LG stat.ML

    A high-bias, low-variance introduction to Machine Learning for physicists

    Authors: Pankaj Mehta, Marin Bukov, Ching-Hao Wang, Alexandre G. R. Day, Clint Richardson, Charles K. Fisher, David J. Schwab

    Abstract: Machine Learning (ML) is one of the most exciting and dynamic areas of modern research and application. The purpose of this review is to provide an introduction to the core concepts and tools of machine learning in a manner easily understood and intuitive to physicists. The review begins by covering fundamental concepts in ML and modern statistics such as the bias-variance tradeoff, overfitting, r…

    Submitted 27 May, 2019; v1 submitted 23 March, 2018; originally announced March 2018.

    Comments: Notebooks have been updated. 122 pages, 78 figures, 20 Python notebooks

    Journal ref: Physics Reports 810 (2019) 1-124

  14. arXiv:1712.09657  [pdf, other]

    stat.ML cs.AI cs.IT cs.LG

    The information bottleneck and geometric clustering

    Authors: DJ Strouse, David J Schwab

    Abstract: The information bottleneck (IB) approach to clustering takes a joint distribution $P\!\left(X,Y\right)$ and maps the data $X$ to cluster labels $T$ which retain maximal information about $Y$ (Tishby et al., 1999). This objective results in an algorithm that clusters data points based upon the similarity of their conditional distributions $P\!\left(Y\mid X\right)$. This is in contrast to classic "g…

    Submitted 31 May, 2020; v1 submitted 27 December, 2017; originally announced December 2017.

    Comments: Updated to final published version with more detailed relationship to GMMs/k-means

    Journal ref: Neural Computation 31 (2019) 596-612

  15. arXiv:1707.05861  [pdf, other]

    stat.ME stat.CO stat.ML

    On Adaptive Propensity Score Truncation in Causal Inference

    Authors: Cheng Ju, Joshua Schwab, Mark J. van der Laan

    Abstract: The positivity assumption, or the experimental treatment assignment (ETA) assumption, is important for identifiability in causal inference. Even if the positivity assumption holds, practical violations of this assumption may jeopardize the finite sample performance of the causal estimator. One of the consequences of practical violations of the positivity assumption is extreme values in the estimat…

    Submitted 18 July, 2017; originally announced July 2017.

  16. arXiv:1609.03541  [pdf, ps, other]

    cond-mat.dis-nn cs.LG stat.ML

    Comment on "Why does deep and cheap learning work so well?" [arXiv:1608.08225]

    Authors: David J. Schwab, Pankaj Mehta

    Abstract: In a recent paper, "Why does deep and cheap learning work so well?", Lin and Tegmark claim to show that the mapping between deep belief networks and the variational renormalization group derived in [arXiv:1410.3831] is invalid, and present a "counterexample" that claims to show that this mapping does not hold. In this comment, we show that these claims are incorrect and stem from a misunderstandin…

    Submitted 12 September, 2016; originally announced September 2016.

    Comments: Comment on arXiv:1608.08225

  17. arXiv:1605.05775  [pdf, other]

    stat.ML cond-mat.str-el cs.LG

    Supervised Learning with Quantum-Inspired Tensor Networks

    Authors: E. Miles Stoudenmire, David J. Schwab

    Abstract: Tensor networks are efficient representations of high-dimensional tensors which have been very successful for physics and mathematics applications. We demonstrate how algorithms for optimizing such networks can be adapted to supervised learning tasks by using matrix product states (tensor trains) to parameterize models for classifying images. For the MNIST data set we obtain less than 1% test set…

    Submitted 18 May, 2017; v1 submitted 18 May, 2016; originally announced May 2016.

    Comments: 11 pages, 15 figures; updated version includes corrections, links to sample codes, expanded discussion, and additional references

    Journal ref: Advances in Neural Information Processing Systems 29, 4799 (2016)

  18. arXiv:1604.00268  [pdf, other]

    q-bio.NC cond-mat.stat-mech cs.IT q-bio.QM stat.ML

    The deterministic information bottleneck

    Authors: DJ Strouse, David J Schwab

    Abstract: Lossy compression and clustering fundamentally involve a decision about what features are relevant and which are not. The information bottleneck method (IB) by Tishby, Pereira, and Bialek formalized this notion as an information-theoretic optimization problem and proposed an optimal tradeoff between throwing away as many bits as possible, and selectively keeping those that are most important. In t…

    Submitted 19 December, 2016; v1 submitted 1 April, 2016; originally announced April 2016.

    Comments: 15 pages, 4 figures

  19. arXiv:1410.3831  [pdf, ps, other]

    stat.ML cond-mat.stat-mech cs.LG cs.NE

    An exact mapping between the Variational Renormalization Group and Deep Learning

    Authors: Pankaj Mehta, David J. Schwab

    Abstract: Deep learning is a broad set of techniques that uses multiple layers of representation to automatically learn relevant features directly from structured data. Recently, such techniques have yielded record-breaking results on a diverse set of difficult machine learning tasks in computer vision, speech recognition, and natural language processing. Despite the enormous success of deep learning, relat…

    Submitted 14 October, 2014; originally announced October 2014.

    Comments: 8 pages, 3 figures