Showing 1–22 of 22 results for author: Goldblum, M

Searching in archive stat.
  1. arXiv:2410.10648  [pdf, other]

    cs.LG cs.CE stat.ML

    A Simple Baseline for Predicting Events with Auto-Regressive Tabular Transformers

    Authors: Alex Stein, Samuel Sharpe, Doron Bergman, Senthil Kumar, C. Bayan Bruss, John Dickerson, Tom Goldstein, Micah Goldblum

    Abstract: Many real-world applications of tabular data involve using historic events to predict properties of new ones, for example whether a credit card transaction is fraudulent or what rating a customer will assign a product on a retail platform. Existing approaches to event prediction include costly, brittle, and application-dependent techniques such as time-aware positional embeddings, learned row and…

    Submitted 31 October, 2024; v1 submitted 14 October, 2024; originally announced October 2024.

    Comments: 10 pages, 6 pages of references+appendix

  2. arXiv:2410.02117  [pdf, other]

    cs.LG stat.ML

    Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices

    Authors: Andres Potapczynski, Shikai Qiu, Marc Finzi, Christopher Ferri, Zixi Chen, Micah Goldblum, Bayan Bruss, Christopher De Sa, Andrew Gordon Wilson

    Abstract: Dense linear layers are the dominant computational bottleneck in large neural networks, presenting a critical need for more efficient alternatives. Previous efforts focused on a small number of hand-crafted structured matrices and neglected to investigate whether these structures can surpass dense layers in terms of compute-optimal scaling laws when both the model size and training examples are op…

    Submitted 4 October, 2024; v1 submitted 2 October, 2024; originally announced October 2024.

    Comments: NeurIPS 2024. Code available at https://github.com/AndPotap/einsum-search

  3. arXiv:2407.18158  [pdf, other]

    stat.ML cs.LG

    Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models

    Authors: Sanae Lotfi, Yilun Kuang, Brandon Amos, Micah Goldblum, Marc Finzi, Andrew Gordon Wilson

    Abstract: Large language models (LLMs) with billions of parameters excel at predicting the next token in a sequence. Recent work computes non-vacuous compression-based generalization bounds for LLMs, but these bounds are vacuous for large models at the billion-parameter scale. Moreover, these bounds are obtained through restrictive compression techniques, bounding compressed models that generate low-quality…

    Submitted 25 July, 2024; originally announced July 2024.

  4. arXiv:2406.11463  [pdf, other]

    cs.LG stat.ML

    Just How Flexible are Neural Networks in Practice?

    Authors: Ravid Shwartz-Ziv, Micah Goldblum, Arpit Bansal, C. Bayan Bruss, Yann LeCun, Andrew Gordon Wilson

    Abstract: It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters, underpinning notions of overparameterized and underparameterized models. In practice, however, we only find solutions accessible via our training procedure, including the optimizer and regularizers, limiting flexibility. Moreover, the exact parameterization of the function c…

    Submitted 17 June, 2024; originally announced June 2024.

  5. arXiv:2406.08391  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Large Language Models Must Be Taught to Know What They Don't Know

    Authors: Sanyam Kapoor, Nate Gruver, Manley Roberts, Katherine Collins, Arka Pal, Umang Bhatt, Adrian Weller, Samuel Dooley, Micah Goldblum, Andrew Gordon Wilson

    Abstract: When using large language models (LLMs) in high-stakes applications, we need to know when we can trust their predictions. Some works argue that prompting high-performance LLMs is sufficient to produce calibrated uncertainties, while others introduce sampling methods that can be prohibitively expensive. In this work, we first argue that prompting on its own is insufficient to achieve good calibrati…

    Submitted 5 December, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: NeurIPS 2024 Camera Ready

  6. arXiv:2312.17173  [pdf, other]

    stat.ML cs.LG

    Non-Vacuous Generalization Bounds for Large Language Models

    Authors: Sanae Lotfi, Marc Finzi, Yilun Kuang, Tim G. J. Rudner, Micah Goldblum, Andrew Gordon Wilson

    Abstract: Modern language models can contain billions of parameters, raising the question of whether they can generalize beyond the training data or simply parrot their training corpora. We provide the first non-vacuous generalization bounds for pretrained large language models (LLMs), indicating that language models are capable of discovering regularities that generalize to unseen data. In particular, we d…

    Submitted 17 July, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: ICML 2024

  7. arXiv:2305.02997  [pdf, other]

    cs.LG cs.AI stat.ML

    When Do Neural Nets Outperform Boosted Trees on Tabular Data?

    Authors: Duncan McElfresh, Sujay Khandagale, Jonathan Valverde, Vishak Prasad C, Benjamin Feuer, Chinmay Hegde, Ganesh Ramakrishnan, Micah Goldblum, Colin White

    Abstract: Tabular data is one of the most commonly used types of data in machine learning. Despite recent advances in neural nets (NNs) for tabular data, there is still an active discussion on whether or not NNs generally outperform gradient-boosted decision trees (GBDTs) on tabular data, with several recent works arguing either that GBDTs consistently outperform NNs on tabular data, or vice versa. In this…

    Submitted 15 July, 2024; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: NeurIPS Datasets and Benchmarks Track 2023

  8. arXiv:2304.05366  [pdf, other]

    cs.LG stat.ML

    The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning

    Authors: Micah Goldblum, Marc Finzi, Keefer Rowan, Andrew Gordon Wilson

    Abstract: No free lunch theorems for supervised learning state that no learner can solve all problems or that all learners achieve exactly the same accuracy on average over a uniform distribution on learning problems. Accordingly, these theorems are often referenced in support of the notion that individual problems require specially tailored inductive biases. While virtually all uniformly sampled datasets h…

    Submitted 7 June, 2024; v1 submitted 11 April, 2023; originally announced April 2023.

    Comments: Published at the International Conference on Machine Learning (ICML) 2024

  9. arXiv:2211.13609  [pdf, other]

    cs.LG stat.ML

    PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization

    Authors: Sanae Lotfi, Marc Finzi, Sanyam Kapoor, Andres Potapczynski, Micah Goldblum, Andrew Gordon Wilson

    Abstract: While there has been progress in developing non-vacuous generalization bounds for deep neural networks, these bounds tend to be uninformative about why deep learning works. In this paper, we develop a compression approach based on quantizing neural network parameters in a linear subspace, profoundly improving on previous results to provide state-of-the-art generalization bounds on a variety of tas…

    Submitted 24 November, 2022; originally announced November 2022.

    Comments: NeurIPS 2022. Code is available at https://github.com/activatedgeek/tight-pac-bayes

  10. arXiv:2210.02984  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    The Lie Derivative for Measuring Learned Equivariance

    Authors: Nate Gruver, Marc Finzi, Micah Goldblum, Andrew Gordon Wilson

    Abstract: Equivariance guarantees that a model's predictions capture key symmetries in data. When an image is translated or rotated, an equivariant model's representation of that image will translate or rotate accordingly. The success of convolutional neural networks has historically been tied to translation equivariance directly encoded in their architecture. The rising success of vision transformers, whic…

    Submitted 18 June, 2024; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: ICLR 2023. Code available at: https://github.com/ngruver/lie-deriv

  11. arXiv:2206.15306  [pdf, other]

    cs.LG stat.ML

    Transfer Learning with Deep Tabular Models

    Authors: Roman Levin, Valeriia Cherepanova, Avi Schwarzschild, Arpit Bansal, C. Bayan Bruss, Tom Goldstein, Andrew Gordon Wilson, Micah Goldblum

    Abstract: Recent work on deep learning for tabular data demonstrates the strong performance of deep tabular models, often bridging the gap between gradient boosted decision trees and neural networks. Accuracy aside, a major advantage of neural models is that they learn reusable features and are easily fine-tuned in new domains. This property is often exploited in computer vision and natural language applica…

    Submitted 7 August, 2023; v1 submitted 30 June, 2022; originally announced June 2022.

    Journal ref: International Conference on Learning Representations (ICLR), 2023

  12. arXiv:2202.11678  [pdf, other]

    cs.LG stat.ML

    Bayesian Model Selection, the Marginal Likelihood, and Generalization

    Authors: Sanae Lotfi, Pavel Izmailov, Gregory Benton, Micah Goldblum, Andrew Gordon Wilson

    Abstract: How do we compare between hypotheses that are entirely consistent with observations? The marginal likelihood (aka Bayesian evidence), which represents the probability of generating our observations from a prior, provides a distinctive approach to this foundational question, automatically encoding Occam's razor. Although it has been observed that the marginal likelihood can overfit and is sensitive…

    Submitted 1 May, 2023; v1 submitted 23 February, 2022; originally announced February 2022.

    Comments: Extended version. Shorter ICML version available at arXiv:2202.11678v2

  13. arXiv:2106.01342  [pdf, other]

    cs.LG cs.AI stat.ML

    SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training

    Authors: Gowthami Somepalli, Micah Goldblum, Avi Schwarzschild, C. Bayan Bruss, Tom Goldstein

    Abstract: Tabular data underpins numerous high-impact applications of machine learning from fraud detection to genomics and healthcare. Classical approaches to solving tabular problems, such as gradient boosting and random forests, are widely used by practitioners. However, recent deep learning methods have achieved a degree of performance competitive with popular techniques. We devise a hybrid deep learnin…

    Submitted 2 June, 2021; originally announced June 2021.

  14. arXiv:2104.08894  [pdf, other]

    cs.CV cs.LG stat.ML

    The Intrinsic Dimension of Images and Its Impact on Learning

    Authors: Phillip Pope, Chen Zhu, Ahmed Abdelkader, Micah Goldblum, Tom Goldstein

    Abstract: It is widely believed that natural image data exhibits low-dimensional structure despite the high dimensionality of conventional pixel representations. This idea underlies a common intuition for the remarkable success of deep learning in computer vision. In this work, we apply dimension estimation tools to popular datasets and investigate the role of low-dimensional structure in deep learning. We…

    Submitted 18 April, 2021; originally announced April 2021.

    Comments: To appear at ICLR 2021 (spotlight), 17 pages with appendix, 15 figures

    ACM Class: I.2.6; I.5.1

  15. arXiv:2011.12919  [pdf, other]

    cs.LG cs.AI stat.ML

    Analyzing the Machine Learning Conference Review Process

    Authors: David Tran, Alex Valtchanov, Keshav Ganapathy, Raymond Feng, Eric Slud, Micah Goldblum, Tom Goldstein

    Abstract: Mainstream machine learning conferences have seen a dramatic increase in the number of participants, along with a growing range of perspectives, in recent years. Members of the machine learning community are likely to overhear allegations ranging from randomness of acceptance decisions to institutional bias. In this work, we critically analyze the review process through a comprehensive study of pa…

    Submitted 25 November, 2020; v1 submitted 24 November, 2020; originally announced November 2020.

    Comments: NeurIPS Workshop on Navigating the Broader Impacts of AI Research. Full version at arXiv:2010.05137

  16. arXiv:2006.12557  [pdf, other]

    cs.LG cs.CR cs.CV cs.CY stat.ML

    Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks

    Authors: Avi Schwarzschild, Micah Goldblum, Arjun Gupta, John P Dickerson, Tom Goldstein

    Abstract: Data poisoning and backdoor attacks manipulate training data in order to cause models to fail during inference. A recent survey of industry practitioners found that data poisoning is the number one concern among threats ranging from model stealing to adversarial attacks. However, it remains unclear exactly how dangerous poisoning methods are and which ones are more effective considering that these…

    Submitted 17 June, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: 19 pages, 4 figures

  17. arXiv:2002.06753  [pdf, other]

    cs.LG cs.CV stat.ML

    Unraveling Meta-Learning: Understanding Feature Representations for Few-Shot Tasks

    Authors: Micah Goldblum, Steven Reich, Liam Fowl, Renkun Ni, Valeriia Cherepanova, Tom Goldstein

    Abstract: Meta-learning algorithms produce feature extractors which achieve state-of-the-art performance on few-shot classification. While the literature is rich with meta-learning methods, little is known about why the resulting feature extractors perform so well. We develop a better understanding of the underlying mechanics of meta-learning and the difference between models trained using meta-learning and…

    Submitted 1 July, 2020; v1 submitted 16 February, 2020; originally announced February 2020.

    Comments: ICML 2020

  18. arXiv:1911.07989  [pdf, other]

    cs.LG cs.CR cs.CV eess.SP stat.ML

    WITCHcraft: Efficient PGD attacks with random step size

    Authors: Ping-Yeh Chiang, Jonas Geiping, Micah Goldblum, Tom Goldstein, Renkun Ni, Steven Reich, Ali Shafahi

    Abstract: State-of-the-art adversarial attacks on neural networks use expensive iterative methods and numerous random restarts from different initial points. Iterative FGSM-based methods without restarts trade off performance for computational efficiency because they do not adequately explore the image space and are highly sensitive to the choice of step size. We propose a variant of Projected Gradient Desc…

    Submitted 18 November, 2019; originally announced November 2019.

    Comments: Authors contributed equally and are listed in alphabetical order

  19. arXiv:1910.00982  [pdf, ps, other]

    cs.LG stat.ML

    Adversarially Robust Few-Shot Learning: A Meta-Learning Approach

    Authors: Micah Goldblum, Liam Fowl, Tom Goldstein

    Abstract: Previous work on adversarially robust neural networks for image classification requires large training sets and computationally expensive training procedures. On the other hand, few-shot learning methods are highly vulnerable to adversarial examples. The goal of our work is to produce networks which both perform well at few-shot classification tasks and are simultaneously robust to adversarial exa…

    Submitted 15 October, 2020; v1 submitted 2 October, 2019; originally announced October 2019.

    Comments: Accepted to NeurIPS 2020

  20. arXiv:1910.00359  [pdf, other]

    cs.LG math.OC stat.ML

    Truth or Backpropaganda? An Empirical Investigation of Deep Learning Theory

    Authors: Micah Goldblum, Jonas Geiping, Avi Schwarzschild, Michael Moeller, Tom Goldstein

    Abstract: We empirically evaluate common assumptions about neural networks that are widely held by practitioners and theorists alike. In this work, we: (1) prove the widespread existence of suboptimal local minima in the loss landscape of neural networks, and we use our theory to find examples; (2) show that small-norm parameters are not optimal for generalization; (3) demonstrate that ResNets do not confor…

    Submitted 28 April, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

    Comments: 18 pages, 6 figures. First two authors contributed equally. Published as a conference paper at ICLR 2020

  21. arXiv:1906.03291  [pdf, other]

    cs.LG cs.NE stat.ML

    Understanding Generalization through Visualizations

    Authors: W. Ronny Huang, Zeyad Emam, Micah Goldblum, Liam Fowl, J. K. Terry, Furong Huang, Tom Goldstein

    Abstract: The power of neural networks lies in their ability to generalize to unseen data, yet the underlying reasons for this phenomenon remain elusive. Numerous rigorous attempts have been made to explain generalization, but available bounds are still quite loose, and analysis does not always lead to true understanding. The goal of this work is to make generalization more intuitive. Using visualization me…

    Submitted 14 November, 2020; v1 submitted 7 June, 2019; originally announced June 2019.

    Comments: 8 pages (excluding acknowledgments and references), 8 figures

  22. arXiv:1905.09747  [pdf, other]

    cs.LG cs.CV stat.ML

    Adversarially Robust Distillation

    Authors: Micah Goldblum, Liam Fowl, Soheil Feizi, Tom Goldstein

    Abstract: Knowledge distillation is effective for producing small, high-performance neural networks for classification, but these small networks are vulnerable to adversarial attacks. This paper studies how adversarial robustness transfers from teacher to student during knowledge distillation. We find that a large amount of robustness may be inherited by the student even when distilled on only clean images.…

    Submitted 2 December, 2019; v1 submitted 23 May, 2019; originally announced May 2019.

    Comments: Accepted to AAAI Conference on Artificial Intelligence, 2020