Showing 1–9 of 9 results for author: Galanti, T

  1. arXiv:2206.05794  [pdf, other]

    cs.LG stat.ML

    SGD and Weight Decay Secretly Minimize the Rank of Your Neural Network

    Authors: Tomer Galanti, Zachary S. Siegel, Aparna Gupte, Tomaso Poggio

    Abstract: We investigate the inherent bias of Stochastic Gradient Descent (SGD) toward learning low-rank weight matrices during the training of deep neural networks. Our results demonstrate that training with mini-batch SGD and weight decay induces a bias toward rank minimization in the weight matrices. Specifically, we show both theoretically and empirically that this bias becomes more pronounced with smal…

    Submitted 18 October, 2024; v1 submitted 12 June, 2022; originally announced June 2022.

  2. arXiv:2003.12193  [pdf, other]

    cs.LG stat.ML

    On Infinite-Width Hypernetworks

    Authors: Etai Littwin, Tomer Galanti, Lior Wolf, Greg Yang

    Abstract: Hypernetworks are architectures that produce the weights of a task-specific primary network. A notable application of hypernetworks in the recent literature involves learning to output functional representations. In these scenarios, the hypernetwork learns a representation corresponding to the weights of a shallow MLP, which typically encodes shape or image information. While such repr…

    Submitted 22 February, 2021; v1 submitted 26 March, 2020; originally announced March 2020.

    Comments: The first two authors contributed equally

  3. arXiv:2002.10007  [pdf, other]

    cs.LG cs.AI stat.ML

    A Critical View of the Structural Causal Model

    Authors: Tomer Galanti, Ofir Nabati, Lior Wolf

    Abstract: In the univariate case, we show that by comparing the individual complexities of univariate cause and effect, one can identify the cause and the effect, without considering their interaction at all. In our framework, complexities are captured by the reconstruction error of an autoencoder that operates on the quantiles of the distribution. Comparing the reconstruction errors of the two autoencoders…

    Submitted 23 February, 2020; originally announced February 2020.

  4. arXiv:2002.10006  [pdf, other]

    cs.LG stat.ML

    On the Modularity of Hypernetworks

    Authors: Tomer Galanti, Lior Wolf

    Abstract: In the context of learning to map an input $I$ to a function $h_I:\mathcal{X}\to \mathbb{R}$, two alternative methods are compared: (i) an embedding-based method, which learns a fixed function in which $I$ is encoded as a conditioning signal $e(I)$ and the learned function takes the form $h_I(x) = q(x,e(I))$, and (ii) hypernetworks, in which the weights $\theta_I$ of the function $h_I(x) = g(x;\theta_I)$ ar…

    Submitted 2 November, 2020; v1 submitted 23 February, 2020; originally announced February 2020.

    Comments: Accepted to Advances in Neural Information Processing Systems (NeurIPS) 2020

  5. arXiv:2001.10460  [pdf, other]

    cs.LG stat.ML

    On Random Kernels of Residual Architectures

    Authors: Etai Littwin, Tomer Galanti, Lior Wolf

    Abstract: We derive finite width and depth corrections for the Neural Tangent Kernel (NTK) of ResNets and DenseNets. Our analysis reveals that finite size residual architectures are initialized much closer to the "kernel regime" than their vanilla counterparts: while in networks that do not use skip connections, convergence to the NTK requires one to fix the depth, while increasing the layers' width. Our fi…

    Submitted 17 June, 2020; v1 submitted 28 January, 2020; originally announced January 2020.

  6. arXiv:2001.05207  [pdf, ps, other]

    cs.LG stat.ML

    A Formal Approach to Explainability

    Authors: Lior Wolf, Tomer Galanti, Tamir Hazan

    Abstract: We regard explanations as a blending of the input sample and the model's output and offer a few definitions that capture various desired properties of the function that generates these explanations. We study the links between these properties and between explanation-generating functions and intermediate representations of learned models and are able to show, for example, that if the activations of…

    Submitted 15 January, 2020; originally announced January 2020.

    Journal ref: Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, January 2019, Pages 255-261

  7. arXiv:2001.05026  [pdf, other]

    cs.LG stat.ML

    Unsupervised Learning of the Set of Local Maxima

    Authors: Lior Wolf, Sagie Benaim, Tomer Galanti

    Abstract: This paper describes a new form of unsupervised learning, whose input is a set of unlabeled points that are assumed to be local maxima of an unknown value function v in an unknown subset of the vector space. Two functions are learned: (i) a set indicator c, which is a binary classifier, and (ii) a comparator function h that given two nearby samples, predicts which sample has the higher value of th…

    Submitted 14 January, 2020; originally announced January 2020.

    Comments: ICLR 2019

  8. arXiv:1807.08501  [pdf, other]

    cs.LG stat.ML

    Risk Bounds for Unsupervised Cross-Domain Mapping with IPMs

    Authors: Tomer Galanti, Sagie Benaim, Lior Wolf

    Abstract: The recent empirical success of unsupervised cross-domain mapping algorithms, between two domains that share common characteristics, is not well-supported by theoretical justifications. This lacuna is especially troubling, given the clear ambiguity in such mappings. We work with adversarial training methods based on IPMs and derive a novel risk bound, which upper bounds the risk between the lear…

    Submitted 2 November, 2020; v1 submitted 23 July, 2018; originally announced July 2018.

    Comments: arXiv admin note: text overlap with arXiv:1709.00074

  9. arXiv:1703.01606  [pdf, ps, other]

    cs.LG stat.ML

    A Theory of Output-Side Unsupervised Domain Adaptation

    Authors: Tomer Galanti, Lior Wolf

    Abstract: When learning a mapping from an input space to an output space, the assumption that the sample distribution of the training data is the same as that of the test data is often violated. Unsupervised domain shift methods adapt the learned function in order to correct for this shift. Previous work has focused on utilizing unlabeled samples from the target distribution. We consider the complementary p…

    Submitted 5 March, 2017; originally announced March 2017.