
Showing 1–26 of 26 results for author: Poggio, T

Searching in archive stat.
  1. arXiv:2502.05300  [pdf, other]

    cs.LG cond-mat.dis-nn cs.AI stat.ML

    Parameter Symmetry Potentially Unifies Deep Learning Theory

    Authors: Liu Ziyin, Yizhou Xu, Tomaso Poggio, Isaac Chuang

    Abstract: The dynamics of learning in modern large AI systems is hierarchical, often characterized by abrupt, qualitative shifts akin to phase transitions observed in physical systems. While these phenomena hold promise for uncovering the mechanisms behind neural networks and language models, existing theories remain fragmented, addressing specific cases. In this position paper, we advocate for the crucial…

    Submitted 23 May, 2025; v1 submitted 7 February, 2025; originally announced February 2025.

    Comments: preprint

  2. arXiv:2411.13733  [pdf, ps, other]

    cs.LG stat.ML

    On Generalization Bounds for Neural Networks with Low Rank Layers

    Authors: Andrea Pinto, Akshay Rangamani, Tomaso Poggio

    Abstract: While previous optimization results have suggested that deep neural networks tend to favour low-rank weight matrices, the implications of this inductive bias on generalization bounds remain underexplored. In this paper, we apply Maurer's chain rule for Gaussian complexity to analyze how low-rank layers in deep networks can prevent the accumulation of rank and dimensionality factors that typically…

    Submitted 20 November, 2024; originally announced November 2024.

    Comments: Published in the MIT DSpace repository: https://dspace.mit.edu/handle/1721.1/157263

  3. arXiv:2406.11110  [pdf, other]

    cs.LG math.OC stat.ML

    How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD

    Authors: Pierfrancesco Beneventano, Andrea Pinto, Tomaso Poggio

    Abstract: We investigate the ability of deep neural networks to identify the support of the target function. Our findings reveal that mini-batch SGD effectively learns the support in the first layer of the network by shrinking to zero the weights associated with irrelevant components of input. In contrast, we demonstrate that while vanilla GD also approximates the target function, it requires an explicit re…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 34 pages, 19 figures

  4. arXiv:2212.12675  [pdf, other]

    stat.ML cs.LG math.OC

    Iterative regularization in classification via hinge loss diagonal descent

    Authors: Vassilis Apidopoulos, Tomaso Poggio, Lorenzo Rosasco, Silvia Villa

    Abstract: Iterative regularization is a classic idea in regularization theory, that has recently become popular in machine learning. On the one hand, it allows to design efficient algorithms controlling at the same time numerical and statistical accuracy. On the other hand it allows to shed light on the learning curves observed while training neural networks. In this paper, we focus on iterative regularizat…

    Submitted 9 October, 2024; v1 submitted 24 December, 2022; originally announced December 2022.

  5. arXiv:2206.05794  [pdf, other]

    cs.LG stat.ML

    SGD and Weight Decay Secretly Minimize the Rank of Your Neural Network

    Authors: Tomer Galanti, Zachary S. Siegel, Aparna Gupte, Tomaso Poggio

    Abstract: We investigate the inherent bias of Stochastic Gradient Descent (SGD) toward learning low-rank weight matrices during the training of deep neural networks. Our results demonstrate that training with mini-batch SGD and weight decay induces a bias toward rank minimization in the weight matrices. Specifically, we show both theoretically and empirically that this bias becomes more pronounced with smal…

    Submitted 18 October, 2024; v1 submitted 12 June, 2022; originally announced June 2022.

  6. arXiv:2107.10199  [pdf, other]

    cs.LG cs.AI stat.ML

    Distribution of Classification Margins: Are All Data Equal?

    Authors: Andrzej Banburski, Fernanda De La Torre, Nishka Pant, Ishana Shastri, Tomaso Poggio

    Abstract: Recent theoretical results show that gradient descent on deep neural networks under exponential loss functions locally maximizes classification margin, which is equivalent to minimizing the norm of the weight matrices under margin constraints. This property of the solution however does not fully characterize the generalization performance. We motivate theoretically and show empirically that the ar…

    Submitted 21 July, 2021; originally announced July 2021.

    Comments: Previously online as CBMM Memo 115 on the CBMM MIT site

  7. arXiv:2101.00072  [pdf, other]

    cs.LG stat.ML

    Explicit regularization and implicit bias in deep network classifiers trained with the square loss

    Authors: Tomaso Poggio, Qianli Liao

    Abstract: Deep ReLU networks trained with the square loss have been observed to perform well in classification tasks. We provide here a theoretical justification based on analysis of the associated gradient flow. We show that convergence to a solution with the absolute minimum norm is expected when normalization techniques such as Batch Normalization (BN) or Weight Normalization (WN) are used together with…

    Submitted 31 December, 2020; originally announced January 2021.

  8. arXiv:2006.16427  [pdf, other]

    cs.LG cs.CV stat.ML

    Biologically Inspired Mechanisms for Adversarial Robustness

    Authors: Manish V. Reddy, Andrzej Banburski, Nishka Pant, Tomaso Poggio

    Abstract: A convolutional neural network strongly robust to adversarial perturbations at reasonable computational and performance cost has not yet been demonstrated. The primate visual ventral stream seems to be robust to small perturbations in visual stimuli but the underlying mechanisms that give rise to this robust perception are not understood. In this work, we investigate the role of two biologically p…

    Submitted 29 June, 2020; originally announced June 2020.

    Comments: 25 pages, 15 figures

  9. arXiv:2006.15522  [pdf, other]

    stat.ML cs.LG

    For interpolating kernel machines, minimizing the norm of the ERM solution minimizes stability

    Authors: Akshay Rangamani, Lorenzo Rosasco, Tomaso Poggio

    Abstract: We study the average $\mbox{CV}_{loo}$ stability of kernel ridge-less regression and derive corresponding risk bounds. We show that the interpolating solution with minimum norm minimizes a bound on $\mbox{CV}_{loo}$ stability, which in turn is controlled by the condition number of the empirical kernel matrix. The latter can be characterized in the asymptotic regime where both the dimension and car…

    Submitted 11 October, 2020; v1 submitted 28 June, 2020; originally announced June 2020.

  10. arXiv:2006.13915  [pdf, other]

    cs.LG eess.IV q-bio.NC stat.ML

    Hierarchically Compositional Tasks and Deep Convolutional Networks

    Authors: Arturo Deza, Qianli Liao, Andrzej Banburski, Tomaso Poggio

    Abstract: The main success stories of deep learning, starting with ImageNet, depend on deep convolutional networks, which on certain tasks perform significantly better than traditional shallow classifiers, such as support vector machines, and also better than deep fully connected networks; but what is so special about deep convolutional networks? Recent results in approximation theory proved an exponential…

    Submitted 25 March, 2021; v1 submitted 24 June, 2020; originally announced June 2020.

    Comments: A pre-print. Currently Under Review

    Report number: MIT Center for Brains, Minds and Machines (CBMM) Memo #109

  11. arXiv:1912.06190  [pdf, other]

    cs.LG stat.ML

    Double descent in the condition number

    Authors: Tomaso Poggio, Gil Kur, Andrzej Banburski

    Abstract: In solving a system of $n$ linear equations in $d$ variables $Ax=b$, the condition number of the $n,d$ matrix $A$ measures how much errors in the data $b$ affect the solution $x$. Estimates of this type are important in many inverse problems. An example is machine learning where the key task is to estimate an underlying function from a set of measurements at random points in a high dimensional spa…

    Submitted 28 April, 2020; v1 submitted 12 December, 2019; originally announced December 2019.

    Comments: Removed parts relating to kernel regression to streamline the presentation, fixed some typos

  12. arXiv:1908.09375  [pdf, other]

    cs.LG stat.ML

    Theoretical Issues in Deep Networks: Approximation, Optimization and Generalization

    Authors: Tomaso Poggio, Andrzej Banburski, Qianli Liao

    Abstract: While deep learning is successful in a number of applications, it is not yet well understood theoretically. A satisfactory theoretical characterization of deep learning however, is beginning to emerge. It covers the following questions: 1) representation power of deep networks 2) optimization of the empirical risk 3) generalization properties of gradient descent techniques --- why the expected err…

    Submitted 25 August, 2019; originally announced August 2019.

    Comments: arXiv admin note: text overlap with arXiv:1611.00740

  13. arXiv:1905.12882  [pdf, other]

    cs.LG stat.ML

    Function approximation by deep networks

    Authors: H. N. Mhaskar, T. Poggio

    Abstract: We show that deep networks are better than shallow networks at approximating functions that can be expressed as a composition of functions described by a directed acyclic graph, because the deep networks can be designed to have the same compositional structure, while a shallow network cannot exploit this knowledge. Thus, the blessing of compositionality mitigates the curse of dimensionality. On th…

    Submitted 23 November, 2019; v1 submitted 30 May, 2019; originally announced May 2019.

    Comments: To appear in Communications in pure and applied mathematics

  14. arXiv:1903.04991  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Theory III: Dynamics and Generalization in Deep Networks

    Authors: Andrzej Banburski, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Fernanda De La Torre, Jack Hidary, Tomaso Poggio

    Abstract: The key to generalization is controlling the complexity of the network. However, there is no obvious control of complexity -- such as an explicit regularization term -- in the training of deep networks for classification. We will show that a classical form of norm control -- but kind of hidden -- is present in deep networks trained with gradient descent techniques on exponential-type losses. In pa…

    Submitted 10 April, 2020; v1 submitted 12 March, 2019; originally announced March 2019.

    Comments: 47 pages, 11 figures. This replaces previous versions of Theory III, that appeared on Arxiv [arXiv:1806.11379, arXiv:1801.00173] or on the CBMM site. v5: Changes throughout the paper to the presentation and tightening some of the statements

  15. arXiv:1811.03567  [pdf, other]

    cs.LG cs.AI cs.CV cs.NE stat.ML

    Biologically-plausible learning algorithms can scale to large datasets

    Authors: Will Xiao, Honglin Chen, Qianli Liao, Tomaso Poggio

    Abstract: The backpropagation (BP) algorithm is often thought to be biologically implausible in the brain. One of the main reasons is that BP requires symmetric weight matrices in the feedforward and feedback pathways. To address this "weight transport problem" (Grossberg, 1987), two more biologically plausible algorithms, proposed by Liao et al. (2016) and Lillicrap et al. (2016), relax BP's weight symmetr…

    Submitted 20 December, 2018; v1 submitted 8 November, 2018; originally announced November 2018.

  16. arXiv:1807.09659  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    A Surprising Linear Relationship Predicts Test Performance in Deep Networks

    Authors: Qianli Liao, Brando Miranda, Andrzej Banburski, Jack Hidary, Tomaso Poggio

    Abstract: Given two networks with the same training loss on a dataset, when would they have drastically different test losses and errors? Better understanding of this question of generalization may improve practical applications of deep networks. In this paper we show that with cross-entropy loss it is surprisingly simple to induce significantly different generalization performances for two networks that ha…

    Submitted 25 July, 2018; originally announced July 2018.

  17. arXiv:1806.11379  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Theory IIIb: Generalization in Deep Networks

    Authors: Tomaso Poggio, Qianli Liao, Brando Miranda, Andrzej Banburski, Xavier Boix, Jack Hidary

    Abstract: A main puzzle of deep neural networks (DNNs) revolves around the apparent absence of "overfitting", defined in this paper as follows: the expected error does not get worse when increasing the number of neurons or of iterations of gradient descent. This is surprising because of the large capacity demonstrated by DNNs to fit randomly labeled data and the absence of explicit regularization. Recent re…

    Submitted 29 June, 2018; originally announced June 2018.

    Comments: 38 pages, 7 figures

  18. arXiv:1806.04542  [pdf, other]

    stat.ML cs.LG

    Approximate inference with Wasserstein gradient flows

    Authors: Charlie Frogner, Tomaso Poggio

    Abstract: We present a novel approximate inference method for diffusion processes, based on the Wasserstein gradient flow formulation of the diffusion. In this formulation, the time-dependent density of the diffusion is derived as the limit of implicit Euler steps that follow the gradients of a particular free energy functional. Existing methods for computing Wasserstein gradient flows rely on discretizatio…

    Submitted 12 June, 2018; originally announced June 2018.

  19. arXiv:1711.01530  [pdf, other]

    cs.LG cs.AI stat.ML

    Fisher-Rao Metric, Geometry, and Complexity of Neural Networks

    Authors: Tengyuan Liang, Tomaso Poggio, Alexander Rakhlin, James Stokes

    Abstract: We study the relationship between geometry and capacity measures for deep neural networks from an invariance viewpoint. We introduce a new notion of capacity --- the Fisher-Rao norm --- that possesses desirable invariance properties and is motivated by Information Geometry. We discover an analytical characterization of the new capacity measure, through which we establish norm-comparison inequaliti…

    Submitted 23 February, 2019; v1 submitted 5 November, 2017; originally announced November 2017.

    Comments: To appear in the proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS) 2019

    Journal ref: The 22nd International Conference on Artificial Intelligence and Statistics 89 (2019) 888-896

  20. arXiv:1510.04935  [pdf, other]

    cs.AI cs.LG stat.ML

    Holographic Embeddings of Knowledge Graphs

    Authors: Maximilian Nickel, Lorenzo Rosasco, Tomaso Poggio

    Abstract: Learning embeddings of entities and relations is an efficient and versatile method to perform machine learning on relational data such as knowledge graphs. In this work, we propose holographic embeddings (HolE) to learn compositional vector space representations of entire knowledge graphs. The proposed method is related to holographic models of associative memory in that it employs circular correl…

    Submitted 7 December, 2015; v1 submitted 16 October, 2015; originally announced October 2015.

    Comments: To appear in AAAI-16

    ACM Class: I.2.6; I.2.4

  21. arXiv:1506.05439  [pdf, other]

    cs.LG cs.CV stat.ML

    Learning with a Wasserstein Loss

    Authors: Charlie Frogner, Chiyuan Zhang, Hossein Mobahi, Mauricio Araya-Polo, Tomaso Poggio

    Abstract: Learning to predict multi-label outputs is challenging, but in many problems there is a natural metric on the outputs that can be used to improve predictions. In this paper we develop a loss function for multi-label learning, based on the Wasserstein distance. The Wasserstein distance provides a natural notion of dissimilarity for probability measures. Although optimizing with respect to the exact…

    Submitted 29 December, 2015; v1 submitted 17 June, 2015; originally announced June 2015.

    Comments: NIPS 2015; v3 updates Algorithm 1 and Equations 6, 8

  22. arXiv:1506.02544  [pdf, ps, other]

    cs.LG cs.CV stat.ML

    Learning with Group Invariant Features: A Kernel Perspective

    Authors: Youssef Mroueh, Stephen Voinea, Tomaso Poggio

    Abstract: We analyze in this paper a random feature map based on a theory of invariance I-theory introduced recently. More specifically, a group invariant signal signature is obtained through cumulative distributions of group transformed random projections. Our analysis bridges invariant feature learning with kernel methods, as we show that this feature map defines an expected Haar integration kernel that i…

    Submitted 4 December, 2015; v1 submitted 8 June, 2015; originally announced June 2015.

    Comments: NIPS 2015

  23. arXiv:1404.0400  [pdf, other]

    cs.SD cs.LG stat.ML

    A Deep Representation for Invariance And Music Classification

    Authors: Chiyuan Zhang, Georgios Evangelopoulos, Stephen Voinea, Lorenzo Rosasco, Tomaso Poggio

    Abstract: Representations in the auditory cortex might be based on mechanisms similar to the visual ventral stream; modules for building invariance to transformations and multiple layers for compositionality and selectivity. In this paper we propose the use of such computational modules for extracting invariant and discriminative audio representations. Building on a theory of invariance in hierarchical arch…

    Submitted 1 April, 2014; originally announced April 2014.

    Comments: 5 pages, CBMM Memo No. 002, (to appear) IEEE 2014 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014)

    Report number: CBMM Memo No. 002

  24. arXiv:1303.5976  [pdf, ps, other]

    stat.ML cs.LG

    On Learnability, Complexity and Stability

    Authors: Silvia Villa, Lorenzo Rosasco, Tomaso Poggio

    Abstract: We consider the fundamental question of learnability of a hypotheses class in the supervised learning setting and in the general learning setting introduced by Vladimir Vapnik. We survey classic results characterizing learnability in term of suitable notions of complexity, as well as more recent results that establish the connection between learnability and stability of a learning algorithm.

    Submitted 24 March, 2013; originally announced March 2013.

  25. arXiv:1209.1360  [pdf, other]

    stat.ML cs.LG

    Multiclass Learning with Simplex Coding

    Authors: Youssef Mroueh, Tomaso Poggio, Lorenzo Rosasco, Jean-Jacques Slotine

    Abstract: In this paper we discuss a novel framework for multiclass learning, defined by a suitable coding/decoding strategy, namely the simplex coding, that allows to generalize to multiple classes a relaxation approach commonly used in binary classification. In this framework, a relaxation error analysis can be developed avoiding constraints on the considered hypotheses class. Moreover, we show that in th…

    Submitted 14 September, 2012; v1 submitted 6 September, 2012; originally announced September 2012.

  26. arXiv:1209.1121  [pdf, other]

    cs.LG stat.ML

    Learning Manifolds with K-Means and K-Flats

    Authors: Guillermo D. Canas, Tomaso Poggio, Lorenzo Rosasco

    Abstract: We study the problem of estimating a manifold from random samples. In particular, we consider piecewise constant and piecewise linear estimators induced by k-means and k-flats, and analyze their performance. We extend previous results for k-means in two separate directions. First, we provide new results for k-means reconstruction on manifolds and, secondly, we prove reconstruction bounds for highe…

    Submitted 19 February, 2013; v1 submitted 5 September, 2012; originally announced September 2012.

    Comments: 19 pages, 2 figures; Advances in Neural Information Processing Systems, NIPS 2012

    ACM Class: K.3.2