Showing 1–50 of 69 results for author: Mondelli, M

  1. arXiv:2505.17282  [pdf, ps, other]

    cs.LG cs.CL stat.ML

    Attention with Trained Embeddings Provably Selects Important Tokens

    Authors: Diyuan Wu, Aleksandr Shevchenko, Samet Oymak, Marco Mondelli

    Abstract: Token embeddings play a crucial role in language modeling but, despite this practical relevance, their theoretical understanding remains limited. Our paper addresses the gap by characterizing the structure of embeddings obtained via gradient descent. Specifically, we consider a one-layer softmax attention model with a linear head for binary classification, i.e.,…

    Submitted 25 June, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: Fix mistakes in Lemma 4.2 and proof of Lemma 4.5, and some other minor changes

  2. arXiv:2505.16329  [pdf, ps, other]

    stat.ML cs.LG

    Better Rates for Private Linear Regression in the Proportional Regime via Aggressive Clipping

    Authors: Simone Bombari, Inbar Seroussi, Marco Mondelli

    Abstract: Differentially private (DP) linear regression has received significant attention in the recent theoretical literature, with several works aimed at obtaining improved error rates. A common approach is to set the clipping constant much larger than the expected norm of the per-sample gradients. While simplifying the analysis, this is however in sharp contrast with what empirical evidence suggests to…

    Submitted 22 May, 2025; originally announced May 2025.

  3. arXiv:2505.15239  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Neural Collapse is Globally Optimal in Deep Regularized ResNets and Transformers

    Authors: Peter Súkeník, Christoph H. Lampert, Marco Mondelli

    Abstract: The empirical emergence of neural collapse -- a surprising symmetry in the feature representations of the training data in the penultimate layer of deep neural networks -- has spurred a line of theoretical research aimed at its understanding. However, existing work focuses on data-agnostic models or, when data structure is taken into account, it remains limited to multi-layer perceptrons. Our pape…

    Submitted 21 May, 2025; originally announced May 2025.

  4. arXiv:2503.11842  [pdf, other]

    cs.LG stat.ML

    Test-Time Training Provably Improves Transformers as In-context Learners

    Authors: Halil Alperen Gozeten, M. Emrullah Ildiz, Xuechen Zhang, Mahdi Soltanolkotabi, Marco Mondelli, Samet Oymak

    Abstract: Test-time training (TTT) methods explicitly update the weights of a model to adapt to the specific test instance, and they have found success in a variety of settings, including most recently language modeling and reasoning. To demystify this success, we investigate a gradient-based TTT algorithm for in-context learning, where we train a transformer model on the in-context demonstrations provided…

    Submitted 14 March, 2025; originally announced March 2025.

  5. arXiv:2502.01583  [pdf, ps, other]

    stat.ML cs.IT cs.LG math.PR math.ST

    Spectral Estimators for Multi-Index Models: Precise Asymptotics and Optimal Weak Recovery

    Authors: Filip Kovačević, Yihan Zhang, Marco Mondelli

    Abstract: Multi-index models provide a popular framework to investigate the learnability of functions with low-dimensional structure and, also due to their connections with neural networks, they have been the object of intensive recent study. In this paper, we focus on recovering the subspace spanned by the signals via spectral estimators -- a family of methods routinely used in practice, often as a warm-start…

    Submitted 10 June, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

    Comments: Accepted to COLT 2025

  6. arXiv:2502.01347  [pdf, other]

    stat.ML cs.LG

    Spurious Correlations in High Dimensional Regression: The Roles of Regularization, Simplicity Bias and Over-Parameterization

    Authors: Simone Bombari, Marco Mondelli

    Abstract: Learning models have been shown to rely on spurious correlations between non-predictive features and the associated labels in the training data, with negative implications on robustness, bias and fairness. In this work, we provide a statistical characterization of this phenomenon for high-dimensional regression, when the data contains a predictive core feature $x$ and a spurious feature $y$. Speci…

    Submitted 27 May, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

    Comments: Revision after ICML 2025 reviews

  7. arXiv:2501.19104  [pdf, other]

    cs.LG

    Neural Collapse Beyond the Unconstrained Features Model: Landscape, Dynamics, and Generalization in the Mean-Field Regime

    Authors: Diyuan Wu, Marco Mondelli

    Abstract: Neural Collapse is a phenomenon where the last-layer representations of a well-trained neural network converge to a highly structured geometry. In this paper, we focus on its first (and most basic) property, known as NC1: the within-class variability vanishes. While prior theoretical studies establish the occurrence of NC1 via the data-agnostic unconstrained features model, our work adopts a data-…

    Submitted 4 February, 2025; v1 submitted 31 January, 2025; originally announced January 2025.

    Comments: 35 pages. Fix a typo in the title

  8. arXiv:2410.18837  [pdf, other]

    stat.ML cs.LG

    High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws

    Authors: M. Emrullah Ildiz, Halil Alperen Gozeten, Ege Onur Taga, Marco Mondelli, Samet Oymak

    Abstract: A growing number of machine learning scenarios rely on knowledge distillation where one uses the output of a surrogate model as labels to supervise the training of a target model. In this work, we provide a sharp characterization of this process for ridgeless, high-dimensional regression, under two settings: (i) model shift, where the surrogate model is arbitrary, and (ii) distribution shift, wher…

    Submitted 27 February, 2025; v1 submitted 24 October, 2024; originally announced October 2024.

  9. arXiv:2410.14787  [pdf, other]

    stat.ML cs.CR cs.LG

    Privacy for Free in the Overparameterized Regime

    Authors: Simone Bombari, Marco Mondelli

    Abstract: Differentially private gradient descent (DP-GD) is a popular algorithm to train deep learning models with provable guarantees on the privacy of the training data. In the last decade, the problem of understanding its performance cost with respect to standard GD has received remarkable attention from the research community, which formally derived upper bounds on the excess population risk $R_{P}$ in…

    Submitted 27 May, 2025; v1 submitted 18 October, 2024; originally announced October 2024.

    Comments: Update after PNAS revision

  10. arXiv:2410.04887  [pdf, other]

    cs.LG math.OC stat.ML

    Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse

    Authors: Arthur Jacot, Peter Súkeník, Zihan Wang, Marco Mondelli

    Abstract: Deep neural networks (DNNs) at convergence consistently represent the training data in the last layer via a highly symmetric geometric structure referred to as neural collapse. This empirical evidence has spurred a line of theoretical research aimed at proving the emergence of neural collapse, mostly focusing on the unconstrained features model. Here, the features of the penultimate layer are free…

    Submitted 7 October, 2024; originally announced October 2024.

    Comments: 29 pages, 5 figures

  11. arXiv:2405.20993  [pdf, other]

    cs.IT cond-mat.dis-nn cs.LG math.ST

    Information limits and Thouless-Anderson-Palmer equations for spiked matrix models with structured noise

    Authors: Jean Barbier, Francesco Camilli, Marco Mondelli, Yizhou Xu

    Abstract: We consider a prototypical problem of Bayesian inference for a structured spiked model: a low-rank signal is corrupted by additive noise. While both information-theoretic and algorithmic limits are well understood when the noise is a Gaussian Wigner matrix, the more realistic case of structured noise still proves to be challenging. To capture the structure while maintaining mathematical tractabili…

    Submitted 8 July, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    MSC Class: 62F15; 82B44

  12. arXiv:2405.14468  [pdf, other]

    cs.LG math.OC stat.ML

    Neural Collapse versus Low-rank Bias: Is Deep Neural Collapse Really Optimal?

    Authors: Peter Súkeník, Marco Mondelli, Christoph Lampert

    Abstract: Deep neural networks (DNNs) exhibit a surprising structure in their final layer known as neural collapse (NC), and a growing body of work has investigated the propagation of neural collapse to earlier layers of DNNs -- a phenomenon called deep neural collapse (DNC). However, existing theoretical results are restricted to special cases: linear models, only two layers or binary classifica…

    Submitted 21 October, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  13. arXiv:2405.13912  [pdf, other]

    math.ST cs.IT cs.LG math.PR stat.ML

    Matrix Denoising with Doubly Heteroscedastic Noise: Fundamental Limits and Optimal Spectral Methods

    Authors: Yihan Zhang, Marco Mondelli

    Abstract: We study the matrix denoising problem of estimating the singular vectors of a rank-$1$ signal corrupted by noise with both column and row correlations. Existing works are either unable to pinpoint the exact asymptotic estimation error or, when they do so, the resulting approaches (e.g., based on whitening or singular value shrinkage) remain vastly suboptimal. On top of this, most of the literature…

    Submitted 28 October, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  14. arXiv:2402.13728  [pdf, other]

    cs.LG stat.ML

    Average gradient outer product as a mechanism for deep neural collapse

    Authors: Daniel Beaglehole, Peter Súkeník, Marco Mondelli, Mikhail Belkin

    Abstract: Deep Neural Collapse (DNC) refers to the surprisingly rigid structure of the data representations in the final layers of Deep Neural Networks (DNNs). Though the phenomenon has been measured in a variety of settings, its emergence is typically explained via data-agnostic approaches, such as the unconstrained features model. In this work, we introduce a data-dependent setting where DNC forms due to…

    Submitted 19 January, 2025; v1 submitted 21 February, 2024; originally announced February 2024.

  15. arXiv:2402.11200  [pdf, other]

    cs.IT math.FA math.PR

    Contraction of Markovian Operators in Orlicz Spaces and Error Bounds for Markov Chain Monte Carlo

    Authors: Amedeo Roberto Esposito, Marco Mondelli

    Abstract: We introduce a novel concept of convergence for Markovian processes within Orlicz spaces, extending beyond the conventional approach associated with $L_p$ spaces. After showing that Markovian operators are contractive in Orlicz spaces, our key technical contribution is an upper bound on their contraction coefficient, which admits a closed-form expression. The bound is tight in some settings, and i…

    Submitted 11 June, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

    Comments: Full version of the work accepted for presentation at the Conference on Learning Theory (COLT) 2024

  16. arXiv:2402.05013  [pdf, other]

    cs.LG cs.IT stat.ML

    Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth

    Authors: Kevin Kögler, Alexander Shevchenko, Hamed Hassani, Marco Mondelli

    Abstract: Autoencoders are a prominent model in many empirical branches of machine learning and lossy data compression. However, basic theoretical questions remain unanswered even in a shallow two-layer setting. In particular, to what degree does a shallow autoencoder capture the structure of the underlying data distribution? For the prototypical case of the 1-bit compression of sparse Gaussian data, we pro…

    Submitted 7 February, 2024; originally announced February 2024.

  17. arXiv:2402.02969  [pdf, other]

    stat.ML cs.CL cs.LG

    Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features

    Authors: Simone Bombari, Marco Mondelli

    Abstract: Understanding the reasons behind the exceptional success of transformers requires a better analysis of why attention layers are suitable for NLP tasks. In particular, such tasks require predictive models to capture contextual meaning which often depends on one or few words, even if the sentence is long. Our work studies this key property, dubbed word sensitivity (WS), in the prototypical setting o…

    Submitted 17 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Revision after ICML2024 reviews

  18. arXiv:2308.14507  [pdf, other]

    math.ST cs.IT cs.LG math.PR stat.ML

    Spectral Estimators for Structured Generalized Linear Models via Approximate Message Passing

    Authors: Yihan Zhang, Hong Chang Ji, Ramji Venkataramanan, Marco Mondelli

    Abstract: We consider the problem of parameter estimation in a high-dimensional generalized linear model. Spectral methods obtained via the principal eigenvector of a suitable data-dependent matrix provide a simple yet surprisingly effective solution. However, despite their wide use, a rigorous performance characterization, as well as a principled way to preprocess the data, are available only for unstructu…

    Submitted 3 July, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

  19. arXiv:2305.14164  [pdf, other]

    cs.LG math.ST stat.ML

    Improved Convergence of Score-Based Diffusion Models via Prediction-Correction

    Authors: Francesco Pedrotti, Jan Maas, Marco Mondelli

    Abstract: Score-based generative models (SGMs) are powerful tools to sample from complex data distributions. Their underlying idea is to (i) run a forward process for time $T_1$ by adding noise to the data, (ii) estimate its score function, and (iii) use such estimate to run a reverse process. As the reverse process is initialized with the stationary distribution of the forward one, the existing analysis pa…

    Submitted 4 June, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: 34 pages; accepted to TMLR

  20. arXiv:2305.13165  [pdf, other]

    cs.LG stat.ML

    Deep Neural Collapse Is Provably Optimal for the Deep Unconstrained Features Model

    Authors: Peter Súkeník, Marco Mondelli, Christoph Lampert

    Abstract: Neural collapse (NC) refers to the surprising structure of the last layer of deep neural networks in the terminal phase of gradient descent training. Recently, an increasing amount of experimental evidence has pointed to the propagation of NC to earlier layers of neural networks. However, while the NC in the last layer is well studied theoretically, much less is known about its multi-layered count…

    Submitted 22 May, 2023; originally announced May 2023.

  21. arXiv:2305.12100  [pdf, other]

    stat.ML cs.LG

    How Spurious Features Are Memorized: Precise Analysis for Random and NTK Features

    Authors: Simone Bombari, Marco Mondelli

    Abstract: Deep learning models are known to overfit and memorize spurious features in the training dataset. While numerous empirical studies have aimed at understanding this phenomenon, a rigorous theoretical framework to quantify it is still missing. In this paper, we consider spurious features that are uncorrelated with the learning task, and we provide a precise characterization of how they are memorized…

    Submitted 17 May, 2024; v1 submitted 20 May, 2023; originally announced May 2023.

    Comments: Revision after ICML2024 acceptance. Motivation of the paper changed from Privacy to Spurious Features. arXiv admin note: text overlap with arXiv:2302.01629

  22. arXiv:2303.07245  [pdf, ps, other]

    cs.IT math.PR

    Concentration without Independence via Information Measures

    Authors: Amedeo Roberto Esposito, Marco Mondelli

    Abstract: We propose a novel approach to concentration for non-independent random variables. The main idea is to "pretend" that the random variables are independent and pay a multiplicative price measuring how far they are from actually being independent. This price is encapsulated in the Hellinger integral between the joint and the product of the marginals, which is then upper bounded leveraging tensoris…

    Submitted 30 October, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

  23. arXiv:2302.03306  [pdf, other]

    cs.IT cs.LG math.ST

    Mismatched estimation of non-symmetric rank-one matrices corrupted by structured noise

    Authors: Teng Fu, YuHao Liu, Jean Barbier, Marco Mondelli, ShanSuo Liang, TianQi Hou

    Abstract: We study the performance of a Bayesian statistician who estimates a rank-one signal corrupted by non-symmetric rotationally invariant noise with a generic distribution of singular values. As the signal-to-noise ratio and the noise structure are unknown, a Gaussian setup is incorrectly assumed. We derive the exact analytic expression for the error of the mismatched Bayes estimator and also provide…

    Submitted 8 February, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

  24. arXiv:2302.01629  [pdf, other]

    stat.ML cs.LG

    Beyond the Universal Law of Robustness: Sharper Laws for Random Features and Neural Tangent Kernels

    Authors: Simone Bombari, Shayan Kiyani, Marco Mondelli

    Abstract: Machine learning models are vulnerable to adversarial perturbations, and a thought-provoking paper by Bubeck and Sellke has analyzed this phenomenon through the lens of over-parameterization: interpolating smoothly the data requires significantly more parameters than simply memorizing it. However, this "universal" law provides only a necessary condition for robustness, and it is unable to discrimi…

    Submitted 27 May, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

    Comments: Second arXiv version, updated to the ICML 2023 version of the paper

  25. arXiv:2212.13468  [pdf, other]

    cs.LG cs.IT stat.ML

    Fundamental Limits of Two-layer Autoencoders, and Achieving Them with Gradient Methods

    Authors: Alexander Shevchenko, Kevin Kögler, Hamed Hassani, Marco Mondelli

    Abstract: Autoencoders are a popular model in many branches of machine learning and lossy data compression. However, their fundamental limits, the performance of gradient methods and the features learnt during optimization remain poorly understood, even in the two-layer setting. In fact, earlier work has considered either linear autoencoders or specific training regimes (leading to vanishing or diverging co…

    Submitted 27 December, 2022; originally announced December 2022.

    Comments: 67 pages, 7 figures

  26. arXiv:2212.01572  [pdf, other]

    stat.ML cond-mat.dis-nn cs.IT cs.LG

    Approximate Message Passing for Multi-Layer Estimation in Rotationally Invariant Models

    Authors: Yizhou Xu, TianQi Hou, ShanSuo Liang, Marco Mondelli

    Abstract: We consider the problem of reconstructing the signal and the hidden variables from observations coming from a multi-layer network with rotationally invariant weight matrices. The multi-layer structure models inference from deep generative priors, and the rotational invariance imposed on the weights generalizes the i.i.d. Gaussian assumption by allowing for a complex correlation structure, which i…

    Submitted 3 December, 2022; originally announced December 2022.

  27. arXiv:2211.11368  [pdf, other]

    math.ST cs.IT cs.LG stat.ML

    Precise Asymptotics for Spectral Methods in Mixed Generalized Linear Models

    Authors: Yihan Zhang, Marco Mondelli, Ramji Venkataramanan

    Abstract: In a mixed generalized linear model, the objective is to learn multiple signals from unlabeled observations: each sample comes from exactly one signal, but it is not known which one. We consider the prototypical problem of estimating two statistically independent signals in a mixed generalized linear model with Gaussian covariates. Spectral methods are a popular class of estimators which output th…

    Submitted 18 April, 2024; v1 submitted 21 November, 2022; originally announced November 2022.

  28. arXiv:2211.04589  [pdf, other]

    cs.LG stat.ML

    Finite Sample Identification of Wide Shallow Neural Networks with Biases

    Authors: Massimo Fornasier, Timo Klock, Marco Mondelli, Michael Rauchensteiner

    Abstract: Artificial neural networks are functions depending on a finite number of parameters typically encoded as weights and biases. The identification of the parameters of the network from finite samples of input-output pairs is often referred to as the teacher-student model, and this model has represented a popular framework for understanding training and generalization. Even if the problem is NP…

    Submitted 8 November, 2022; originally announced November 2022.

    MSC Class: 65D15; 68T07; 90C26

  29. arXiv:2210.06819  [pdf, other]

    cs.LG stat.ML

    Mean-field analysis for heavy ball methods: Dropout-stability, connectivity, and global convergence

    Authors: Diyuan Wu, Vyacheslav Kungurtsev, Marco Mondelli

    Abstract: The stochastic heavy ball method (SHB), also known as stochastic gradient descent (SGD) with Polyak's momentum, is widely used in training neural networks. However, despite the remarkable success of this algorithm in practice, its theoretical characterization remains limited. In this paper, we focus on neural networks with two and three layers and provide a rigorous understanding of the properties…

    Submitted 5 February, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: 14 pages in main text; 51 pages including bibliography and appendix. Published in Transactions on Machine Learning Research (TMLR), 2023. https://openreview.net/forum?id=gZna3IiGfl

  30. arXiv:2210.01237  [pdf, other]

    cs.IT cond-mat.stat-mech cs.LG stat.ML

    Bayes-optimal limits in structured PCA, and how to reach them

    Authors: Jean Barbier, Francesco Camilli, Marco Mondelli, Manuel Saenz

    Abstract: How do statistical dependencies in measurement noise influence high-dimensional inference? To answer this, we study the paradigmatic spiked matrix model of principal components analysis (PCA), where a rank-one matrix is corrupted by additive noise. We go beyond the usual independence assumption on the noise entries, by drawing the noise from a low-order polynomial orthogonal matrix ensemble. The r…

    Submitted 2 June, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

  31. arXiv:2205.10217  [pdf, other]

    stat.ML cs.IT cs.LG

    Memorization and Optimization in Deep Neural Networks with Minimum Over-parameterization

    Authors: Simone Bombari, Mohammad Hossein Amani, Marco Mondelli

    Abstract: The Neural Tangent Kernel (NTK) has emerged as a powerful tool to provide memorization, optimization and generalization guarantees in deep neural networks. A line of work has studied the NTK spectrum for two-layer and deep networks with at least a layer with $Ω(N)$ neurons, $N$ being the number of training samples. Furthermore, there is increasing evidence suggesting that deep networks with sub-li…

    Submitted 21 May, 2023; v1 submitted 20 May, 2022; originally announced May 2022.

    Comments: Aligned with the published NeurIPS 2022 version

  32. arXiv:2205.10009  [pdf, other]

    cs.IT cs.LG math.ST

    The price of ignorance: how much does it cost to forget noise structure in low-rank matrix estimation?

    Authors: Jean Barbier, TianQi Hou, Marco Mondelli, Manuel Sáenz

    Abstract: We consider the problem of estimating a rank-1 signal corrupted by structured rotationally invariant noise, and address the following question: how well do inference algorithms perform when the noise statistics are unknown and hence Gaussian noise is assumed? While the matched Bayes-optimal setting with unstructured noise is well understood, the analysis of this mismatched problem is only at its pr…

    Submitted 20 May, 2022; originally announced May 2022.

  33. arXiv:2205.08199  [pdf, ps, other]

    cs.IT cs.LG stat.ML

    Sharp asymptotics on the compression of two-layer neural networks

    Authors: Mohammad Hossein Amani, Simone Bombari, Marco Mondelli, Rattana Pukdee, Stefano Rini

    Abstract: In this paper, we study the compression of a target two-layer neural network with N nodes into a compressed network with M<N nodes. More precisely, we consider the setting in which the weights of the target network are i.i.d. sub-Gaussian, and we minimize the population L_2 loss between the outputs of the target and of the compressed network, under the assumption of Gaussian inputs. By using tools…

    Submitted 16 August, 2022; v1 submitted 17 May, 2022; originally announced May 2022.

  34. arXiv:2201.10082  [pdf, ps, other]

    cs.IT cs.DC

    Polar Coded Computing: The Role of the Scaling Exponent

    Authors: Dorsa Fathollahi, Marco Mondelli

    Abstract: We consider the problem of coded distributed computing using polar codes. The average execution time of a coded computing system is related to the error probability for transmission over the binary erasure channel in recent work by Soleymani, Jamali and Mahdavifar, where the performance of binary linear codes is investigated. In this paper, we focus on polar codes and unveil a connection between t…

    Submitted 1 February, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

  35. arXiv:2112.04330  [pdf, other]

    stat.ML cs.IT cs.LG math.ST

    Estimation in Rotationally Invariant Generalized Linear Models via Approximate Message Passing

    Authors: Ramji Venkataramanan, Kevin Kögler, Marco Mondelli

    Abstract: We consider the problem of signal estimation in generalized linear models defined via rotationally invariant design matrices. Since these matrices can have an arbitrary spectral distribution, this model is well suited for capturing complex correlation structures which often arise in applications. We propose a novel family of approximate message passing (AMP) algorithms for signal estimation, and r…

    Submitted 9 June, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: 35 pages, 8 figures, to appear in International Conference on Machine Learning (ICML), 2022

  36. arXiv:2112.00057  [pdf, ps, other]

    cs.IT

    Successive Syndrome-Check Decoding of Polar Codes

    Authors: Seyyed Ali Hashemi, Marco Mondelli, John Cioffi, Andrea Goldsmith

    Abstract: A two-part successive syndrome-check decoding of polar codes is proposed with the first part successively refining the received codeword and the second part checking its syndrome. A new formulation of the successive-cancellation (SC) decoding algorithm is presented that allows for successively refining the received codeword by comparing the log-likelihood ratio value of a frozen bit with its prede…

    Submitted 30 November, 2021; originally announced December 2021.

    Comments: 2021 Asilomar Conference on Signals, Systems, and Computers

  37. arXiv:2111.02278  [pdf, other]

    cs.LG stat.ML

    Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks

    Authors: Alexander Shevchenko, Vyacheslav Kungurtsev, Marco Mondelli

    Abstract: Understanding the properties of neural networks trained via stochastic gradient descent (SGD) is at the heart of the theory of deep learning. In this work, we take a mean-field view, and consider a two-layer ReLU network trained via SGD for a univariate regularized regression problem. Our main result is that SGD is biased towards a simple solution: at convergence, the ReLU network implements a pie…

    Submitted 29 April, 2022; v1 submitted 3 November, 2021; originally announced November 2021.

    Comments: Accepted to the Journal of Machine Learning Research (JMLR)

  38. arXiv:2109.02122  [pdf, other]

    cs.IT

    Decoding Reed-Muller Codes with Successive Codeword Permutations

    Authors: Nghia Doan, Seyyed Ali Hashemi, Marco Mondelli, Warren J. Gross

    Abstract: A novel recursive list decoding (RLD) algorithm for Reed-Muller (RM) codes based on successive permutations (SP) of the codeword is presented. A low-complexity SP scheme applied to a subset of the symmetry group of RM codes is first proposed to carefully select a good codeword permutation on the fly. Then, the proposed SP technique is integrated into an improved RLD algorithm that initializes diff…

    Submitted 20 September, 2022; v1 submitted 5 September, 2021; originally announced September 2021.

    Comments: Accepted for publication in IEEE Transactions on Communications

  39. arXiv:2106.02356  [pdf, ps, other]

    stat.ML cs.IT cs.LG math.ST

    PCA Initialization for Approximate Message Passing in Rotationally Invariant Models

    Authors: Marco Mondelli, Ramji Venkataramanan

    Abstract: We study the problem of estimating a rank-$1$ signal in the presence of rotationally invariant noise -- a class of perturbations more general than Gaussian noise. Principal Component Analysis (PCA) provides a natural estimator, and sharp results on its performance have been obtained in the high-dimensional regime. Recently, an Approximate Message Passing (AMP) algorithm has been proposed as an altern…

    Submitted 14 October, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

    Comments: 72 pages, 2 figures, appeared in Neural Information Processing Systems (NeurIPS), 2021

  40. arXiv:2102.09671  [pdf, other]

    cs.LG stat.ML

    When Are Solutions Connected in Deep Networks?

    Authors: Quynh Nguyen, Pierre Brechet, Marco Mondelli

    Abstract: The question of how and why the phenomenon of mode connectivity occurs in training deep neural networks has gained remarkable attention in the research community. From a theoretical perspective, two possible explanations have been proposed: (i) the loss function has connected sublevel sets, and (ii) the solutions found by stochastic gradient descent are dropout stable. While these explanations pro…

    Submitted 21 October, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

    Comments: Accepted at NeurIPS 2021

  41. arXiv:2012.13378  [pdf, ps, other]

    cs.IT

    Parallelism versus Latency in Simplified Successive-Cancellation Decoding of Polar Codes

    Authors: Seyyed Ali Hashemi, Marco Mondelli, Arman Fazeli, Alexander Vardy, John Cioffi, Andrea Goldsmith

    Abstract: This paper characterizes the latency of the simplified successive-cancellation (SSC) decoding scheme for polar codes under hardware resource constraints. In particular, when the number of processing elements $P$ that can perform SSC decoding operations in parallel is limited, as is the case in practice, the latency of SSC decoding is $O\left(N^{1-1/μ}+\frac{N}{P}\log_2\log_2\frac{N}{P}\right)$, wh…

    Submitted 24 December, 2020; originally announced December 2020.

  42. arXiv:2012.11654  [pdf, other]

    stat.ML cs.LG

    Tight Bounds on the Smallest Eigenvalue of the Neural Tangent Kernel for Deep ReLU Networks

    Authors: Quynh Nguyen, Marco Mondelli, Guido Montufar

    Abstract: A recent line of work has analyzed the theoretical properties of deep neural networks via the Neural Tangent Kernel (NTK). In particular, the smallest eigenvalue of the NTK has been related to the memorization capacity, the global convergence of gradient descent algorithms and the generalization of deep nets. However, existing results either provide bounds in the two-layer setting or assume that t…

    Submitted 21 August, 2022; v1 submitted 21 December, 2020; originally announced December 2020.

    Comments: Appeared at ICML 2021. This version corrects a mistake in Lemma 5.4 that also affects Lemma 5.5. Both lemmas have been edited and their proofs corrected. All other results are unchanged.

  43. arXiv:2011.12882  [pdf, other]

    cs.IT

    Sparse Multi-Decoder Recursive Projection Aggregation for Reed-Muller Codes

    Authors: Dorsa Fathollahi, Nariman Farsad, Seyyed Ali Hashemi, Marco Mondelli

    Abstract: Reed-Muller (RM) codes are one of the oldest families of codes. Recently, a recursive projection aggregation (RPA) decoder has been proposed, which achieves a performance that is close to the maximum likelihood decoder for short-length RM codes. One of its main drawbacks, however, is the large amount of computations needed. In this paper, we devise a new algorithm to lower the computational budget…

    Submitted 26 November, 2020; v1 submitted 25 November, 2020; originally announced November 2020.

    Comments: 6 pages, 12 figures

  44. arXiv:2010.03460  [pdf, other]

    stat.ML cs.IT cs.LG math.ST

    Approximate Message Passing with Spectral Initialization for Generalized Linear Models

    Authors: Marco Mondelli, Ramji Venkataramanan

    Abstract: We consider the problem of estimating a signal from measurements obtained via a generalized linear model. We focus on estimators based on approximate message passing (AMP), a family of iterative algorithms with many appealing features: the performance of AMP in the high-dimensional limit can be succinctly characterized under suitable model assumptions; AMP can also be tailored to the empirical dis…

    Submitted 17 February, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: 38 pages, 5 figures, AISTATS 2021

  45. arXiv:2008.03326  [pdf, other]

    stat.ML cs.IT cs.LG math.ST

    Optimal Combination of Linear and Spectral Estimators for Generalized Linear Models

    Authors: Marco Mondelli, Christos Thrampoulidis, Ramji Venkataramanan

    Abstract: We study the problem of recovering an unknown signal $\boldsymbol x$ given measurements obtained from a generalized linear model with a Gaussian sensing matrix. Two popular solutions are based on a linear estimator $\hat{\boldsymbol x}^{\rm L}$ and a spectral estimator $\hat{\boldsymbol x}^{\rm s}$. The former is a data-dependent linear combination of the columns of the measurement matrix, and its…

    Submitted 25 June, 2021; v1 submitted 7 August, 2020; originally announced August 2020.

    Comments: 49 pages, 6 figures

  46. arXiv:2002.07867  [pdf, other]

    cs.LG stat.ML

    Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology

    Authors: Quynh Nguyen, Marco Mondelli

    Abstract: Recent works have shown that gradient descent can find a global minimum for over-parameterized neural networks where the widths of all the hidden layers scale polynomially with $N$ ($N$ being the number of training samples). In this paper, we prove that, for deep networks, a single layer of width $N$ following the input layer suffices to ensure a similar guarantee. In particular, all the remaining…

    Submitted 17 December, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

    Comments: Accepted at NeurIPS 2020

  47. arXiv:1912.10095  [pdf, other]

    cs.LG stat.ML

    Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks

    Authors: Alexander Shevchenko, Marco Mondelli

    Abstract: The optimization of multilayer neural networks typically leads to a solution with zero training error, yet the landscape can exhibit spurious local minima and the minima can be disconnected. In this paper, we shed light on this phenomenon: we show that the combination of stochastic gradient descent (SGD) and over-parameterization makes the landscape of multilayer neural networks approximately conn…

    Submitted 23 July, 2020; v1 submitted 20 December, 2019; originally announced December 2019.

    Comments: Proceedings of the 37th International Conference on Machine Learning (ICML)

  48. arXiv:1909.04892  [pdf, ps, other]

    cs.IT

    Sublinear Latency for Simplified Successive Cancellation Decoding of Polar Codes

    Authors: Marco Mondelli, Seyyed Ali Hashemi, John Cioffi, Andrea Goldsmith

    Abstract: This work analyzes the latency of the simplified successive cancellation (SSC) decoding scheme for polar codes proposed by Alamdar-Yazdi and Kschischang. It is shown that, unlike conventional successive cancellation decoding, where latency is linear in the block length, the latency of SSC decoding is sublinear. More specifically, the latency of SSC decoding is $O(N^{1-1/μ})$, where $N$ is the bloc…

    Submitted 5 September, 2020; v1 submitted 11 September, 2019; originally announced September 2019.

    Comments: 20 pages, 6 figures, presented in part at ISIT 2020 and accepted in IEEE Transactions on Wireless Communications

  49. arXiv:1909.00271  [pdf, other]

    cs.DB

    Exploring Reproducibility and FAIR Principles in Data Science Using Ecological Niche Modeling as a Case Study

    Authors: Maria Luiza Mondelli, A. Townsend Peterson, Luiz M. R. Gadelha Jr

    Abstract: Reproducibility is a fundamental requirement of the scientific process since it enables outcomes to be replicated and verified. Computational scientific experiments can benefit from improved reproducibility for many reasons, including validation of results and reuse by other scientists. However, designing reproducible experiments remains a challenge and hence the need for developing methodologies…

    Submitted 31 August, 2019; originally announced September 2019.

    Comments: 10 pages, 4 figures

  50. Rate-Flexible Fast Polar Decoders

    Authors: Seyyed Ali Hashemi, Carlo Condo, Marco Mondelli, Warren J. Gross

    Abstract: Polar codes have gained extensive attention during the past few years and recently they have been selected for the next generation of wireless communications standards (5G). Successive-cancellation-based (SC-based) decoders, such as SC list (SCL) and SC flip (SCF), provide a reasonable error performance for polar codes at the cost of low decoding speed. Fast SC-based decoders, such as Fast-SSC, Fa…

    Submitted 21 March, 2019; originally announced March 2019.