Learning on a Razor's Edge: the Singularity Bias of Polynomial Neural Networks

Authors: Vahid Shahverdi, Giovanni Luca Marchetti, Kathlén Kohn

Abstract: Deep neural networks often infer sparse representations, converging to a subnetwork during the learning process. In this work, we theoretically analyze subnetworks and their bias through the lens of algebraic geometry. We consider fully-connected networks with polynomial activation functions, and focus on the geometry of the function space they parametrize, often referred to as neuromanifold. First, we compute the dimension of the subspace of the neuromanifold parametrized by subnetworks. Second, we show that this subspace is singular. Third, we argue that such singularities often correspond to critical points of the training dynamics. Lastly, we discuss convolutional networks, for which subnetworks and singularities are similarly related, but the bias does not arise.

Submitted 17 May, 2025; originally announced May 2025.

arXiv:2501.18915 [pdf, ps, other]

Algebra Unveils Deep Learning -- An Invitation to Neuroalgebraic Geometry

Authors: Giovanni Luca Marchetti, Vahid Shahverdi, Stefano Mereta, Matthew Trager, Kathlén Kohn

Abstract: In this position paper, we promote the study of function spaces parameterized by machine learning models through the lens of algebraic geometry. To this end, we focus on algebraic models, such as neural networks with polynomial activations, whose associated function spaces are semi-algebraic varieties. We outline a dictionary between algebro-geometric invariants of these varieties, such as dimension, degree, and singularities, and fundamental aspects of machine learning, such as sample complexity, expressivity, training dynamics, and implicit bias. Along the way, we review the literature and discuss ideas beyond the algebraic domain. This work lays the foundations of a research direction bridging algebraic geometry and deep learning, that we refer to as neuroalgebraic geometry.

Submitted 30 May, 2025; v1 submitted 31 January, 2025; originally announced January 2025.

Comments: Published at ICML 2025

arXiv:2410.00722 [pdf, other]

On the Geometry and Optimization of Polynomial Convolutional Networks

Authors: Vahid Shahverdi, Giovanni Luca Marchetti, Kathlén Kohn

Abstract: We study convolutional neural networks with monomial activation functions. Specifically, we prove that their parameterization map is regular and is an isomorphism almost everywhere, up to rescaling the filters. By leveraging tools from algebraic geometry, we explore the geometric properties of the image in function space of this map, typically referred to as the neuromanifold. In particular, we compute the dimension and the degree of the neuromanifold, which measure the expressivity of the model, and describe its singularities. Moreover, for a generic large dataset, we derive an explicit formula that quantifies the number of critical points arising in the optimization of a regression loss.

Submitted 3 March, 2025; v1 submitted 1 October, 2024; originally announced October 2024.

Comments: Accepted at AISTATS 2025

arXiv:2409.04868 [pdf, other]

Moment Constraints and Phase Recovery for Multireference Alignment

Authors: Vahid Shahverdi, Emanuel Ström, Joakim Andén

Abstract: Multireference alignment (MRA) refers to the problem of recovering a signal from noisy samples subject to random circular shifts. Expectation maximization (EM) and variational approaches use statistical modeling to achieve high accuracy at the cost of solving computationally expensive optimization problems. The method of moments, instead, achieves fast reconstructions by utilizing the power spectrum and bispectrum to determine the signal up to shift. Our approach combines the two philosophies by viewing the power spectrum as a manifold on which to constrain the signal. We then maximize the data likelihood function on this manifold with a gradient-based approach to estimate the true signal. Algorithmically, our method involves iterating between template alignment and projections onto the manifold. The method offers increased speed compared to EM and demonstrates improved accuracy over bispectrum-based methods.

Submitted 7 September, 2024; originally announced September 2024.

Comments: 24 pages, 10 figures

MSC Class: 94A12; 92C55; 62F12; 68U10; 90C30; 58C25; 58E05
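
The abstract above relies on the power spectrum being unaffected by circular shifts, which is what lets it act as a constraint set in MRA. The following is a minimal sketch of that invariance only, not the authors' method; the helper names `dft`, `power_spectrum`, and `circular_shift` and the sample signal are hypothetical, chosen for illustration.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def power_spectrum(x):
    """Squared magnitudes of the DFT coefficients."""
    return [abs(c) ** 2 for c in dft(x)]

def circular_shift(x, s):
    """Shift the sequence s positions to the right, wrapping around."""
    s %= len(x)
    return x[-s:] + x[:-s]

# Illustrative signal (an assumption, not data from the paper).
signal = [1.0, -2.0, 0.5, 3.0, 0.0, -1.5]
shifted = circular_shift(signal, 2)

# A circular shift only multiplies each DFT coefficient by a unit-modulus
# phase, so the power spectra agree up to floating-point error.
assert all(
    abs(a - b) < 1e-9
    for a, b in zip(power_spectrum(signal), power_spectrum(shifted))
)
```

Because the phases are discarded, the power spectrum alone does not pin down the signal, which is consistent with the abstract pairing it with the bispectrum or with a likelihood term.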

arXiv:2401.16613 [pdf, ps, other]

Algebraic Complexity and Neurovariety of Linear Convolutional Networks

Authors: Vahid Shahverdi

Abstract: In this paper, we study linear convolutional networks with one-dimensional filters and arbitrary strides. The neuromanifold of such a network is a semialgebraic set, represented by a space of polynomials admitting specific factorizations. Introducing a recursive algorithm, we generate polynomial equations whose common zero locus corresponds to the Zariski closure of the corresponding neuromanifold. Furthermore, we explore the algebraic complexity of training these networks employing tools from metric algebraic geometry. Our findings reveal that the number of all complex critical points in the optimization of such a network is equal to the generic Euclidean distance degree of a Segre variety. Notably, this count significantly surpasses the number of critical points encountered in the training of a fully connected linear network with the same number of parameters.

Submitted 29 January, 2024; originally announced January 2024.

MSC Class: 68T07; 14E99; 14J99; 14P10; 90C23

arXiv:2309.13736 [pdf, other]

Geometry of Linear Neural Networks: Equivariance and Invariance under Permutation Groups

Authors: Kathlén Kohn, Anna-Laura Sattelberger, Vahid Shahverdi

Abstract: The set of functions parameterized by a linear fully-connected neural network is a determinantal variety. We investigate the subvariety of functions that are equivariant or invariant under the action of a permutation group. Examples of such group actions are translations or $90^\circ$ rotations on images. We describe such equivariant or invariant subvarieties as direct products of determinantal varieties, from which we deduce their dimension, degree, Euclidean distance degree, and their singularities. We fully characterize invariance for arbitrary permutation groups, and equivariance for cyclic groups. We draw conclusions for the parameterization and the design of equivariant and invariant linear networks in terms of sparsity and weight-sharing properties. We prove that all invariant linear functions can be parameterized by a single linear autoencoder with a weight-sharing property imposed by the cycle decomposition of the considered permutation. The space of rank-bounded equivariant functions has several irreducible components, so it cannot be parameterized by a single network, but each irreducible component can. Finally, we show that minimizing the squared-error loss on our invariant or equivariant networks reduces to minimizing the Euclidean distance from determinantal varieties via the Eckart-Young theorem.

Submitted 10 January, 2025; v1 submitted 24 September, 2023; originally announced September 2023.

Comments: 42 pages, 8 figures, 1 table; comments welcome!

arXiv:2304.05752 [pdf, other]

Function Space and Critical Points of Linear Convolutional Networks

Authors: Kathlén Kohn, Guido Montúfar, Vahid Shahverdi, Matthew Trager

Abstract: We study the geometry of linear networks with one-dimensional convolutional layers. The function spaces of these networks can be identified with semi-algebraic families of polynomials admitting sparse factorizations. We analyze the impact of the network's architecture on the function space's dimension, boundary, and singular points. We also describe the critical points of the network's parameterization map. Furthermore, we study the optimization problem of training a network with the squared error loss. We prove that for architectures where all strides are larger than one and generic data, the non-zero critical points of that optimization problem are smooth interior points of the function space. This property is known to be false for dense linear networks and linear convolutional networks with stride one.

Submitted 26 January, 2024; v1 submitted 12 April, 2023; originally announced April 2023.

Comments: 35 pages, 1 figure, 2 tables

MSC Class: 68T07; 14B05; 14E99; 14J99; 14N05; 14P10; 90C23
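
The identification of linear convolutional networks with polynomial factorizations, mentioned in the abstract above, can be illustrated in the simplest case. The sketch below assumes stride one and two layers without bias; the helper names `apply_filter` and `poly_mul` and the numeric values are hypothetical. It checks that stacking two filters gives the same map as one filter whose coefficients are the product of the two filter polynomials.

```python
def apply_filter(x, w):
    """One stride-one linear convolutional layer (no padding, no bias)."""
    n = len(x) - len(w) + 1
    return [sum(w[j] * x[i + j] for j in range(len(w))) for i in range(n)]

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

# Illustrative filters and input (assumptions, not taken from the paper).
w1 = [1, 2]        # read as the polynomial 1 + 2t
w2 = [3, 0, -1]    # read as the polynomial 3 - t^2
x = [2, -1, 0, 4, 5, 1]

two_layer = apply_filter(apply_filter(x, w1), w2)
end_to_end = apply_filter(x, poly_mul(w1, w2))

# The two-layer network and the single product filter define the same function.
assert two_layer == end_to_end
```

An end-to-end filter is therefore reachable by this architecture only if its polynomial factors into pieces of the prescribed degrees. Strides larger than one, the case the abstract singles out, are not covered by this sketch.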

Showing 1–7 of 7 results for author: Shahverdi, V