
How regularization affects the geometry of loss functions

Authors: Nathaniel Bottman, Y. Cooper, Antonio Lerario

Abstract: What neural networks learn depends fundamentally on the geometry of the underlying loss function. We study how different regularizers affect the geometry of this function. One of the most basic geometric properties of a smooth function is whether it is Morse or not. For nonlinear deep neural networks, the unregularized loss function $L$ is typically not Morse. We consider several different regularizers, including weight decay, and study for which regularizers the regularized function $L_ε$ becomes Morse.

Submitted 28 July, 2023; originally announced July 2023.

Comments: 16 pages, 0 figures

arXiv:2005.04210 [pdf, other]

The critical locus of overparameterized neural networks

Authors: Y. Cooper

Abstract: Many aspects of the geometry of loss functions in deep learning remain mysterious. In this paper, we work toward a better understanding of the geometry of the loss function $L$ of overparameterized feedforward neural networks. In this setting, we identify several components of the critical locus of $L$ and study their geometric properties. For networks of depth $\ell \geq 4$, we identify a locus of critical points we call the star locus $S$. Within $S$ we identify a positive-dimensional sublocus $C$ with the property that for $p \in C$, $p$ is a degenerate critical point, and no existing theoretical result guarantees that gradient descent will not converge to $p$. For very wide networks, we build on earlier work and show that all critical points of $L$ are degenerate, and give lower bounds on the number of zero eigenvalues of the Hessian at each critical point. For networks that are both deep and very wide, we compare the growth rates of the zero eigenspaces of the Hessian at all the different families of critical points that we identify. The results in this paper provide a starting point to a more quantitative understanding of the properties of various components of the critical locus of $L$.

Submitted 17 May, 2020; v1 submitted 8 May, 2020; originally announced May 2020.

arXiv:2005.03051 [pdf, other]

A comparison of group testing architectures for COVID-19 testing

Authors: J. Batson, N. Bottman, Y. Cooper, F. Janda

Abstract: An important component of every country's COVID-19 response is fast and efficient testing - to identify and isolate cases, as well as for early detection of local hotspots. For many countries, producing a sufficient number of tests has been a serious limiting factor in their efforts to control COVID-19 infections. Group testing is a well-established mathematical tool, which can provide a substantial and inexpensive expansion of testing capacity. In this note, we compare several popular group testing schemes in the context of qPCR testing for COVID-19. We find that in practical settings, for identification of individuals with COVID-19, Dorfman testing is the best choice at prevalences up to 30%, while for estimation of COVID-19 prevalence rates in the total population, Gibbs-Gower testing is the best choice at prevalences up to 30% given a fixed and relatively small number of tests. For instance, at a prevalence of up to 2%, Dorfman testing gives an efficiency gain of 3.5--8; at 1% prevalence, Gibbs-Gower testing gives an efficiency gain of 18, even when capping the pool size at a feasible number. This note is intended as a helpful handbook for labs implementing group testing methods.

Submitted 23 October, 2020; v1 submitted 6 May, 2020; originally announced May 2020.

Comments: 19 pages, 4 figures

arXiv:1809.05527 [pdf, ps, other]

Gradient descent in higher codimension

Authors: Y. Cooper

Abstract: We consider the behavior of gradient flow and of discrete and noisy gradient descent. It is commonly noted that the addition of noise to the process of discrete gradient descent can affect the trajectory of gradient descent. In previous work, we observed such effects. There, we considered the case where the minima had codimension 1. In this note, we do some computer experiments and observe the behavior of noisy gradient descent in the more complex setting of minima of higher codimension.

Submitted 18 April, 2019; v1 submitted 14 September, 2018; originally announced September 2018.

arXiv:1808.04839 [pdf, other]

Gradient descent in some simple settings

Authors: Y. Cooper

Abstract: In this note, we observe the behavior of gradient flow and discrete and noisy gradient descent in some simple settings. It is commonly noted that addition of noise to gradient descent can affect the trajectory of gradient descent. Here, we run some computer experiments for gradient descent on some simple functions, and observe this principle in some concrete examples.

Submitted 18 April, 2019; v1 submitted 14 August, 2018; originally announced August 2018.

arXiv:1804.10200 [pdf, ps, other]

The loss landscape of overparameterized neural networks

Authors: Y Cooper

Abstract: We explore some mathematical features of the loss landscape of overparameterized neural networks. A priori one might imagine that the loss function looks like a typical function from $\mathbb{R}^n$ to $\mathbb{R}$ - in particular, nonconvex, with discrete global minima. In this paper, we prove that in at least one important way, the loss function of an overparameterized neural network does not look like a typical function. If a neural net has $n$ parameters and is trained on $d$ data points, with $n>d$, we show that the locus $M$ of global minima of $L$ is usually not discrete, but rather an $n-d$ dimensional submanifold of $\mathbb{R}^n$. In practice, neural nets commonly have orders of magnitude more parameters than data points, so this observation implies that $M$ is typically a very high-dimensional subset of $\mathbb{R}^n$.

Submitted 26 April, 2018; originally announced April 2018.

Comments: 9 pages

arXiv:1709.01159 [pdf, ps, other]

A Fock Space approach to Severi Degrees of Hirzebruch Surfaces

Authors: Yaim Cooper

Abstract: The classical Severi degree counts the number of algebraic curves of fixed genus and class passing through some general points in a surface. In this paper we study Severi degrees as well as several types of Gromov-Witten invariants of the Hirzebruch surfaces $F_k$, and the relationship between these numbers. To each Hirzebruch surface $F_k$ we associate an operator $\mathsf{M}_{F_k} \in \mathcal{H}[\mathbb{P}^1]$ acting on the Fock space $\mathcal{F}[\mathbb{P}^1]$. Generating functions for each of the curve-counting theories we study here on $F_k$ can be expressed in terms of the exponential of the single operator $\mathsf{M}_{F_k}$, and counts on $\mathbb{P}^2$ can be expressed in terms of the exponential of $\mathsf{M}_{F_1}$. Several previous results can be recovered in this framework, including the recursion of Caporaso and Harris for enumerative curve counting on $\mathbb{P}^2$, the generalization by Vakil to $F_k$, and the relationship of Abramovich-Bertram between the enumerative curve counts on $F_0$ and $F_2$. We prove an analog of Abramovich-Bertram for $F_1$ and $F_3$. We also obtain two differential equations satisfied by generating functions of relative Gromov-Witten invariants on $F_k$. One of these recovers the differential equation of Getzler and Vakil.

Submitted 31 August, 2017; originally announced September 2017.

Comments: This article shares several definitions in common with arXiv:1210.8062

MSC Class: 14N10; 14N35

arXiv:1210.8062 [pdf, ps, other]

doi 10.1112/plms.12017

A Fock space approach to Severi degrees

Authors: Yaim Cooper, Rahul Pandharipande

Abstract: The classical Severi degree counts the number of algebraic curves of fixed genus and class passing through points in a surface. We express the Severi degrees of CP1 x CP1 as matrix elements of the exponential of a single operator M on Fock space. The formalism puts Severi degrees on a similar footing as the more developed study of Hurwitz numbers of coverings of curves. The pure genus 1 invariants of the product E x CP1 (with E an elliptic curve) are solved via an exact formula for the eigenvalues of M to initial order. The Severi degrees of CP2 are also determined by M via the (-1)^(d-1)/d^2 disk multiple cover formula for Calabi-Yau 3-fold geometries.

Submitted 17 November, 2016; v1 submitted 30 October, 2012; originally announced October 2012.

Comments: 20 pages, 6 figures. Revised in response to referee comments. To appear in Proceedings of the London Mathematical Society

arXiv:1201.6350 [pdf, ps, other]

Mirror Symmetry for Stable Quotients Invariants

Authors: Yaim Cooper, Aleksey Zinger

Abstract: The moduli space of stable quotients introduced by Marian-Oprea-Pandharipande provides a natural compactification of the space of morphisms from nonsingular curves to a nonsingular projective variety and carries a natural virtual class. We show that the analogue of Givental's J-function for the resulting twisted projective invariants is described by the same mirror hypergeometric series as the corresponding Gromov-Witten invariants (which arise from the moduli space of stable maps), but without the mirror transform (in the Calabi-Yau case). This implies that the stable quotients and Gromov-Witten twisted invariants agree if there is enough "positivity", but not in all cases. As a corollary of the proof, we show that certain twisted Hurwitz numbers arising in the stable quotients theory are also described by a fundamental object associated with this hypergeometric series. We thus completely answer some of the questions posed by Marian-Oprea-Pandharipande concerning their invariants. Our results suggest a deep connection between the stable quotients invariants of complete intersections and the geometry of the mirror families. As in Gromov-Witten theory, computing Givental's J-function (essentially a generating function for genus 0 invariants with 1 marked point) is key to computing stable quotients invariants of higher genus and with more marked points; we exploit this in forthcoming papers.

Submitted 10 November, 2016; v1 submitted 30 January, 2012; originally announced January 2012.

Comments: 47 pages, 6 figures; final version with some expository updates, inadvertently not uploaded in January 2014

MSC Class: 14N35; 53D45

arXiv:1109.0331 [pdf, ps, other]

The Geometry of Stable Quotients in Genus One

Authors: Yaim Cooper

Abstract: Stable quotient spaces provide an alternative to stable maps for compactifying spaces of maps. When the target is projective space and the domain curve has genus 1, these are smooth proper Deligne-Mumford stacks. In this paper we study the associated coarse moduli schemes. We show these schemes are projective, rationally connected and have Picard number 2. Then we give generators for the Picard group, compute the canonical divisor, and the cones of ample and effective divisors. In certain cases, we also give a closed formula for the Poincaré polynomial.

Submitted 1 September, 2011; originally announced September 2011.

Comments: 36 pages

Showing 1–10 of 10 results for author: Cooper, Y