-
Minimax learning rates for estimating binary classifiers under margin conditions
Authors:
Jonathan García,
Philipp Petersen
Abstract:
We study classification problems using binary estimators where the decision boundary is described by horizon functions and where the data distribution satisfies a geometric margin condition. We establish upper and lower bounds for the minimax learning rate over broad function classes with bounded Kolmogorov entropy in Lebesgue norms. A key novelty of our work is the derivation of lower bounds on t…
▽ More
We study classification problems using binary estimators where the decision boundary is described by horizon functions and where the data distribution satisfies a geometric margin condition. We establish upper and lower bounds for the minimax learning rate over broad function classes with bounded Kolmogorov entropy in Lebesgue norms. A key novelty of our work is the derivation of lower bounds on the worst-case learning rates under a geometric margin condition -- a setting that is almost universally satisfied in practice but remains theoretically challenging. Moreover, our results deal with the noiseless setting, where lower bounds are particularly hard to establish. We apply our general results to classification problems with decision boundaries belonging to several function classes: for Barron-regular functions, and for Hölder-continuous functions with strong margins, we identify optimal rates close to the fast learning rates of $\mathcal{O}(n^{-1})$ for $n \in \mathbb{N}$ samples. Also for merely convex decision boundaries, in a strong margin case optimal rates near $\mathcal{O}(n^{-1/2})$ can be achieved.
△ Less
Submitted 15 May, 2025;
originally announced May 2025.
-
Causal pieces: analysing and improving spiking neural networks piece by piece
Authors:
Dominik Dold,
Philipp Christian Petersen
Abstract:
We introduce a novel concept for spiking neural networks (SNNs) derived from the idea of "linear pieces" used to analyse the expressiveness and trainability of artificial neural networks (ANNs). We prove that the input domain of SNNs decomposes into distinct causal regions where its output spike times are locally Lipschitz continuous with respect to the input spike times and network parameters. Th…
▽ More
We introduce a novel concept for spiking neural networks (SNNs) derived from the idea of "linear pieces" used to analyse the expressiveness and trainability of artificial neural networks (ANNs). We prove that the input domain of SNNs decomposes into distinct causal regions where its output spike times are locally Lipschitz continuous with respect to the input spike times and network parameters. The number of such regions - which we call "causal pieces" - is a measure of the approximation capabilities of SNNs. In particular, we demonstrate in simulation that parameter initialisations which yield a high number of causal pieces on the training set strongly correlate with SNN training success. Moreover, we find that feedforward SNNs with purely positive weights exhibit a surprisingly high number of causal pieces, allowing them to achieve competitive performance levels on benchmark tasks. We believe that causal pieces are not only a powerful and principled tool for improving SNNs, but might also open up new ways of comparing SNNs and ANNs in the future.
△ Less
Submitted 18 April, 2025;
originally announced April 2025.
-
Numerical Error Analysis of Large Language Models
Authors:
Stanislav Budzinskiy,
Wenyi Fang,
Longbin Zeng,
Philipp Petersen
Abstract:
Large language models based on transformer architectures have become integral to state-of-the-art natural language processing applications. However, their training remains computationally expensive and exhibits instabilities, some of which are expected to be caused by finite-precision computations. We provide a theoretical analysis of the impact of round-off errors within the forward pass of a tra…
▽ More
Large language models based on transformer architectures have become integral to state-of-the-art natural language processing applications. However, their training remains computationally expensive and exhibits instabilities, some of which are expected to be caused by finite-precision computations. We provide a theoretical analysis of the impact of round-off errors within the forward pass of a transformer architecture which yields fundamental bounds for these effects. In addition, we conduct a series of numerical experiments which demonstrate the practical relevance of our bounds. Our results yield concrete guidelines for choosing hyperparameters that mitigate round-off errors, leading to more robust and stable inference.
△ Less
Submitted 13 March, 2025;
originally announced March 2025.
-
Sustainable AI: Mathematical Foundations of Spiking Neural Networks
Authors:
Adalbert Fono,
Manjot Singh,
Ernesto Araya,
Philipp C. Petersen,
Holger Boche,
Gitta Kutyniok
Abstract:
Deep learning's success comes with growing energy demands, raising concerns about the long-term sustainability of the field. Spiking neural networks, inspired by biological neurons, offer a promising alternative with potential computational and energy-efficiency gains. This article examines the computational properties of spiking networks through the lens of learning theory, focusing on expressivi…
▽ More
Deep learning's success comes with growing energy demands, raising concerns about the long-term sustainability of the field. Spiking neural networks, inspired by biological neurons, offer a promising alternative with potential computational and energy-efficiency gains. This article examines the computational properties of spiking networks through the lens of learning theory, focusing on expressivity, training, and generalization, as well as energy-efficient implementations while comparing them to artificial neural networks. By categorizing spiking models based on time representation and information encoding, we highlight their strengths, challenges, and potential as an alternative computational paradigm.
△ Less
Submitted 3 March, 2025;
originally announced March 2025.
-
Humanity's Last Exam
Authors:
Long Phan,
Alice Gatti,
Ziwen Han,
Nathaniel Li,
Josephina Hu,
Hugh Zhang,
Chen Bo Calvin Zhang,
Mohamed Shaaban,
John Ling,
Sean Shi,
Michael Choi,
Anish Agrawal,
Arnav Chopra,
Adam Khoja,
Ryan Kim,
Richard Ren,
Jason Hausenloy,
Oliver Zhang,
Mantas Mazeika,
Dmitry Dodonov,
Tung Nguyen,
Jaeho Lee,
Daron Anderson,
Mikhail Doroshenko,
Alun Cennyth Stokes
, et al. (1084 additional authors not shown)
Abstract:
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90\% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of…
▽ More
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90\% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.
△ Less
Submitted 19 April, 2025; v1 submitted 24 January, 2025;
originally announced January 2025.
-
High-dimensional classification problems with Barron regular boundaries under margin conditions
Authors:
Jonathan García,
Philipp Petersen
Abstract:
We prove that a classifier with a Barron-regular decision boundary can be approximated with a rate of high polynomial degree by ReLU neural networks with three hidden layers when a margin condition is assumed. In particular, for strong margin conditions, high-dimensional discontinuous classifiers can be approximated with a rate that is typically only achievable when approximating a low-dimensional…
▽ More
We prove that a classifier with a Barron-regular decision boundary can be approximated with a rate of high polynomial degree by ReLU neural networks with three hidden layers when a margin condition is assumed. In particular, for strong margin conditions, high-dimensional discontinuous classifiers can be approximated with a rate that is typically only achievable when approximating a low-dimensional smooth function. We demonstrate how these expression rate bounds imply fast-rate learning bounds that are close to $n^{-1}$ where $n$ is the number of samples. In addition, we carry out comprehensive numerical experimentation on binary classification problems with various margins. We study three different dimensions, with the highest dimensional problem corresponding to images from the MNIST data set.
△ Less
Submitted 10 January, 2025; v1 submitted 10 December, 2024;
originally announced December 2024.
-
The sampling complexity of learning invertible residual neural networks
Authors:
Yuanyuan Li,
Philipp Grohs,
Philipp Petersen
Abstract:
In recent work it has been shown that determining a feedforward ReLU neural network to within high uniform accuracy from point samples suffers from the curse of dimensionality in terms of the number of samples needed. As a consequence, feedforward ReLU neural networks are of limited use for applications where guaranteed high uniform accuracy is required.
We consider the question of whether the s…
▽ More
In recent work it has been shown that determining a feedforward ReLU neural network to within high uniform accuracy from point samples suffers from the curse of dimensionality in terms of the number of samples needed. As a consequence, feedforward ReLU neural networks are of limited use for applications where guaranteed high uniform accuracy is required.
We consider the question of whether the sampling complexity can be improved by restricting the specific neural network architecture. To this end, we investigate invertible residual neural networks which are foundational architectures in deep learning and are widely employed in models that power modern generative methods. Our main result shows that the residual neural network architecture and invertibility do not help overcome the complexity barriers encountered with simpler feedforward architectures. Specifically, we demonstrate that the computational complexity of approximating invertible residual neural networks from point samples in the uniform norm suffers from the curse of dimensionality. Similar results are established for invertible convolutional Residual neural networks.
△ Less
Submitted 8 November, 2024;
originally announced November 2024.
-
First Order System Least Squares Neural Networks
Authors:
Joost A. A. Opschoor,
Philipp C. Petersen,
Christoph Schwab
Abstract:
We introduce a conceptual framework for numerically solving linear elliptic, parabolic, and hyperbolic PDEs on bounded, polytopal domains in euclidean spaces by deep neural networks. The PDEs are recast as minimization of a least-squares (LSQ for short) residual of an equivalent, well-posed first-order system, over parametric families of deep neural networks. The associated LSQ residual is a) equa…
▽ More
We introduce a conceptual framework for numerically solving linear elliptic, parabolic, and hyperbolic PDEs on bounded, polytopal domains in euclidean spaces by deep neural networks. The PDEs are recast as minimization of a least-squares (LSQ for short) residual of an equivalent, well-posed first-order system, over parametric families of deep neural networks. The associated LSQ residual is a) equal or proportional to a weak residual of the PDE, b) additive in terms of contributions from localized subnetworks, indicating locally ``out-of-equilibrium'' of neural networks with respect to the PDE residual, c) serves as numerical loss function for neural network training, and d) constitutes, even with incomplete training, a computable, (quasi-)optimal numerical error estimator in the context of adaptive LSQ finite element methods. In addition, an adaptive neural network growth strategy is proposed which, assuming exact numerical minimization of the LSQ loss functional, yields sequences of neural networks with realizations that converge rate-optimally to the exact solution of the first order system LSQ formulation.
△ Less
Submitted 30 September, 2024;
originally announced September 2024.
-
Dimension-independent learning rates for high-dimensional classification problems
Authors:
Andres Felipe Lerma-Pineda,
Philipp Petersen,
Simon Frieder,
Thomas Lukasiewicz
Abstract:
We study the problem of approximating and estimating classification functions that have their decision boundary in the $RBV^2$ space. Functions of $RBV^2$ type arise naturally as solutions of regularized neural network learning problems and neural networks can approximate these functions without the curse of dimensionality. We modify existing results to show that every $RBV^2$ function can be appr…
▽ More
We study the problem of approximating and estimating classification functions that have their decision boundary in the $RBV^2$ space. Functions of $RBV^2$ type arise naturally as solutions of regularized neural network learning problems and neural networks can approximate these functions without the curse of dimensionality. We modify existing results to show that every $RBV^2$ function can be approximated by a neural network with bounded weights. Thereafter, we prove the existence of a neural network with bounded weights approximating a classification function. And we leverage these bounds to quantify the estimation rates. Finally, we present a numerical study that analyzes the effect of different regularity conditions on the decision boundaries.
△ Less
Submitted 26 September, 2024;
originally announced September 2024.
-
Mathematical theory of deep learning
Authors:
Philipp Petersen,
Jakob Zech
Abstract:
This book provides an introduction to the mathematical analysis of deep learning. It covers fundamental results in approximation theory, optimization theory, and statistical learning theory, which are the three main pillars of deep neural network theory. Serving as a guide for students and researchers in mathematics and related fields, the book aims to equip readers with foundational knowledge on…
▽ More
This book provides an introduction to the mathematical analysis of deep learning. It covers fundamental results in approximation theory, optimization theory, and statistical learning theory, which are the three main pillars of deep neural network theory. Serving as a guide for students and researchers in mathematics and related fields, the book aims to equip readers with foundational knowledge on the topic. It prioritizes simplicity over generality, and presents rigorous yet accessible results to help build an understanding of the essential mathematical concepts underpinning deep learning.
△ Less
Submitted 7 April, 2025; v1 submitted 25 July, 2024;
originally announced July 2024.
-
Regularized Gauss-Newton for Optimizing Overparameterized Neural Networks
Authors:
Adeyemi D. Adeoye,
Philipp Christian Petersen,
Alberto Bemporad
Abstract:
The generalized Gauss-Newton (GGN) optimization method incorporates curvature estimates into its solution steps, and provides a good approximation to the Newton method for large-scale optimization problems. GGN has been found particularly interesting for practical training of deep neural networks, not only for its impressive convergence speed, but also for its close relation with neural tangent ke…
▽ More
The generalized Gauss-Newton (GGN) optimization method incorporates curvature estimates into its solution steps, and provides a good approximation to the Newton method for large-scale optimization problems. GGN has been found particularly interesting for practical training of deep neural networks, not only for its impressive convergence speed, but also for its close relation with neural tangent kernel regression, which is central to recent studies that aim to understand the optimization and generalization properties of neural networks. This work studies a GGN method for optimizing a two-layer neural network with explicit regularization. In particular, we consider a class of generalized self-concordant (GSC) functions that provide smooth approximations to commonly-used penalty terms in the objective function of the optimization problem. This approach provides an adaptive learning rate selection technique that requires little to no tuning for optimal performance. We study the convergence of the two-layer neural network, considered to be overparameterized, in the optimization loop of the resulting GGN method for a given scaling of the network parameters. Our numerical experiments highlight specific aspects of GSC regularization that help to improve generalization of the optimized neural network. The code to reproduce the experimental results is available at https://github.com/adeyemiadeoye/ggn-score-nn.
△ Less
Submitted 23 April, 2024;
originally announced April 2024.
-
Stable Learning Using Spiking Neural Networks Equipped With Affine Encoders and Decoders
Authors:
A. Martina Neuman,
Dominik Dold,
Philipp Christian Petersen
Abstract:
We study the learning problem associated with spiking neural networks. Specifically, we focus on spiking neural networks composed of simple spiking neurons having only positive synaptic weights, equipped with an affine encoder and decoder; we refer to these as affine spiking neural networks. These neural networks are shown to depend continuously on their parameters, which facilitates classical cov…
▽ More
We study the learning problem associated with spiking neural networks. Specifically, we focus on spiking neural networks composed of simple spiking neurons having only positive synaptic weights, equipped with an affine encoder and decoder; we refer to these as affine spiking neural networks. These neural networks are shown to depend continuously on their parameters, which facilitates classical covering number-based generalization statements and supports stable gradient-based training. We demonstrate that the positivity of the weights enables a wide range of expressivity results, including rate-optimal approximation of smooth functions and dimension-independent approximation of Barron regular functions. In particular, we show in theory and simulations that affine spiking neural networks are capable of approximating shallow ReLU neural networks. Furthermore, we apply these affine spiking neural networks to standard machine learning benchmarks and reach competitive results. Finally, we observe that from a generalization perspective, contrary to feedforward neural networks or previous results for general spiking neural networks, the depth has little to no adverse effect on the generalization capabilities.
△ Less
Submitted 20 June, 2025; v1 submitted 6 April, 2024;
originally announced April 2024.
-
Large Language Models for Mathematicians
Authors:
Simon Frieder,
Julius Berner,
Philipp Petersen,
Thomas Lukasiewicz
Abstract:
Large language models (LLMs) such as ChatGPT have received immense interest for their general-purpose language understanding and, in particular, their ability to generate high-quality text or computer code. For many professions, LLMs represent an invaluable tool that can speed up and improve the quality of work. In this note, we discuss to what extent they can aid professional mathematicians. We f…
▽ More
Large language models (LLMs) such as ChatGPT have received immense interest for their general-purpose language understanding and, in particular, their ability to generate high-quality text or computer code. For many professions, LLMs represent an invaluable tool that can speed up and improve the quality of work. In this note, we discuss to what extent they can aid professional mathematicians. We first provide a mathematical description of the transformer model used in all modern language models. Based on recent studies, we then outline best practices and potential issues and report on the mathematical abilities of language models. Finally, we shed light on the potential of LLMs to change how mathematicians work.
△ Less
Submitted 2 April, 2024; v1 submitted 7 December, 2023;
originally announced December 2023.
-
Focus on the Challenges: Analysis of a User-friendly Data Search Approach with CLIP in the Automotive Domain
Authors:
Philipp Rigoll,
Patrick Petersen,
Hanno Stage,
Lennart Ries,
Eric Sax
Abstract:
Handling large amounts of data has become a key for developing automated driving systems. Especially for developing highly automated driving functions, working with images has become increasingly challenging due to the sheer size of the required data. Such data has to satisfy different requirements to be usable in machine learning-based approaches. Thus, engineers need to fully understand their la…
▽ More
Handling large amounts of data has become a key for developing automated driving systems. Especially for developing highly automated driving functions, working with images has become increasingly challenging due to the sheer size of the required data. Such data has to satisfy different requirements to be usable in machine learning-based approaches. Thus, engineers need to fully understand their large image data sets for the development and test of machine learning algorithms. However, current approaches lack automatability, are not generic and are limited in their expressiveness. Hence, this paper aims to analyze a state-of-the-art text and image embedding neural network and guides through the application in the automotive domain. This approach enables the search for similar images and the search based on a human understandable text-based description. Our experiments show the automatability and generalizability of our proposed method for handling large data sets in the automotive domain.
△ Less
Submitted 21 April, 2023; v1 submitted 20 April, 2023;
originally announced April 2023.
-
Mathematical Capabilities of ChatGPT
Authors:
Simon Frieder,
Luca Pinchetti,
Alexis Chevalier,
Ryan-Rhys Griffiths,
Tommaso Salvatori,
Thomas Lukasiewicz,
Philipp Christian Petersen,
Julius Berner
Abstract:
We investigate the mathematical capabilities of two iterations of ChatGPT (released 9-January-2023 and 30-January-2023) and of GPT-4 by testing them on publicly available datasets, as well as hand-crafted ones, using a novel methodology. In contrast to formal mathematics, where large databases of formal proofs are available (e.g., the Lean Mathematical Library), current datasets of natural-languag…
▽ More
We investigate the mathematical capabilities of two iterations of ChatGPT (released 9-January-2023 and 30-January-2023) and of GPT-4 by testing them on publicly available datasets, as well as hand-crafted ones, using a novel methodology. In contrast to formal mathematics, where large databases of formal proofs are available (e.g., the Lean Mathematical Library), current datasets of natural-language mathematics, used to benchmark language models, either cover only elementary mathematics or are very small. We address this by publicly releasing two new datasets: GHOSTS and miniGHOSTS. These are the first natural-language datasets curated by working researchers in mathematics that (1) aim to cover graduate-level mathematics, (2) provide a holistic overview of the mathematical capabilities of language models, and (3) distinguish multiple dimensions of mathematical reasoning. These datasets also test whether ChatGPT and GPT-4 can be helpful assistants to professional mathematicians by emulating use cases that arise in the daily professional activities of mathematicians. We benchmark the models on a range of fine-grained performance metrics. For advanced mathematics, this is the most detailed evaluation effort to date. We find that ChatGPT can be used most successfully as a mathematical assistant for querying facts, acting as a mathematical search engine and knowledge base interface. GPT-4 can additionally be used for undergraduate-level mathematics but fails on graduate-level difficulty. Contrary to many positive reports in the media about GPT-4 and ChatGPT's exam-solving abilities (a potential case of selection bias), their overall mathematical performance is well below the level of a graduate student. Hence, if your goal is to use ChatGPT to pass a graduate-level math exam, you would be better off copying from your average peer!
△ Less
Submitted 20 July, 2023; v1 submitted 31 January, 2023;
originally announced January 2023.
-
VC dimensions of group convolutional neural networks
Authors:
Philipp Christian Petersen,
Anna Sepliarskaia
Abstract:
We study the generalization capacity of group convolutional neural networks. We identify precise estimates for the VC dimensions of simple sets of group convolutional neural networks. In particular, we find that for infinite groups and appropriately chosen convolutional kernels, already two-parameter families of convolutional neural networks have an infinite VC dimension, despite being invariant t…
▽ More
We study the generalization capacity of group convolutional neural networks. We identify precise estimates for the VC dimensions of simple sets of group convolutional neural networks. In particular, we find that for infinite groups and appropriately chosen convolutional kernels, already two-parameter families of convolutional neural networks have an infinite VC dimension, despite being invariant to the action of an infinite group.
△ Less
Submitted 19 December, 2022;
originally announced December 2022.
-
Limitations of neural network training due to numerical instability of backpropagation
Authors:
Clemens Karner,
Vladimir Kazeev,
Philipp Christian Petersen
Abstract:
We study the training of deep neural networks by gradient descent where floating-point arithmetic is used to compute the gradients. In this framework and under realistic assumptions, we demonstrate that it is highly unlikely to find ReLU neural networks that maintain, in the course of training with gradient descent, superlinearly many affine pieces with respect to their number of layers. In virtua…
▽ More
We study the training of deep neural networks by gradient descent where floating-point arithmetic is used to compute the gradients. In this framework and under realistic assumptions, we demonstrate that it is highly unlikely to find ReLU neural networks that maintain, in the course of training with gradient descent, superlinearly many affine pieces with respect to their number of layers. In virtually all approximation theoretical arguments that yield high-order polynomial rates of approximation, sequences of ReLU neural networks with exponentially many affine pieces compared to their numbers of layers are used. As a consequence, we conclude that approximating sequences of ReLU neural networks resulting from gradient descent in practice differ substantially from theoretically constructed sequences. The assumptions and the theoretical results are compared to a numerical study, which yields concurring results.
△ Less
Submitted 15 November, 2023; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Optimal learning of high-dimensional classification problems using deep neural networks
Authors:
Philipp Petersen,
Felix Voigtlaender
Abstract:
We study the problem of learning classification functions from noiseless training samples, under the assumption that the decision boundary is of a certain regularity. We establish universal lower bounds for this estimation problem, for general classes of continuous decision boundaries. For the class of locally Barron-regular decision boundaries, we find that the optimal estimation rates are essent…
▽ More
We study the problem of learning classification functions from noiseless training samples, under the assumption that the decision boundary is of a certain regularity. We establish universal lower bounds for this estimation problem, for general classes of continuous decision boundaries. For the class of locally Barron-regular decision boundaries, we find that the optimal estimation rates are essentially independent of the underlying dimension and can be realized by empirical risk minimization methods over a suitable class of deep neural networks. These results are based on novel estimates of the $L^1$ and $L^\infty$ entropies of the class of Barron-regular functions.
△ Less
Submitted 24 December, 2021; v1 submitted 23 December, 2021;
originally announced December 2021.
-
Deep Microlocal Reconstruction for Limited-Angle Tomography
Authors:
Héctor Andrade-Loarca,
Gitta Kutyniok,
Ozan Öktem,
Philipp Petersen
Abstract:
We present a deep learning-based algorithm to jointly solve a reconstruction problem and a wavefront set extraction problem in tomographic imaging. The algorithm is based on a recently developed digital wavefront set extractor as well as the well-known microlocal canonical relation for the Radon transform. We use the wavefront set information about x-ray data to improve the reconstruction by requi…
▽ More
We present a deep learning-based algorithm to jointly solve a reconstruction problem and a wavefront set extraction problem in tomographic imaging. The algorithm is based on a recently developed digital wavefront set extractor as well as the well-known microlocal canonical relation for the Radon transform. We use the wavefront set information about x-ray data to improve the reconstruction by requiring that the underlying neural networks simultaneously extract the correct ground truth wavefront set and ground truth image. As a necessary theoretical step, we identify the digital microlocal canonical relations for deep convolutional residual neural networks. We find strong numerical evidence for the effectiveness of this approach.
△ Less
Submitted 12 August, 2021;
originally announced August 2021.
-
The Modern Mathematics of Deep Learning
Authors:
Julius Berner,
Philipp Grohs,
Gitta Kutyniok,
Philipp Petersen
Abstract:
We describe the new field of mathematical analysis of deep learning. This field emerged around a list of research questions that were not answered within the classical framework of learning theory. These questions concern: the outstanding generalization power of overparametrized neural networks, the role of depth in deep architectures, the apparent absence of the curse of dimensionality, the surpr…
▽ More
We describe the new field of mathematical analysis of deep learning. This field emerged around a list of research questions that were not answered within the classical framework of learning theory. These questions concern: the outstanding generalization power of overparametrized neural networks, the role of depth in deep architectures, the apparent absence of the curse of dimensionality, the surprisingly successful optimization performance despite the non-convexity of the problem, understanding what features are learned, why deep architectures perform exceptionally well in physical problems, and which fine aspects of an architecture affect the behavior of a learning task in which way. We present an overview of modern approaches that yield partial answers to these questions. For selected approaches, we describe the main ideas in more detail.
△ Less
Submitted 8 February, 2023; v1 submitted 9 May, 2021;
originally announced May 2021.
-
Exponential ReLU Neural Network Approximation Rates for Point and Edge Singularities
Authors:
Carlo Marcati,
Joost A. A. Opschoor,
Philipp C. Petersen,
Christoph Schwab
Abstract:
We prove exponential expressivity with stable ReLU Neural Networks (ReLU NNs) in $H^1(Ω)$ for weighted analytic function classes in certain polytopal domains $Ω$, in space dimension $d=2,3$. Functions in these classes are locally analytic on open subdomains $D\subset Ω$, but may exhibit isolated point singularities in the interior of $Ω$ or corner and edge singularities at the boundary…
▽ More
We prove exponential expressivity with stable ReLU Neural Networks (ReLU NNs) in $H^1(Ω)$ for weighted analytic function classes in certain polytopal domains $Ω$, in space dimension $d=2,3$. Functions in these classes are locally analytic on open subdomains $D\subset Ω$, but may exhibit isolated point singularities in the interior of $Ω$ or corner and edge singularities at the boundary $\partial Ω$. The exponential expression rate bounds proved here imply uniform exponential expressivity by ReLU NNs of solution families for several elliptic boundary and eigenvalue problems with analytic data. The exponential approximation rates are shown to hold in space dimension $d = 2$ on Lipschitz polygons with straight sides, and in space dimension $d=3$ on Fichera-type polyhedral domains with plane faces. The constructive proofs indicate in particular that NN depth and size increase poly-logarithmically with respect to the target NN approximation accuracy $\varepsilon>0$ in $H^1(Ω)$. The results cover in particular solution sets of linear, second order elliptic PDEs with analytic data and certain nonlinear elliptic eigenvalue problems with analytic nonlinearities and singular, weighted analytic potentials as arise in electron structure models. In the latter case, the functions correspond to electron densities that exhibit isolated point singularities at the positions of the nuclei. Our findings provide in particular mathematical foundation of recently reported, successful uses of deep neural networks in variational electron structure algorithms.
△ Less
Submitted 23 October, 2020;
originally announced October 2020.
-
MISIM: A Neural Code Semantics Similarity System Using the Context-Aware Semantics Structure
Authors:
Fangke Ye,
Shengtian Zhou,
Anand Venkat,
Ryan Marcus,
Nesime Tatbul,
Jesmin Jahan Tithi,
Niranjan Hasabnis,
Paul Petersen,
Timothy Mattson,
Tim Kraska,
Pradeep Dubey,
Vivek Sarkar,
Justin Gottschlich
Abstract:
Code semantics similarity can be used for many tasks such as code recommendation, automated software defect correction, and clone detection. Yet, the accuracy of such systems has not yet reached a level of general purpose reliability. To help address this, we present Machine Inferred Code Similarity (MISIM), a neural code semantics similarity system consisting of two core components: (i)MISIM uses…
▽ More
Code semantics similarity can be used for many tasks such as code recommendation, automated software defect correction, and clone detection. Yet, the accuracy of such systems has not yet reached a level of general purpose reliability. To help address this, we present Machine Inferred Code Similarity (MISIM), a neural code semantics similarity system consisting of two core components: (i)MISIM uses a novel context-aware semantics structure, which was purpose-built to lift semantics from code syntax; (ii)MISIM uses an extensible neural code similarity scoring algorithm, which can be used for various neural network architectures with learned parameters. We compare MISIM to four state-of-the-art systems, including two additional hand-customized models, over 328K programs consisting of over 18 million lines of code. Our experiments show that MISIM has 8.08% better accuracy (using MAP@R) compared to the next best performing system.
△ Less
Submitted 2 June, 2021; v1 submitted 5 June, 2020;
originally announced June 2020.
-
Numerical Solution of the Parametric Diffusion Equation by Deep Neural Networks
Authors:
Moritz Geist,
Philipp Petersen,
Mones Raslan,
Reinhold Schneider,
Gitta Kutyniok
Abstract:
We perform a comprehensive numerical study of the effect of approximation-theoretical results for neural networks on practical learning problems in the context of numerical analysis. As the underlying model, we study the machine-learning-based solution of parametric partial differential equations. Here, approximation theory predicts that the performance of the model should depend only very mildly…
▽ More
We perform a comprehensive numerical study of the effect of approximation-theoretical results for neural networks on practical learning problems in the context of numerical analysis. As the underlying model, we study the machine-learning-based solution of parametric partial differential equations. Here, approximation theory predicts that the performance of the model should depend only very mildly on the dimension of the parameter space and is determined by the intrinsic dimension of the solution manifold of the parametric partial differential equation. We use various methods to establish comparability between test-cases by minimizing the effect of the choice of test-cases on the optimization and sampling aspects of the learning problem. We find strong support for the hypothesis that approximation-theoretical effects heavily influence the practical behavior of learning problems in numerical analysis.
△ Less
Submitted 25 April, 2020;
originally announced April 2020.
-
Context-Aware Parse Trees
Authors:
Fangke Ye,
Shengtian Zhou,
Anand Venkat,
Ryan Marcus,
Paul Petersen,
Jesmin Jahan Tithi,
Tim Mattson,
Tim Kraska,
Pradeep Dubey,
Vivek Sarkar,
Justin Gottschlich
Abstract:
The simplified parse tree (SPT) presented in Aroma, a state-of-the-art code recommendation system, is a tree-structured representation used to infer code semantics by capturing program \emph{structure} rather than program \emph{syntax}. This is a departure from the classical abstract syntax tree, which is principally driven by programming language syntax. While we believe a semantics-driven repres…
▽ More
The simplified parse tree (SPT) presented in Aroma, a state-of-the-art code recommendation system, is a tree-structured representation used to infer code semantics by capturing program \emph{structure} rather than program \emph{syntax}. This is a departure from the classical abstract syntax tree, which is principally driven by programming language syntax. While we believe a semantics-driven representation is desirable, the specifics of an SPT's construction can impact its performance. We analyze these nuances and present a new tree structure, heavily influenced by Aroma's SPT, called a \emph{context-aware parse tree} (CAPT). CAPT enhances SPT by providing a richer level of semantic representation. Specifically, CAPT provides additional binding support for language-specific techniques for adding semantically-salient features, and language-agnostic techniques for removing syntactically-present but semantically-irrelevant features. Our research quantitatively demonstrates the value of our proposed semantically-salient features, enabling a specific CAPT configuration to be 39\% more accurate than SPT across the 48,610 programs we analyzed.
△ Less
Submitted 24 March, 2020;
originally announced March 2020.
-
Approximation in $L^p(μ)$ with deep ReLU neural networks
Authors:
Felix Voigtlaender,
Philipp Petersen
Abstract:
We discuss the expressive power of neural networks which use the non-smooth ReLU activation function $\varrho(x) = \max\{0,x\}$ by analyzing the approximation theoretic properties of such networks. The existing results mainly fall into two categories: approximation using ReLU networks with a fixed depth, or using ReLU networks whose depth increases with the approximation accuracy. After reviewing…
▽ More
We discuss the expressive power of neural networks which use the non-smooth ReLU activation function $\varrho(x) = \max\{0,x\}$ by analyzing the approximation theoretic properties of such networks. The existing results mainly fall into two categories: approximation using ReLU networks with a fixed depth, or using ReLU networks whose depth increases with the approximation accuracy. After reviewing these findings, we show that the results concerning networks with fixed depth--- which up to now only consider approximation in $L^p(λ)$ for the Lebesgue measure $λ$--- can be generalized to approximation in $L^p(μ)$, for any finite Borel measure $μ$. In particular, the generalized results apply in the usual setting of statistical learning theory, where one is interested in approximation in $L^2(\mathbb{P})$, with the probability measure $\mathbb{P}$ describing the distribution of the data.
△ Less
Submitted 9 April, 2019;
originally announced April 2019.
-
A Theoretical Analysis of Deep Neural Networks and Parametric PDEs
Authors:
Gitta Kutyniok,
Philipp Petersen,
Mones Raslan,
Reinhold Schneider
Abstract:
We derive upper bounds on the complexity of ReLU neural networks approximating the solution maps of parametric partial differential equations. In particular, without any knowledge of its concrete shape, we use the inherent low-dimensionality of the solution manifold to obtain approximation rates which are significantly superior to those provided by classical neural network approximation results. C…
▽ More
We derive upper bounds on the complexity of ReLU neural networks approximating the solution maps of parametric partial differential equations. In particular, without any knowledge of its concrete shape, we use the inherent low-dimensionality of the solution manifold to obtain approximation rates which are significantly superior to those provided by classical neural network approximation results. Concretely, we use the existence of a small reduced basis to construct, for a large variety of parametric partial differential equations, neural networks that yield approximations of the parametric solution maps in such a way that the sizes of these networks essentially only depend on the size of the reduced basis.
△ Less
Submitted 14 May, 2020; v1 submitted 31 March, 2019;
originally announced April 2019.
-
Error bounds for approximations with deep ReLU neural networks in $W^{s,p}$ norms
Authors:
Ingo Gühring,
Gitta Kutyniok,
Philipp Petersen
Abstract:
We analyze approximation rates of deep ReLU neural networks for Sobolev-regular functions with respect to weaker Sobolev norms. First, we construct, based on a calculus of ReLU networks, artificial neural networks with ReLU activation functions that achieve certain approximation rates. Second, we establish lower bounds for the approximation by ReLU neural networks for classes of Sobolev-regular fu…
▽ More
We analyze approximation rates of deep ReLU neural networks for Sobolev-regular functions with respect to weaker Sobolev norms. First, we construct, based on a calculus of ReLU networks, artificial neural networks with ReLU activation functions that achieve certain approximation rates. Second, we establish lower bounds for the approximation by ReLU neural networks for classes of Sobolev-regular functions. Our results extend recent advances in the approximation theory of ReLU networks to the regime that is most relevant for applications in the numerical analysis of partial differential equations.
△ Less
Submitted 21 February, 2019;
originally announced February 2019.
-
The Oracle of DLphi
Authors:
Dominik Alfke,
Weston Baines,
Jan Blechschmidt,
Mauricio J. del Razo Sarmina,
Amnon Drory,
Dennis Elbrächter,
Nando Farchmin,
Matteo Gambara,
Silke Glas,
Philipp Grohs,
Peter Hinz,
Danijel Kivaranovic,
Christian Kümmerle,
Gitta Kutyniok,
Sebastian Lunz,
Jan Macdonald,
Ryan Malthaner,
Gregory Naisat,
Ariel Neufeld,
Philipp Christian Petersen,
Rafael Reisenhofer,
Jun-Da Sheng,
Laura Thesing,
Philipp Trunschke,
Johannes von Lindheim
, et al. (2 additional authors not shown)
Abstract:
We present a novel technique based on deep learning and set theory which yields exceptional classification and prediction results. Having access to a sufficiently large amount of labelled training data, our methodology is capable of predicting the labels of the test data almost always even if the training data is entirely unrelated to the test data. In other words, we prove in a specific setting t…
▽ More
We present a novel technique based on deep learning and set theory which yields exceptional classification and prediction results. Having access to a sufficiently large amount of labelled training data, our methodology is capable of predicting the labels of the test data almost always even if the training data is entirely unrelated to the test data. In other words, we prove in a specific setting that as long as one has access to enough data points, the quality of the data is irrelevant.
△ Less
Submitted 27 January, 2019; v1 submitted 17 January, 2019;
originally announced January 2019.
-
Extraction of digital wavefront sets using applied harmonic analysis and deep neural networks
Authors:
Héctor Andrade-Loarca,
Gitta Kutyniok,
Ozan Öktem,
Philipp Petersen
Abstract:
Microlocal analysis provides deep insight into singularity structures and is often crucial for solving inverse problems, predominately, in imaging sciences. Of particular importance is the analysis of wavefront sets and the correct extraction of those. In this paper, we introduce the first algorithmic approach to extract the wavefront set of images, which combines data-based and model-based method…
▽ More
Microlocal analysis provides deep insight into singularity structures and is often crucial for solving inverse problems, predominately, in imaging sciences. Of particular importance is the analysis of wavefront sets and the correct extraction of those. In this paper, we introduce the first algorithmic approach to extract the wavefront set of images, which combines data-based and model-based methods. Based on a celebrated property of the shearlet transform to unravel information on the wavefront set, we extract the wavefront set of an image by first applying a discrete shearlet transform and then feeding local patches of this transform to a deep convolutional neural network trained on labeled data. The resulting algorithm outperforms all competing algorithms in edge-orientation and ramp-orientation detection.
△ Less
Submitted 10 July, 2019; v1 submitted 5 January, 2019;
originally announced January 2019.
-
Equivalence of approximation by convolutional neural networks and fully-connected networks
Authors:
Philipp Petersen,
Felix Voigtlaender
Abstract:
Convolutional neural networks are the most widely used type of neural networks in applications. In mathematical analysis, however, mostly fully-connected networks are studied. In this paper, we establish a connection between both network architectures. Using this connection, we show that all upper and lower bounds concerning approximation rates of {fully-connected} neural networks for functions…
▽ More
Convolutional neural networks are the most widely used type of neural networks in applications. In mathematical analysis, however, mostly fully-connected networks are studied. In this paper, we establish a connection between both network architectures. Using this connection, we show that all upper and lower bounds concerning approximation rates of {fully-connected} neural networks for functions $f \in \mathcal{C}$ -- for an arbitrary function class $\mathcal{C}$ -- translate to essentially the same bounds concerning approximation rates of convolutional neural networks for functions $f \in {\mathcal{C}^{equi}}$, with the class ${\mathcal{C}^{equi}}$ consisting of all translation equivariant functions whose first coordinate belongs to $\mathcal{C}$. All presented results consider exclusively the case of convolutional neural networks without any pooling operation and with circular convolutions, i.e., not based on zero-padding.
△ Less
Submitted 28 January, 2021; v1 submitted 4 September, 2018;
originally announced September 2018.
-
Optimal approximation of piecewise smooth functions using deep ReLU neural networks
Authors:
Philipp Petersen,
Felix Voigtlaender
Abstract:
We study the necessary and sufficient complexity of ReLU neural networks---in terms of depth and number of weights---which is required for approximating classifier functions in $L^2$. As a model class, we consider the set $\mathcal{E}^β(\mathbb R^d)$ of possibly discontinuous piecewise $C^β$ functions $f : [-1/2, 1/2]^d \to \mathbb R$, where the different smooth regions of $f$ are separated by…
▽ More
We study the necessary and sufficient complexity of ReLU neural networks---in terms of depth and number of weights---which is required for approximating classifier functions in $L^2$. As a model class, we consider the set $\mathcal{E}^β(\mathbb R^d)$ of possibly discontinuous piecewise $C^β$ functions $f : [-1/2, 1/2]^d \to \mathbb R$, where the different smooth regions of $f$ are separated by $C^β$ hypersurfaces. For dimension $d \geq 2$, regularity $β> 0$, and accuracy $\varepsilon > 0$, we construct artificial neural networks with ReLU activation function that approximate functions from $\mathcal{E}^β(\mathbb R^d)$ up to $L^2$ error of $\varepsilon$. The constructed networks have a fixed number of layers, depending only on $d$ and $β$, and they have $O(\varepsilon^{-2(d-1)/β})$ many nonzero weights, which we prove to be optimal. In addition to the optimality in terms of the number of weights, we show that in order to achieve the optimal approximation rate, one needs ReLU networks of a certain depth. Precisely, for piecewise $C^β(\mathbb R^d)$ functions, this minimal depth is given---up to a multiplicative constant---by $β/d$. Up to a log factor, our constructed networks match this bound. This partly explains the benefits of depth for ReLU networks by showing that deep networks are necessary to achieve efficient approximation of (piecewise) smooth functions. Finally, we analyze approximation in high-dimensional spaces where the function $f$ to be approximated can be factorized into a smooth dimension reducing feature map $τ$ and classifier function $g$---defined on a low-dimensional feature space---as $f = g \circ τ$. We show that in this case the approximation rate depends only on the dimension of the feature space and not the input dimension.
△ Less
Submitted 22 May, 2018; v1 submitted 15 September, 2017;
originally announced September 2017.
-
Optimal Approximation with Sparsely Connected Deep Neural Networks
Authors:
Helmut Bölcskei,
Philipp Grohs,
Gitta Kutyniok,
Philipp Petersen
Abstract:
We derive fundamental lower bounds on the connectivity and the memory requirements of deep neural networks guaranteeing uniform approximation rates for arbitrary function classes in $L^2(\mathbb R^d)$. In other words, we establish a connection between the complexity of a function class and the complexity of deep neural networks approximating functions from this class to within a prescribed accurac…
▽ More
We derive fundamental lower bounds on the connectivity and the memory requirements of deep neural networks guaranteeing uniform approximation rates for arbitrary function classes in $L^2(\mathbb R^d)$. In other words, we establish a connection between the complexity of a function class and the complexity of deep neural networks approximating functions from this class to within a prescribed accuracy. Additionally, we prove that our lower bounds are achievable for a broad family of function classes. Specifically, all function classes that are optimally approximated by a general class of representation systems---so-called \emph{affine systems}---can be approximated by deep neural networks with minimal connectivity and memory requirements. Affine systems encompass a wealth of representation systems from applied harmonic analysis such as wavelets, ridgelets, curvelets, shearlets, $α$-shearlets, and more generally $α$-molecules. Our central result elucidates a remarkable universality property of neural networks and shows that they achieve the optimum approximation properties of all affine systems combined. As a specific example, we consider the class of $α^{-1}$-cartoon-like functions, which is approximated optimally by $α$-shearlets. We also explain how our results can be extended to the case of functions on low-dimensional immersed manifolds. Finally, we present numerical experiments demonstrating that the standard stochastic gradient descent algorithm generates deep neural networks providing close-to-optimal approximation rates. Moreover, these results indicate that stochastic gradient descent can actually learn approximations that are sparse in the representation systems optimally sparsifying the function class the network is trained on.
△ Less
Submitted 16 May, 2018; v1 submitted 4 May, 2017;
originally announced May 2017.