
Showing 1–32 of 32 results for author: Petersen, P

Searching in archive cs.
  1. arXiv:2505.10628  [pdf, ps, other]

    stat.ML cs.LG math.PR

    Minimax learning rates for estimating binary classifiers under margin conditions

    Authors: Jonathan García, Philipp Petersen

    Abstract: We study classification problems using binary estimators where the decision boundary is described by horizon functions and where the data distribution satisfies a geometric margin condition. We establish upper and lower bounds for the minimax learning rate over broad function classes with bounded Kolmogorov entropy in Lebesgue norms. A key novelty of our work is the derivation of lower bounds on t…

    Submitted 15 May, 2025; originally announced May 2025.

    MSC Class: 68T05; 62C20; 41A25; 41A46

  2. arXiv:2504.14015  [pdf, other]

    cs.NE cs.AI cs.LG q-bio.NC stat.ML

    Causal pieces: analysing and improving spiking neural networks piece by piece

    Authors: Dominik Dold, Philipp Christian Petersen

    Abstract: We introduce a novel concept for spiking neural networks (SNNs) derived from the idea of "linear pieces" used to analyse the expressiveness and trainability of artificial neural networks (ANNs). We prove that the input domain of SNNs decomposes into distinct causal regions where its output spike times are locally Lipschitz continuous with respect to the input spike times and network parameters. Th…

    Submitted 18 April, 2025; originally announced April 2025.

  3. arXiv:2503.10251  [pdf, other]

    math.NA cs.LG stat.ML

    Numerical Error Analysis of Large Language Models

    Authors: Stanislav Budzinskiy, Wenyi Fang, Longbin Zeng, Philipp Petersen

    Abstract: Large language models based on transformer architectures have become integral to state-of-the-art natural language processing applications. However, their training remains computationally expensive and exhibits instabilities, some of which are expected to be caused by finite-precision computations. We provide a theoretical analysis of the impact of round-off errors within the forward pass of a tra…

    Submitted 13 March, 2025; originally announced March 2025.

  4. arXiv:2503.02013  [pdf, ps, other]

    cs.NE

    Sustainable AI: Mathematical Foundations of Spiking Neural Networks

    Authors: Adalbert Fono, Manjot Singh, Ernesto Araya, Philipp C. Petersen, Holger Boche, Gitta Kutyniok

    Abstract: Deep learning's success comes with growing energy demands, raising concerns about the long-term sustainability of the field. Spiking neural networks, inspired by biological neurons, offer a promising alternative with potential computational and energy-efficiency gains. This article examines the computational properties of spiking networks through the lens of learning theory, focusing on expressivi…

    Submitted 3 March, 2025; originally announced March 2025.

  5. arXiv:2501.14249  [pdf, other]

    cs.LG cs.AI cs.CL

    Humanity's Last Exam

    Authors: Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, John Ling, Sean Shi, Michael Choi, Anish Agrawal, Arnav Chopra, Adam Khoja, Ryan Kim, Richard Ren, Jason Hausenloy, Oliver Zhang, Mantas Mazeika, Dmitry Dodonov, Tung Nguyen, Jaeho Lee, Daron Anderson, Mikhail Doroshenko, Alun Cennyth Stokes, et al. (1084 additional authors not shown)

    Abstract: Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90\% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of…

    Submitted 19 April, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

    Comments: 29 pages, 6 figures

  6. arXiv:2412.07312  [pdf, other]

    cs.LG math.PR stat.ML

    High-dimensional classification problems with Barron regular boundaries under margin conditions

    Authors: Jonathan García, Philipp Petersen

    Abstract: We prove that a classifier with a Barron-regular decision boundary can be approximated with a rate of high polynomial degree by ReLU neural networks with three hidden layers when a margin condition is assumed. In particular, for strong margin conditions, high-dimensional discontinuous classifiers can be approximated with a rate that is typically only achievable when approximating a low-dimensional…

    Submitted 10 January, 2025; v1 submitted 10 December, 2024; originally announced December 2024.

    MSC Class: 68T05; 62C20; 41A25; 41A46

  7. arXiv:2411.05453  [pdf, ps, other]

    stat.ML cs.LG

    The sampling complexity of learning invertible residual neural networks

    Authors: Yuanyuan Li, Philipp Grohs, Philipp Petersen

    Abstract: In recent work it has been shown that determining a feedforward ReLU neural network to within high uniform accuracy from point samples suffers from the curse of dimensionality in terms of the number of samples needed. As a consequence, feedforward ReLU neural networks are of limited use for applications where guaranteed high uniform accuracy is required. We consider the question of whether the s…

    Submitted 8 November, 2024; originally announced November 2024.

  8. arXiv:2409.20264  [pdf, ps, other]

    math.NA cs.LG

    First Order System Least Squares Neural Networks

    Authors: Joost A. A. Opschoor, Philipp C. Petersen, Christoph Schwab

    Abstract: We introduce a conceptual framework for numerically solving linear elliptic, parabolic, and hyperbolic PDEs on bounded, polytopal domains in euclidean spaces by deep neural networks. The PDEs are recast as minimization of a least-squares (LSQ for short) residual of an equivalent, well-posed first-order system, over parametric families of deep neural networks. The associated LSQ residual is a) equa…

    Submitted 30 September, 2024; originally announced September 2024.

    MSC Class: 65M60; 65N30; 65N50; 49M41; 35J46; 35L40

  9. arXiv:2409.17991  [pdf, other]

    cs.LG math.NA stat.ML

    Dimension-independent learning rates for high-dimensional classification problems

    Authors: Andres Felipe Lerma-Pineda, Philipp Petersen, Simon Frieder, Thomas Lukasiewicz

    Abstract: We study the problem of approximating and estimating classification functions that have their decision boundary in the $RBV^2$ space. Functions of $RBV^2$ type arise naturally as solutions of regularized neural network learning problems and neural networks can approximate these functions without the curse of dimensionality. We modify existing results to show that every $RBV^2$ function can be appr…

    Submitted 26 September, 2024; originally announced September 2024.

    MSC Class: 68T05; 62C20; 41A25; 41A46

  10. arXiv:2407.18384  [pdf, other]

    cs.LG math.HO

    Mathematical theory of deep learning

    Authors: Philipp Petersen, Jakob Zech

    Abstract: This book provides an introduction to the mathematical analysis of deep learning. It covers fundamental results in approximation theory, optimization theory, and statistical learning theory, which are the three main pillars of deep neural network theory. Serving as a guide for students and researchers in mathematics and related fields, the book aims to equip readers with foundational knowledge on…

    Submitted 7 April, 2025; v1 submitted 25 July, 2024; originally announced July 2024.

  11. arXiv:2404.14875  [pdf, other]

    cs.LG math.OC

    Regularized Gauss-Newton for Optimizing Overparameterized Neural Networks

    Authors: Adeyemi D. Adeoye, Philipp Christian Petersen, Alberto Bemporad

    Abstract: The generalized Gauss-Newton (GGN) optimization method incorporates curvature estimates into its solution steps, and provides a good approximation to the Newton method for large-scale optimization problems. GGN has been found particularly interesting for practical training of deep neural networks, not only for its impressive convergence speed, but also for its close relation with neural tangent ke…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 27 pages, 9 figures, 2 tables

  12. arXiv:2404.04549  [pdf, ps, other]

    cs.NE cs.LG math.FA stat.ML

    Stable Learning Using Spiking Neural Networks Equipped With Affine Encoders and Decoders

    Authors: A. Martina Neuman, Dominik Dold, Philipp Christian Petersen

    Abstract: We study the learning problem associated with spiking neural networks. Specifically, we focus on spiking neural networks composed of simple spiking neurons having only positive synaptic weights, equipped with an affine encoder and decoder; we refer to these as affine spiking neural networks. These neural networks are shown to depend continuously on their parameters, which facilitates classical cov…

    Submitted 20 June, 2025; v1 submitted 6 April, 2024; originally announced April 2024.

  13. arXiv:2312.04556  [pdf, other]

    cs.CL cs.AI cs.LG math.HO

    Large Language Models for Mathematicians

    Authors: Simon Frieder, Julius Berner, Philipp Petersen, Thomas Lukasiewicz

    Abstract: Large language models (LLMs) such as ChatGPT have received immense interest for their general-purpose language understanding and, in particular, their ability to generate high-quality text or computer code. For many professions, LLMs represent an invaluable tool that can speed up and improve the quality of work. In this note, we discuss to what extent they can aid professional mathematicians. We f…

    Submitted 2 April, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Journal ref: International Mathematical News 254 (2023) 1-20

  14. arXiv:2304.10247  [pdf, other]

    cs.RO eess.IV

    Focus on the Challenges: Analysis of a User-friendly Data Search Approach with CLIP in the Automotive Domain

    Authors: Philipp Rigoll, Patrick Petersen, Hanno Stage, Lennart Ries, Eric Sax

    Abstract: Handling large amounts of data has become a key for developing automated driving systems. Especially for developing highly automated driving functions, working with images has become increasingly challenging due to the sheer size of the required data. Such data has to satisfy different requirements to be usable in machine learning-based approaches. Thus, engineers need to fully understand their la…

    Submitted 21 April, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

  15. arXiv:2301.13867  [pdf, other]

    cs.LG cs.AI cs.CL

    Mathematical Capabilities of ChatGPT

    Authors: Simon Frieder, Luca Pinchetti, Alexis Chevalier, Ryan-Rhys Griffiths, Tommaso Salvatori, Thomas Lukasiewicz, Philipp Christian Petersen, Julius Berner

    Abstract: We investigate the mathematical capabilities of two iterations of ChatGPT (released 9-January-2023 and 30-January-2023) and of GPT-4 by testing them on publicly available datasets, as well as hand-crafted ones, using a novel methodology. In contrast to formal mathematics, where large databases of formal proofs are available (e.g., the Lean Mathematical Library), current datasets of natural-languag…

    Submitted 20 July, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

    Comments: Added further evaluations on another ChatGPT version and on GPT-4. The GHOSTS and miniGHOSTS datasets are available at https://github.com/xyfrieder/science-GHOSTS

    Journal ref: NeurIPS 2023 Datasets and Benchmarks

  16. arXiv:2212.09507  [pdf, ps, other]

    cs.LG math.FA stat.ML

    VC dimensions of group convolutional neural networks

    Authors: Philipp Christian Petersen, Anna Sepliarskaia

    Abstract: We study the generalization capacity of group convolutional neural networks. We identify precise estimates for the VC dimensions of simple sets of group convolutional neural networks. In particular, we find that for infinite groups and appropriately chosen convolutional kernels, already two-parameter families of convolutional neural networks have an infinite VC dimension, despite being invariant t…

    Submitted 19 December, 2022; originally announced December 2022.

    MSC Class: 68T07; 68Q32; 68T05

  17. arXiv:2210.00805  [pdf, other]

    cs.LG math.FA stat.ML

    Limitations of neural network training due to numerical instability of backpropagation

    Authors: Clemens Karner, Vladimir Kazeev, Philipp Christian Petersen

    Abstract: We study the training of deep neural networks by gradient descent where floating-point arithmetic is used to compute the gradients. In this framework and under realistic assumptions, we demonstrate that it is highly unlikely to find ReLU neural networks that maintain, in the course of training with gradient descent, superlinearly many affine pieces with respect to their number of layers. In virtua…

    Submitted 15 November, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    MSC Class: 65G50; 68T07; 41A25; 68T09

  18. arXiv:2112.12555  [pdf, ps, other]

    math.FA cs.LG stat.ML

    Optimal learning of high-dimensional classification problems using deep neural networks

    Authors: Philipp Petersen, Felix Voigtlaender

    Abstract: We study the problem of learning classification functions from noiseless training samples, under the assumption that the decision boundary is of a certain regularity. We establish universal lower bounds for this estimation problem, for general classes of continuous decision boundaries. For the class of locally Barron-regular decision boundaries, we find that the optimal estimation rates are essent…

    Submitted 24 December, 2021; v1 submitted 23 December, 2021; originally announced December 2021.

    MSC Class: 68T05; 62C20; 41A25; 41A46

  19. arXiv:2108.05732  [pdf, other]

    cs.LG cs.CV math.FA math.NA

    Deep Microlocal Reconstruction for Limited-Angle Tomography

    Authors: Héctor Andrade-Loarca, Gitta Kutyniok, Ozan Öktem, Philipp Petersen

    Abstract: We present a deep learning-based algorithm to jointly solve a reconstruction problem and a wavefront set extraction problem in tomographic imaging. The algorithm is based on a recently developed digital wavefront set extractor as well as the well-known microlocal canonical relation for the Radon transform. We use the wavefront set information about x-ray data to improve the reconstruction by requi…

    Submitted 12 August, 2021; originally announced August 2021.

    Comments: 43 pages, 8 figures

    MSC Class: 35A18; 65T60; 68T10

  20. The Modern Mathematics of Deep Learning

    Authors: Julius Berner, Philipp Grohs, Gitta Kutyniok, Philipp Petersen

    Abstract: We describe the new field of mathematical analysis of deep learning. This field emerged around a list of research questions that were not answered within the classical framework of learning theory. These questions concern: the outstanding generalization power of overparametrized neural networks, the role of depth in deep architectures, the apparent absence of the curse of dimensionality, the surpr…

    Submitted 8 February, 2023; v1 submitted 9 May, 2021; originally announced May 2021.

    Comments: A version of this review paper appears as a chapter in the book "Mathematical Aspects of Deep Learning" by Cambridge University Press

    Journal ref: Mathematical Aspects of Deep Learning, pp. 1-111. Cambridge University Press, 2022

  21. Exponential ReLU Neural Network Approximation Rates for Point and Edge Singularities

    Authors: Carlo Marcati, Joost A. A. Opschoor, Philipp C. Petersen, Christoph Schwab

    Abstract: We prove exponential expressivity with stable ReLU Neural Networks (ReLU NNs) in $H^1(Ω)$ for weighted analytic function classes in certain polytopal domains $Ω$, in space dimension $d=2,3$. Functions in these classes are locally analytic on open subdomains $D\subset Ω$, but may exhibit isolated point singularities in the interior of $Ω$ or corner and edge singularities at the boundary…

    Submitted 23 October, 2020; originally announced October 2020.

    Comments: Found Comput Math (2022)

    MSC Class: 35Q40; 41A25; 41A46; 65N30

    Journal ref: Found. Comput. Math. 23 (2023), no. 3, 1043-1127

  22. arXiv:2006.05265  [pdf, other]

    cs.LG cs.SE stat.ML

    MISIM: A Neural Code Semantics Similarity System Using the Context-Aware Semantics Structure

    Authors: Fangke Ye, Shengtian Zhou, Anand Venkat, Ryan Marcus, Nesime Tatbul, Jesmin Jahan Tithi, Niranjan Hasabnis, Paul Petersen, Timothy Mattson, Tim Kraska, Pradeep Dubey, Vivek Sarkar, Justin Gottschlich

    Abstract: Code semantics similarity can be used for many tasks such as code recommendation, automated software defect correction, and clone detection. Yet, the accuracy of such systems has not yet reached a level of general purpose reliability. To help address this, we present Machine Inferred Code Similarity (MISIM), a neural code semantics similarity system consisting of two core components: (i) MISIM uses…

    Submitted 2 June, 2021; v1 submitted 5 June, 2020; originally announced June 2020.

    Comments: arXiv admin note: text overlap with arXiv:2003.11118

  23. arXiv:2004.12131  [pdf, other]

    math.NA cs.LG stat.ML

    Numerical Solution of the Parametric Diffusion Equation by Deep Neural Networks

    Authors: Moritz Geist, Philipp Petersen, Mones Raslan, Reinhold Schneider, Gitta Kutyniok

    Abstract: We perform a comprehensive numerical study of the effect of approximation-theoretical results for neural networks on practical learning problems in the context of numerical analysis. As the underlying model, we study the machine-learning-based solution of parametric partial differential equations. Here, approximation theory predicts that the performance of the model should depend only very mildly…

    Submitted 25 April, 2020; originally announced April 2020.

    MSC Class: 35J99; 41A25; 41A30; 68T05; 65N30

  24. arXiv:2003.11118  [pdf, ps, other]

    cs.PL cs.AI

    Context-Aware Parse Trees

    Authors: Fangke Ye, Shengtian Zhou, Anand Venkat, Ryan Marcus, Paul Petersen, Jesmin Jahan Tithi, Tim Mattson, Tim Kraska, Pradeep Dubey, Vivek Sarkar, Justin Gottschlich

    Abstract: The simplified parse tree (SPT) presented in Aroma, a state-of-the-art code recommendation system, is a tree-structured representation used to infer code semantics by capturing program \emph{structure} rather than program \emph{syntax}. This is a departure from the classical abstract syntax tree, which is principally driven by programming language syntax. While we believe a semantics-driven repres…

    Submitted 24 March, 2020; originally announced March 2020.

  25. arXiv:1904.04789  [pdf, ps, other]

    math.FA cs.LG

    Approximation in $L^p(μ)$ with deep ReLU neural networks

    Authors: Felix Voigtlaender, Philipp Petersen

    Abstract: We discuss the expressive power of neural networks which use the non-smooth ReLU activation function $\varrho(x) = \max\{0,x\}$ by analyzing the approximation theoretic properties of such networks. The existing results mainly fall into two categories: approximation using ReLU networks with a fixed depth, or using ReLU networks whose depth increases with the approximation accuracy. After reviewing…

    Submitted 9 April, 2019; originally announced April 2019.

    Comments: Accepted for presentation at SampTA 2019

    MSC Class: 41A25; 82C32; 41A46

  26. arXiv:1904.00377  [pdf, ps, other]

    math.NA cs.LG math.FA stat.ML

    A Theoretical Analysis of Deep Neural Networks and Parametric PDEs

    Authors: Gitta Kutyniok, Philipp Petersen, Mones Raslan, Reinhold Schneider

    Abstract: We derive upper bounds on the complexity of ReLU neural networks approximating the solution maps of parametric partial differential equations. In particular, without any knowledge of its concrete shape, we use the inherent low-dimensionality of the solution manifold to obtain approximation rates which are significantly superior to those provided by classical neural network approximation results. C…

    Submitted 14 May, 2020; v1 submitted 31 March, 2019; originally announced April 2019.

    MSC Class: 35A35; 35J99; 41A25; 41A46; 68T05; 65N30

  27. arXiv:1902.07896  [pdf, ps, other]

    math.FA cs.LG

    Error bounds for approximations with deep ReLU neural networks in $W^{s,p}$ norms

    Authors: Ingo Gühring, Gitta Kutyniok, Philipp Petersen

    Abstract: We analyze approximation rates of deep ReLU neural networks for Sobolev-regular functions with respect to weaker Sobolev norms. First, we construct, based on a calculus of ReLU networks, artificial neural networks with ReLU activation functions that achieve certain approximation rates. Second, we establish lower bounds for the approximation by ReLU neural networks for classes of Sobolev-regular fu…

    Submitted 21 February, 2019; originally announced February 2019.

  28. arXiv:1901.05744  [pdf, ps, other]

    cs.LG stat.ML

    The Oracle of DLphi

    Authors: Dominik Alfke, Weston Baines, Jan Blechschmidt, Mauricio J. del Razo Sarmina, Amnon Drory, Dennis Elbrächter, Nando Farchmin, Matteo Gambara, Silke Glas, Philipp Grohs, Peter Hinz, Danijel Kivaranovic, Christian Kümmerle, Gitta Kutyniok, Sebastian Lunz, Jan Macdonald, Ryan Malthaner, Gregory Naisat, Ariel Neufeld, Philipp Christian Petersen, Rafael Reisenhofer, Jun-Da Sheng, Laura Thesing, Philipp Trunschke, Johannes von Lindheim, et al. (2 additional authors not shown)

    Abstract: We present a novel technique based on deep learning and set theory which yields exceptional classification and prediction results. Having access to a sufficiently large amount of labelled training data, our methodology is capable of predicting the labels of the test data almost always even if the training data is entirely unrelated to the test data. In other words, we prove in a specific setting t…

    Submitted 27 January, 2019; v1 submitted 17 January, 2019; originally announced January 2019.

    MSC Class: 68T05; 82C32

  29. arXiv:1901.01388  [pdf, other]

    eess.IV cs.LG eess.SP stat.ML

    Extraction of digital wavefront sets using applied harmonic analysis and deep neural networks

    Authors: Héctor Andrade-Loarca, Gitta Kutyniok, Ozan Öktem, Philipp Petersen

    Abstract: Microlocal analysis provides deep insight into singularity structures and is often crucial for solving inverse problems, predominately, in imaging sciences. Of particular importance is the analysis of wavefront sets and the correct extraction of those. In this paper, we introduce the first algorithmic approach to extract the wavefront set of images, which combines data-based and model-based method…

    Submitted 10 July, 2019; v1 submitted 5 January, 2019; originally announced January 2019.

    MSC Class: 35A18; 65T60; 68T10

  30. arXiv:1809.00973  [pdf, other]

    math.FA cs.LG

    Equivalence of approximation by convolutional neural networks and fully-connected networks

    Authors: Philipp Petersen, Felix Voigtlaender

    Abstract: Convolutional neural networks are the most widely used type of neural networks in applications. In mathematical analysis, however, mostly fully-connected networks are studied. In this paper, we establish a connection between both network architectures. Using this connection, we show that all upper and lower bounds concerning approximation rates of fully-connected neural networks for functions…

    Submitted 28 January, 2021; v1 submitted 4 September, 2018; originally announced September 2018.

    MSC Class: 41A25; 44A35; 41A46

    Journal ref: Proc. Amer. Math. Soc. 148 (2020), 1567-1581

  31. arXiv:1709.05289  [pdf, ps, other]

    math.FA cs.LG stat.ML

    Optimal approximation of piecewise smooth functions using deep ReLU neural networks

    Authors: Philipp Petersen, Felix Voigtlaender

    Abstract: We study the necessary and sufficient complexity of ReLU neural networks---in terms of depth and number of weights---which is required for approximating classifier functions in $L^2$. As a model class, we consider the set $\mathcal{E}^β(\mathbb R^d)$ of possibly discontinuous piecewise $C^β$ functions $f : [-1/2, 1/2]^d \to \mathbb R$, where the different smooth regions of $f$ are separated by…

    Submitted 22 May, 2018; v1 submitted 15 September, 2017; originally announced September 2017.

    Comments: Generalized some estimates to $L^p$ norms for $0<p<\infty$

    MSC Class: 41A25; 41A10; 82C32; 41A46; 68T05

  32. arXiv:1705.01714  [pdf, other]

    cs.LG cs.IT math.FA

    Optimal Approximation with Sparsely Connected Deep Neural Networks

    Authors: Helmut Bölcskei, Philipp Grohs, Gitta Kutyniok, Philipp Petersen

    Abstract: We derive fundamental lower bounds on the connectivity and the memory requirements of deep neural networks guaranteeing uniform approximation rates for arbitrary function classes in $L^2(\mathbb R^d)$. In other words, we establish a connection between the complexity of a function class and the complexity of deep neural networks approximating functions from this class to within a prescribed accurac…

    Submitted 16 May, 2018; v1 submitted 4 May, 2017; originally announced May 2017.

    MSC Class: 41A25; 82C32; 42C40; 42C15; 41A46; 68T05; 94A34; 94A12