Search | arXiv e-print repository

Conflicting Biases at the Edge of Stability: Norm versus Sharpness Regularization

Authors: Vit Fojtik, Maria Matveev, Hung-Hsu Chou, Gitta Kutyniok, Johannes Maly

Abstract: A widely believed explanation for the remarkable generalization capacities of overparameterized neural networks is that the optimization algorithms used for training induce an implicit bias towards benign solutions. To grasp this theoretically, recent works examine gradient descent and its variants in simplified training settings, often assuming vanishing learning rates. These studies reveal various forms of implicit regularization, such as $\ell_1$-norm minimizing parameters in regression and max-margin solutions in classification. Concurrently, empirical findings show that moderate to large learning rates exceeding standard stability thresholds lead to faster, albeit oscillatory, convergence in the so-called Edge-of-Stability regime, and induce an implicit bias towards minima of low sharpness (norm of training loss Hessian). In this work, we argue that a comprehensive understanding of the generalization performance of gradient descent requires analyzing the interaction between these various forms of implicit regularization. We empirically demonstrate that the learning rate balances between low parameter norm and low sharpness of the trained model. We furthermore prove for diagonal linear networks trained on a simple regression task that neither implicit bias alone minimizes the generalization error. These findings demonstrate that focusing on a single implicit bias is insufficient to explain good generalization, and they motivate a broader view of implicit regularization that captures the dynamic trade-off between norm and sharpness induced by non-negligible learning rates.

Submitted 27 May, 2025; originally announced May 2025.
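
The "standard stability threshold" mentioned in the abstract above can be illustrated on the simplest possible case. The following is a minimal sketch, not taken from the paper: it assumes a one-dimensional quadratic loss f(x) = 0.5 * sharpness * x**2, for which gradient descent contracts exactly when the learning rate is below 2 / sharpness. The function name and all parameter values are hypothetical. The Edge-of-Stability regime itself concerns non-quadratic losses and is not reproduced here.

```python
def gradient_descent_quadratic(sharpness, learning_rate, x0=1.0, steps=50):
    """Run gradient descent on f(x) = 0.5 * sharpness * x**2 and return all iterates."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        grad = sharpness * x          # f'(x) = sharpness * x
        x = x - learning_rate * grad  # x_{t+1} = (1 - learning_rate * sharpness) * x_t
        xs.append(x)
    return xs


sharpness = 4.0              # second derivative of f, the 1-D analogue of the Hessian norm
threshold = 2.0 / sharpness  # classical stability threshold for this quadratic

# Just below the threshold: iterates flip sign each step but shrink.
stable = gradient_descent_quadratic(sharpness, 0.9 * threshold)
# Just above the threshold: iterates flip sign each step and grow.
unstable = gradient_descent_quadratic(sharpness, 1.1 * threshold)
```

In this toy setting the iterates oscillate in both runs; only the run below the threshold converges.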

arXiv:2504.18433 [pdf, other]

An Axiomatic Assessment of Entropy- and Variance-based Uncertainty Quantification in Regression

Authors: Christopher Bülte, Yusuf Sale, Timo Löhr, Paul Hofman, Gitta Kutyniok, Eyke Hüllermeier

Abstract: Uncertainty quantification (UQ) is crucial in machine learning, yet most (axiomatic) studies of uncertainty measures focus on classification, leaving a gap in regression settings with limited formal justification and evaluations. In this work, we introduce a set of axioms to rigorously assess measures of aleatoric, epistemic, and total uncertainty in supervised regression. By utilizing a predictive exponential family, we can generalize commonly used approaches for uncertainty representation and corresponding uncertainty measures. More specifically, we analyze the widely used entropy- and variance-based measures regarding limitations and challenges. Our findings provide a principled foundation for uncertainty quantification in regression, offering theoretical insights and practical guidelines for reliable uncertainty assessment.

Submitted 16 May, 2025; v1 submitted 25 April, 2025; originally announced April 2025.

arXiv:2307.02301 [pdf, other]

Sumformer: Universal Approximation for Efficient Transformers

Authors: Silas Alberti, Niclas Dern, Laura Thesing, Gitta Kutyniok

Abstract: Natural language processing (NLP) made an impressive jump with the introduction of Transformers. ChatGPT is one of the most famous examples, changing the perception of the possibilities of AI even outside the research community. However, besides the impressive performance, the quadratic time and space complexity of Transformers with respect to sequence length poses significant limitations for handling long sequences. While efficient Transformer architectures like Linformer and Performer with linear complexity have emerged as promising solutions, their theoretical understanding remains limited. In this paper, we introduce Sumformer, a novel and simple architecture capable of universally approximating equivariant sequence-to-sequence functions. We use Sumformer to give the first universal approximation results for Linformer and Performer. Moreover, we derive a new proof for Transformers, showing that just one attention layer is sufficient for universal approximation.

Submitted 5 July, 2023; originally announced July 2023.

arXiv:2205.15117 [pdf, other]

OOD Link Prediction Generalization Capabilities of Message-Passing GNNs in Larger Test Graphs

Authors: Yangze Zhou, Gitta Kutyniok, Bruno Ribeiro

Abstract: This work provides the first theoretical study on the ability of graph Message Passing Neural Networks (gMPNNs) -- such as Graph Neural Networks (GNNs) -- to perform inductive out-of-distribution (OOD) link prediction tasks, where deployment (test) graph sizes are larger than training graphs. We first prove non-asymptotic bounds showing that link predictors based on permutation-equivariant (structural) node embeddings obtained by gMPNNs can converge to a random guess as test graphs get larger. We then propose a theoretically-sound gMPNN that outputs structural pairwise (2-node) embeddings and prove non-asymptotic bounds showing that, as test graphs grow, these embeddings converge to embeddings of a continuous function that retains its ability to predict links OOD. Empirical results on random graphs show agreement with our theoretical results.

Submitted 9 October, 2022; v1 submitted 30 May, 2022; originally announced May 2022.

Comments: Accepted at NeurIPS 2022

arXiv:2203.08890 [pdf, other]

The Mathematics of Artificial Intelligence

Authors: Gitta Kutyniok

Abstract: We currently witness the spectacular success of artificial intelligence in both science and public life. However, the development of a rigorous mathematical foundation is still at an early stage. In this survey article, which is based on an invited lecture at the International Congress of Mathematicians 2022, we will in particular focus on the current "workhorse" of artificial intelligence, namely deep neural networks. We will present the main theoretical directions along with several exemplary results and discuss key open problems.

Submitted 16 March, 2022; originally announced March 2022.

Comments: 16 pages, 7 figures

MSC Class: Primary 68T07; Secondary 41A25; 42C15; 35C20; 65D18

arXiv:2105.04026 [pdf, other]

doi 10.1017/9781009025096.002

The Modern Mathematics of Deep Learning

Authors: Julius Berner, Philipp Grohs, Gitta Kutyniok, Philipp Petersen

Abstract: We describe the new field of mathematical analysis of deep learning. This field emerged around a list of research questions that were not answered within the classical framework of learning theory. These questions concern: the outstanding generalization power of overparametrized neural networks, the role of depth in deep architectures, the apparent absence of the curse of dimensionality, the surprisingly successful optimization performance despite the non-convexity of the problem, understanding what features are learned, why deep architectures perform exceptionally well in physical problems, and which fine aspects of an architecture affect the behavior of a learning task in which way. We present an overview of modern approaches that yield partial answers to these questions. For selected approaches, we describe the main ideas in more detail.

Submitted 8 February, 2023; v1 submitted 9 May, 2021; originally announced May 2021.

Comments: A version of this review paper appears as a chapter in the book "Mathematical Aspects of Deep Learning" by Cambridge University Press

Journal ref: Mathematical Aspects of Deep Learning, pp. 1-111. Cambridge University Press, 2022

arXiv:2012.04477 [pdf, ps, other]

Analyzing Finite Neural Networks: Can We Trust Neural Tangent Kernel Theory?

Authors: Mariia Seleznova, Gitta Kutyniok

Abstract: Neural Tangent Kernel (NTK) theory is widely used to study the dynamics of infinitely-wide deep neural networks (DNNs) under gradient descent. But do the results for infinitely-wide networks give us hints about the behavior of real finite-width ones? In this paper, we study empirically when NTK theory is valid in practice for fully-connected ReLU and sigmoid DNNs. We find that whether a network is in the NTK regime depends on the hyperparameters of random initialization and the network's depth. In particular, NTK theory does not explain the behavior of sufficiently deep networks initialized so that their gradients explode as they propagate through the network's layers: the kernel is random at initialization and changes significantly during training in this case, contrary to NTK theory. On the other hand, in the case of vanishing gradients, DNNs are in the NTK regime but become untrainable rapidly with depth. We also describe a framework to study generalization properties of DNNs, in particular the variance of the network's output function, by means of NTK theory and discuss its limits.

Submitted 1 February, 2022; v1 submitted 8 December, 2020; originally announced December 2020.

Journal ref: Proceedings of Machine Learning Research vol 145:1-28, 2021 2nd Annual Conference on Mathematical and Scientific Machine Learning

arXiv:2007.04759 [pdf, other]

Expressivity of Deep Neural Networks

Authors: Ingo Gühring, Mones Raslan, Gitta Kutyniok

Abstract: In this review paper, we give a comprehensive overview of the large variety of approximation results for neural networks. Approximation rates for classical function spaces as well as benefits of deep neural networks over shallow ones for specifically structured function classes are discussed. While the main body of existing results is for general feedforward architectures, we also depict approximation results for convolutional, residual and recurrent neural networks.

Submitted 9 July, 2020; originally announced July 2020.

Comments: This review paper will appear as a book chapter in the book "Theory of Deep Learning" by Cambridge University Press

MSC Class: 41-02; 41-03; 68T07; 68Q32

arXiv:2007.00758 [pdf, other]

In-Distribution Interpretability for Challenging Modalities

Authors: Cosmas Heiß, Ron Levie, Cinjon Resnick, Gitta Kutyniok, Joan Bruna

Abstract: It is widely recognized that the predictions of deep neural networks are difficult to parse relative to simpler approaches. However, the development of methods to investigate the mode of operation of such models has advanced rapidly in the past few years. Recent work introduced an intuitive framework which utilizes generative models to improve on the meaningfulness of such explanations. In this work, we display the flexibility of this method to interpret diverse and challenging modalities: music and physical simulations of urban environments.

Submitted 7 July, 2020; v1 submitted 1 July, 2020; originally announced July 2020.

arXiv:2007.00479 [pdf, ps, other]

The Restricted Isometry of ReLU Networks: Generalization through Norm Concentration

Authors: Alex Goeßmann, Gitta Kutyniok

Abstract: While regression tasks aim at interpolating a relation on the entire input space, they often have to be solved with a limited amount of training data. Still, if the hypothesis functions can be sketched well with the data, one can hope for identifying a generalizing model. In this work, we introduce the Neural Restricted Isometry Property (NeuRIP), a uniform concentration event in which all shallow $\mathrm{ReLU}$ networks are sketched with the same quality. To derive the sample complexity for achieving NeuRIP, we bound the covering numbers of the networks in the Sub-Gaussian metric and apply chaining techniques. In case of the NeuRIP event, we then provide bounds on the expected risk, which hold for networks in any sublevel set of the empirical risk. We conclude that all networks with sufficiently small empirical risk generalize uniformly.

Submitted 1 July, 2020; originally announced July 2020.

Comments: 27 pages, 5 figures

MSC Class: G.3

ACM Class: F.2; G.3

arXiv:2006.05397 [pdf, other]

Real-time Localization Using Radio Maps

Authors: Çağkan Yapar, Ron Levie, Gitta Kutyniok, Giuseppe Caire

Abstract: This paper deals with the problem of localization in a cellular network in a dense urban scenario. Global Navigation Satellite Systems typically perform poorly in urban environments when there is no line-of-sight between the devices and the satellites, and thus alternative localization methods are often required. We present a simple yet effective method for localization based on pathloss. In our approach, the user to be localized reports the received signal strength from a set of base stations with known locations. For each base station we have a good approximation of the pathloss at each location in the map, provided by RadioUNet, an efficient deep learning-based simulator of pathloss functions in urban environments, akin to ray-tracing. Using the approximations of the pathloss functions of all base stations and the reported signal strengths, we are able to extract a very accurate approximation of the location of the user.

Submitted 9 June, 2020; originally announced June 2020.
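
The idea in the abstract above, combining per-station pathloss maps with reported signal strengths, can be sketched as a grid search. This is a hedged illustration only: the abstract does not state how the location is extracted, so the least-squares matching rule, the function name `localize`, and the tiny hand-written maps below are all assumptions standing in for maps that a simulator such as RadioUNet would supply.

```python
def localize(pathloss_maps, reported):
    """Return the grid cell whose predicted signal strengths best match the report.

    pathloss_maps: one 2-D grid (list of rows) per base station; a hypothetical
                   stand-in for simulator-provided pathloss maps.
    reported:      one received signal strength per base station.
    """
    rows, cols = len(pathloss_maps[0]), len(pathloss_maps[0][0])
    best_cell, best_cost = None, float("inf")
    for r in range(rows):
        for c in range(cols):
            # Assumed matching rule: sum of squared mismatches over all stations.
            cost = sum((m[r][c] - s) ** 2 for m, s in zip(pathloss_maps, reported))
            if cost < best_cost:
                best_cell, best_cost = (r, c), cost
    return best_cell


# Two base stations on a 2 x 3 grid; values are made-up signal strengths in dB.
maps = [
    [[-60.0, -70.0, -80.0], [-65.0, -75.0, -85.0]],
    [[-90.0, -80.0, -70.0], [-85.0, -72.0, -62.0]],
]
estimate = localize(maps, reported=[-74.0, -73.0])
```

With these made-up values the closest match is the middle cell of the second row; a real system would search a much finer grid and would have to handle measurement noise.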

arXiv:2004.12131 [pdf, other]

Numerical Solution of the Parametric Diffusion Equation by Deep Neural Networks

Authors: Moritz Geist, Philipp Petersen, Mones Raslan, Reinhold Schneider, Gitta Kutyniok

Abstract: We perform a comprehensive numerical study of the effect of approximation-theoretical results for neural networks on practical learning problems in the context of numerical analysis. As the underlying model, we study the machine-learning-based solution of parametric partial differential equations. Here, approximation theory predicts that the performance of the model should depend only very mildly on the dimension of the parameter space and is determined by the intrinsic dimension of the solution manifold of the parametric partial differential equation. We use various methods to establish comparability between test-cases by minimizing the effect of the choice of test-cases on the optimization and sampling aspects of the learning problem. We find strong support for the hypothesis that approximation-theoretical effects heavily influence the practical behavior of learning problems in numerical analysis.

Submitted 25 April, 2020; originally announced April 2020.

MSC Class: 35J99; 41A25; 41A30; 68T05; 65N30

arXiv:2003.11566 [pdf, other]

Interval Neural Networks: Uncertainty Scores

Authors: Luis Oala, Cosmas Heiß, Jan Macdonald, Maximilian März, Wojciech Samek, Gitta Kutyniok

Abstract: We propose a fast, non-Bayesian method for producing uncertainty scores in the output of pre-trained deep neural networks (DNNs) using a data-driven interval propagating network. This interval neural network (INN) has interval valued parameters and propagates its input using interval arithmetic. The INN produces sensible lower and upper bounds encompassing the ground truth. We provide theoretical justification for the validity of these bounds. Furthermore, its asymmetric uncertainty scores offer additional, directional information beyond what Gaussian-based, symmetric variance estimation can provide. We find that noise in the data is adequately captured by the intervals produced with our method. In numerical experiments on an image reconstruction task, we demonstrate the practical utility of INNs as a proxy for the prediction error in comparison to two state-of-the-art uncertainty quantification methods. In summary, INNs produce fast, theoretically justified uncertainty scores for DNNs that are easy to interpret, come with added information and pose as improved error proxies - features that may prove useful in advancing the usability of DNNs especially in sensitive applications such as health care.

Submitted 25 March, 2020; originally announced March 2020.

Comments: LO and CH contributed equally

ACM Class: I.5.1; I.4.5; J.3; I.2.m
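
The interval propagation described in the Interval Neural Networks abstract above can be illustrated with plain interval arithmetic. The following is a minimal sketch under stated assumptions, not the authors' implementation: it handles a single affine layer with interval-valued weights and biases followed by a ReLU, and the function names and example numbers are hypothetical.

```python
def interval_affine(lower, upper, w_lower, w_upper, b_lower, b_upper):
    """Propagate the input box [lower, upper] through y = W x + b with interval W and b."""
    out_lo, out_hi = [], []
    for row_lo, row_hi, bl, bh in zip(w_lower, w_upper, b_lower, b_upper):
        lo, hi = bl, bh
        for xl, xu, wl, wu in zip(lower, upper, row_lo, row_hi):
            # The product of two intervals is bounded by its four corner products.
            corners = (wl * xl, wl * xu, wu * xl, wu * xu)
            lo += min(corners)
            hi += max(corners)
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi


def interval_relu(lower, upper):
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [max(0.0, v) for v in lower], [max(0.0, v) for v in upper]


# One output neuron, two inputs: x1 in [1, 2], x2 in [-1, 1].
pre_lo, pre_hi = interval_affine(
    lower=[1.0, -1.0], upper=[2.0, 1.0],
    w_lower=[[1.0, -1.0]], w_upper=[[1.5, -0.5]],
    b_lower=[0.0], b_upper=[0.5],
)
out_lo, out_hi = interval_relu(pre_lo, pre_hi)
```

The resulting pair of bounds encloses every output the layer can produce for inputs and parameters inside the given intervals; the width of that enclosure is the kind of quantity an interval-based uncertainty score could be built on.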

arXiv:2002.12388 [pdf, other]

Tensor network approaches for learning non-linear dynamical laws

Authors: A. Goeßmann, M. Götte, I. Roth, R. Sweke, G. Kutyniok, J. Eisert

Abstract: Given observations of a physical system, identifying the underlying non-linear governing equation is a fundamental task, necessary both for gaining understanding and generating deterministic future predictions. Of most practical relevance are automated approaches to theory building that scale efficiently for complex systems with many degrees of freedom. To date, available scalable methods aim at a data-driven interpolation, without exploiting or offering insight into fundamental underlying physical principles, such as locality of interactions. In this work, we show that various physical constraints can be captured via tensor network based parameterizations for the governing equation, which naturally ensures scalability. In addition to providing analytic results motivating the use of such models for realistic physical systems, we demonstrate that efficient rank-adaptive optimization algorithms can be used to learn optimal tensor network models without requiring a priori knowledge of the exact tensor ranks. As such, we provide a physics-informed approach to recovering structured dynamical laws from data, which adaptively balances the need for expressivity and scalability.

Submitted 27 February, 2020; originally announced February 2020.

Comments: 17 pages, 8 figures

arXiv:1911.09002 [pdf, other]

RadioUNet: Fast Radio Map Estimation with Convolutional Neural Networks

Authors: Ron Levie, Çağkan Yapar, Gitta Kutyniok, Giuseppe Caire

Abstract: In this paper we propose a highly efficient and very accurate deep learning method for estimating the propagation pathloss from a point $x$ (transmitter location) to any point $y$ on a planar domain. For applications such as user-cell site association and device-to-device link scheduling, an accurate knowledge of the pathloss function for all pairs of transmitter-receiver locations is very important. Commonly used statistical models approximate the pathloss as a decaying function of the distance between transmitter and receiver. However, in realistic propagation environments characterized by the presence of buildings, street canyons, and objects at different heights, such radial-symmetric functions yield very misleading results. In this paper we show that properly designed and trained deep neural networks are able to learn how to estimate the pathloss function, given an urban environment, in a very accurate and computationally efficient manner. Our proposed method, termed RadioUNet, learns from a physical simulation dataset, and generates pathloss estimations that are very close to the simulations, but are much faster to compute for real-time applications. Moreover, we propose methods for transferring what was learned from simulations to real-life. Numerical results show that our method significantly outperforms previously proposed methods.

Submitted 22 December, 2020; v1 submitted 17 November, 2019; originally announced November 2019.

arXiv:1907.12972 [pdf, other]

Transferability of Spectral Graph Convolutional Neural Networks

Authors: Ron Levie, Wei Huang, Lorenzo Bucci, Michael M. Bronstein, Gitta Kutyniok

Abstract: This paper focuses on spectral graph convolutional neural networks (ConvNets), where filters are defined as elementwise multiplication in the frequency domain of a graph. In machine learning settings where the dataset consists of signals defined on many different graphs, the trained ConvNet should generalize to signals on graphs unseen in the training set. It is thus important to transfer ConvNets between graphs. Transferability, which is a certain type of generalization capability, can be loosely defined as follows: if two graphs describe the same phenomenon, then a single filter or ConvNet should have similar repercussions on both graphs. This paper aims at debunking the common misconception that spectral filters are not transferable. We show that if two graphs discretize the same "continuous" space, then a spectral filter or ConvNet has approximately the same repercussion on both graphs. Our analysis is more permissive than the standard analysis. Transferability is typically described as the robustness of the filter to small graph perturbations and re-indexing of the vertices. Our analysis accounts also for large graph perturbations. We prove transferability between graphs that can have completely different dimensions and topologies, only requiring that both graphs discretize the same underlying space in some generic sense.

Submitted 12 June, 2021; v1 submitted 30 July, 2019; originally announced July 2019.

arXiv:1905.11092 [pdf, other]

A Rate-Distortion Framework for Explaining Neural Network Decisions

Authors: Jan Macdonald, Stephan Wäldchen, Sascha Hauch, Gitta Kutyniok

Abstract: We formalise the widespread idea of interpreting neural network decisions as an explicit optimisation problem in a rate-distortion framework. A set of input features is deemed relevant for a classification decision if the expected classifier score remains nearly constant when randomising the remaining features. We discuss the computational complexity of finding small sets of relevant features and show that the problem is complete for $\mathsf{NP}^\mathsf{PP}$, an important class of computational problems frequently arising in AI tasks. Furthermore, we show that it even remains $\mathsf{NP}$-hard to only approximate the optimal solution to within any non-trivial approximation factor. Finally, we consider a continuous problem relaxation and develop a heuristic solution strategy based on assumed density filtering for deep ReLU neural networks. We present numerical experiments for two image classification data sets where we outperform established methods in particular for sparse explanations of neural network decisions.

Submitted 27 May, 2019; originally announced May 2019.

arXiv:1905.01208 [pdf, other]

Approximation spaces of deep neural networks

Authors: Rémi Gribonval, Gitta Kutyniok, Morten Nielsen, Felix Voigtlaender

Abstract: We study the expressivity of deep neural networks. Measuring a network's complexity by its number of connections or by its number of neurons, we consider the class of functions for which the error of best approximation with networks of a given complexity decays at a certain rate when increasing the complexity budget. Using results from classical approximation theory, we show that this class can be endowed with a (quasi)-norm that makes it a linear function space, called approximation space. We establish that allowing the networks to have certain types of "skip connections" does not change the resulting approximation spaces. We also discuss the role of the network's nonlinearity (also known as activation function) on the resulting spaces, as well as the role of depth. For the popular ReLU nonlinearity and its powers, we relate the newly constructed spaces to classical Besov spaces. The established embeddings highlight that some functions of very low Besov smoothness can nevertheless be well approximated by neural networks, if these networks are sufficiently deep.

Submitted 17 July, 2020; v1 submitted 3 May, 2019; originally announced May 2019.

arXiv:1904.00377 [pdf, ps, other]

A Theoretical Analysis of Deep Neural Networks and Parametric PDEs

Authors: Gitta Kutyniok, Philipp Petersen, Mones Raslan, Reinhold Schneider

Abstract: We derive upper bounds on the complexity of ReLU neural networks approximating the solution maps of parametric partial differential equations. In particular, without any knowledge of its concrete shape, we use the inherent low-dimensionality of the solution manifold to obtain approximation rates which are significantly superior to those provided by classical neural network approximation results. Concretely, we use the existence of a small reduced basis to construct, for a large variety of parametric partial differential equations, neural networks that yield approximations of the parametric solution maps in such a way that the sizes of these networks essentially only depend on the size of the reduced basis.

Submitted 14 May, 2020; v1 submitted 31 March, 2019; originally announced April 2019.

MSC Class: 35A35; 35J99; 41A25; 41A46; 68T05; 65N30

arXiv:1901.10524 [pdf, other]

On the Transferability of Spectral Graph Filters

Authors: Ron Levie, Elvin Isufi, Gitta Kutyniok

Abstract: This paper focuses on spectral filters on graphs, namely filters defined as elementwise multiplication in the frequency domain of a graph. In many graph signal processing settings, it is important to transfer a filter from one graph to another. One example is in graph convolutional neural networks (ConvNets), where the dataset consists of signals defined on many different graphs, and the learned filters should generalize to signals on new graphs, not present in the training set. A necessary condition for transferability (the ability to transfer filters) is stability. Namely, given a graph filter, if we add a small perturbation to the graph, then the filter on the perturbed graph is a small perturbation of the original filter. It is a common misconception that spectral filters are not stable, and this paper aims at debunking this mistake. We introduce a space of filters, called the Cayley smoothness space, that contains the filters of state-of-the-art spectral filtering methods, and whose filters can approximate any generic spectral filter. For filters in this space, the perturbation in the filter is bounded by a constant times the perturbation in the graph, and filters in the Cayley smoothness space are thus termed linearly stable. By combining stability with the known property of equivariance, we prove that graph spectral filters are transferable.

Submitted 29 January, 2019; originally announced January 2019.

arXiv:1901.05744 [pdf, ps, other]

The Oracle of DLphi

Authors: Dominik Alfke, Weston Baines, Jan Blechschmidt, Mauricio J. del Razo Sarmina, Amnon Drory, Dennis Elbrächter, Nando Farchmin, Matteo Gambara, Silke Glas, Philipp Grohs, Peter Hinz, Danijel Kivaranovic, Christian Kümmerle, Gitta Kutyniok, Sebastian Lunz, Jan Macdonald, Ryan Malthaner, Gregory Naisat, Ariel Neufeld, Philipp Christian Petersen, Rafael Reisenhofer, Jun-Da Sheng, Laura Thesing, Philipp Trunschke, Johannes von Lindheim, et al. (2 additional authors not shown)

Abstract: We present a novel technique based on deep learning and set theory which yields exceptional classification and prediction results. Having access to a sufficiently large amount of labelled training data, our methodology is capable of predicting the labels of the test data almost always even if the training data is entirely unrelated to the test data. In other words, we prove in a specific setting that as long as one has access to enough data points, the quality of the data is irrelevant.

Submitted 27 January, 2019; v1 submitted 17 January, 2019; originally announced January 2019.

MSC Class: 68T05; 82C32

arXiv:1901.01388 [pdf, other]

Extraction of digital wavefront sets using applied harmonic analysis and deep neural networks

Authors: Héctor Andrade-Loarca, Gitta Kutyniok, Ozan Öktem, Philipp Petersen

Abstract: Microlocal analysis provides deep insight into singularity structures and is often crucial for solving inverse problems, predominately, in imaging sciences. Of particular importance is the analysis of wavefront sets and the correct extraction of those. In this paper, we introduce the first algorithmic approach to extract the wavefront set of images, which combines data-based and model-based methods. Based on a celebrated property of the shearlet transform to unravel information on the wavefront set, we extract the wavefront set of an image by first applying a discrete shearlet transform and then feeding local patches of this transform to a deep convolutional neural network trained on labeled data. The resulting algorithm outperforms all competing algorithms in edge-orientation and ramp-orientation detection.

Submitted 10 July, 2019; v1 submitted 5 January, 2019; originally announced January 2019.

MSC Class: 35A18; 65T60; 68T10

arXiv:1808.06329 [pdf, ps, other]

The Mismatch Principle: The Generalized Lasso Under Large Model Uncertainties

Authors: Martin Genzel, Gitta Kutyniok

Abstract: We study the estimation capacity of the generalized Lasso, i.e., least squares minimization combined with a (convex) structural constraint. While Lasso-type estimators were originally designed for noisy linear regression problems, it has recently turned out that they are in fact robust against various types of model uncertainties and misspecifications, most notably, non-linearly distorted observation models. This work provides more theoretical evidence for this somewhat astonishing phenomenon. At the heart of our analysis stands the mismatch principle, which is a simple recipe to establish theoretical error bounds for the generalized Lasso. The associated estimation guarantees are of independent interest and are formulated in a fairly general setup, permitting arbitrary sub-Gaussian data, possibly with strongly correlated feature designs; in particular, we do not assume a specific observation model which connects the input and output variables. Although the mismatch principle is conceived based on ideas from statistical learning theory, its actual application areas are (high-dimensional) estimation tasks for semi-parametric models. In this context, the benefits of the mismatch principle are demonstrated for a variety of popular problem classes, such as single-index models, generalized linear models, and variable selection. Apart from that, our findings are also relevant to recent advances in quantized and distributed compressed sensing.

Submitted 11 September, 2019; v1 submitted 20 August, 2018; originally announced August 2018.

MSC Class: 68T37; 60D05; 90C25; 62F30; 62F35

arXiv:1608.08852 [pdf, other]

A Mathematical Framework for Feature Selection from Real-World Data with Non-Linear Observations

Authors: Martin Genzel, Gitta Kutyniok

Abstract: In this paper, we study the challenge of feature selection based on a relatively small collection of sample pairs $\{(x_i, y_i)\}_{1 \leq i \leq m}$. The observations $y_i \in \mathbb{R}$ are thereby supposed to follow a noisy single-index model, depending on a certain set of signal variables. A major difficulty is that these variables usually cannot be observed directly, but rather arise as hidden factors in the actual data vectors $x_i \in \mathbb{R}^d$ (feature variables). We will prove that a successful variable selection is still possible in this setup, even when the applied estimator does not have any knowledge of the underlying model parameters and only takes the 'raw' samples $\{(x_i, y_i)\}_{1 \leq i \leq m}$ as input. The model assumptions of our results will be fairly general, allowing for non-linear observations, arbitrary convex signal structures as well as strictly convex loss functions. This is particularly appealing for practical purposes, since in many applications, already standard methods, e.g., the Lasso or logistic regression, yield surprisingly good outcomes. Apart from a general discussion of the practical scope of our theoretical findings, we will also derive a rigorous guarantee for a specific real-world problem, namely sparse feature extraction from (proteomics-based) mass spectrometry data.

Submitted 31 August, 2016; originally announced August 2016.

arXiv:1506.03620 [pdf, other]

doi 10.1186/s12859-017-1565-4

Sparse Proteomics Analysis - A compressed sensing-based approach for feature selection and classification of high-dimensional proteomics mass spectrometry data

Authors: Tim Conrad, Martin Genzel, Nada Cvetkovic, Niklas Wulkow, Alexander Leichtle, Jan Vybiral, Gitta Kutyniok, Christof Schütte

Abstract: Background: High-throughput proteomics techniques, such as mass spectrometry (MS)-based approaches, produce very high-dimensional data-sets. In a clinical setting one is often interested in how mass spectra differ between patients of different classes, for example spectra from healthy patients vs. spectra from patients having a particular disease. Machine learning algorithms are needed to (a) identify these discriminating features and (b) classify unknown spectra based on this feature set. Since the acquired data is usually noisy, the algorithms should be robust against noise and outliers, while the identified feature set should be as small as possible. Results: We present a new algorithm, Sparse Proteomics Analysis (SPA), based on the theory of compressed sensing that allows us to identify a minimal discriminating set of features from mass spectrometry data-sets. We show (1) how our method performs on artificial and real-world data-sets, (2) that its performance is competitive with standard (and widely used) algorithms for analyzing proteomics data, and (3) that it is robust against random and systematic noise. We further demonstrate the applicability of our algorithm to two previously published clinical data-sets.

Submitted 26 November, 2016; v1 submitted 11 June, 2015; originally announced June 2015.

Journal ref: BMC Bioinform. 18 (2017), 160

Showing 1–25 of 25 results for author: Kutyniok, G