Search | arXiv e-print repository

Revealing the temporal dynamics of antibiotic anomalies in the infant gut microbiome with neural jump ODEs

Authors: Anja Adamov, Markus Chardonnet, Florian Krach, Jakob Heiss, Josef Teichmann, Nicholas A. Bokulich

Abstract: Detecting anomalies in irregularly sampled multi-variate time-series is challenging, especially in data-scarce settings. Here we introduce an anomaly detection framework for irregularly sampled time-series that leverages neural jump ordinary differential equations (NJODEs). The method infers conditional mean and variance trajectories in a fully path dependent way and computes anomaly scores. On sy… ▽ More Detecting anomalies in irregularly sampled multi-variate time-series is challenging, especially in data-scarce settings. Here we introduce an anomaly detection framework for irregularly sampled time-series that leverages neural jump ordinary differential equations (NJODEs). The method infers conditional mean and variance trajectories in a fully path dependent way and computes anomaly scores. On synthetic data containing jump, drift, diffusion, and noise anomalies, the framework accurately identifies diverse deviations. Applied to infant gut microbiome trajectories, it delineates the magnitude and persistence of antibiotic-induced disruptions: revealing prolonged anomalies after second antibiotic courses, extended duration treatments, and exposures during the second year of life. We further demonstrate the predictive capabilities of the inferred anomaly scores in accurately predicting antibiotic events and outperforming diversity-based baselines. Our approach accommodates unevenly spaced longitudinal observations, adjusts for static and dynamic covariates, and provides a foundation for inferring microbial anomalies induced by perturbations, offering a translational opportunity to optimize intervention regimens by minimizing microbial disruptions. △ Less

Submitted 30 September, 2025; originally announced October 2025.

arXiv:2503.16696 [pdf, ps, other]

Universal approximation property of neural stochastic differential equations

Authors: Anna P. Kwossek, David J. Prömel, Josef Teichmann

Abstract: We identify various classes of neural networks that are able to approximate continuous functions locally uniformly subject to fixed global linear growth constraints. For such neural networks the associated neural stochastic differential equations can approximate general stochastic differential equations, both of Itô diffusion type, arbitrarily well. Moreover, quantitative error estimates are deriv… ▽ More We identify various classes of neural networks that are able to approximate continuous functions locally uniformly subject to fixed global linear growth constraints. For such neural networks the associated neural stochastic differential equations can approximate general stochastic differential equations, both of Itô diffusion type, arbitrarily well. Moreover, quantitative error estimates are derived for stochastic differential equations with sufficiently regular coefficients. △ Less

Submitted 20 March, 2025; originally announced March 2025.

Comments: 20 pages

MSC Class: 41A29; 60H10; 68T07; 91G80

arXiv:2502.03163 [pdf, other]

Signature Reconstruction from Randomized Signatures

Authors: Mie Glückstad, Nicola Muca Cirone, Josef Teichmann

Abstract: Controlled ordinary differential equations driven by continuous bounded variation curves can be considered a continuous time analogue of recurrent neural networks for the construction of expressive features of the input curves. We ask up to which extent well known signature features of such curves can be reconstructed from controlled ordinary differential equations with (untrained) random vector f… ▽ More Controlled ordinary differential equations driven by continuous bounded variation curves can be considered a continuous time analogue of recurrent neural networks for the construction of expressive features of the input curves. We ask up to which extent well known signature features of such curves can be reconstructed from controlled ordinary differential equations with (untrained) random vector fields. The answer turns out to be algebraically involved, but essentially the number of signature features, which can be reconstructed from the non-linear flow of the controlled ordinary differential equation, is exponential in its hidden dimension, when the vector fields are chosen to be neural with depth two. Moreover, we characterize a general linear independence condition on arbitrary vector fields, under which the signature features up to some fixed order can always be reconstructed. Algebraically speaking this complements in a quantitative manner several well known results from the theory of Lie algebras of vector fields and puts them in a context of machine learning. △ Less

Submitted 5 February, 2025; originally announced February 2025.

Comments: 37 pages, 7 figures

MSC Class: 60L10 (Primary) 60L70; 60L90; 68T07 (Secondary)

arXiv:2407.18808 [pdf, other]

Learning Chaotic Systems and Long-Term Predictions with Neural Jump ODEs

Authors: Florian Krach, Josef Teichmann

Abstract: The Path-dependent Neural Jump ODE (PD-NJ-ODE) is a model for online prediction of generic (possibly non-Markovian) stochastic processes with irregular (in time) and potentially incomplete (with respect to coordinates) observations. It is a model for which convergence to the $L^2$-optimal predictor, which is given by the conditional expectation, is established theoretically. Thereby, the training… ▽ More The Path-dependent Neural Jump ODE (PD-NJ-ODE) is a model for online prediction of generic (possibly non-Markovian) stochastic processes with irregular (in time) and potentially incomplete (with respect to coordinates) observations. It is a model for which convergence to the $L^2$-optimal predictor, which is given by the conditional expectation, is established theoretically. Thereby, the training of the model is solely based on a dataset of realizations of the underlying stochastic process, without the need of knowledge of the law of the process. In the case where the underlying process is deterministic, the conditional expectation coincides with the process itself. Therefore, this framework can equivalently be used to learn the dynamics of ODE or PDE systems solely from realizations of the dynamical system with different initial conditions. We showcase the potential of our method by applying it to the chaotic system of a double pendulum. When training the standard PD-NJ-ODE method, we see that the prediction starts to diverge from the true path after about half of the evaluation time. In this work we enhance the model with two novel ideas, which independently of each other improve the performance of our modelling setup. The resulting dynamics match the true dynamics of the chaotic system very closely. The same enhancements can be used to provably enable the PD-NJ-ODE to learn long-term predictions for general stochastic datasets, where the standard model fails. This is verified in several experiments. △ Less

Submitted 26 July, 2024; originally announced July 2024.

arXiv:2307.13147 [pdf, other]

Extending Path-Dependent NJ-ODEs to Noisy Observations and a Dependent Observation Framework

Authors: William Andersson, Jakob Heiss, Florian Krach, Josef Teichmann

Abstract: The Path-Dependent Neural Jump Ordinary Differential Equation (PD-NJ-ODE) is a model for predicting continuous-time stochastic processes with irregular and incomplete observations. In particular, the method learns optimal forecasts given irregularly sampled time series of incomplete past observations. So far the process itself and the coordinate-wise observation times were assumed to be independen… ▽ More The Path-Dependent Neural Jump Ordinary Differential Equation (PD-NJ-ODE) is a model for predicting continuous-time stochastic processes with irregular and incomplete observations. In particular, the method learns optimal forecasts given irregularly sampled time series of incomplete past observations. So far the process itself and the coordinate-wise observation times were assumed to be independent and observations were assumed to be noiseless. In this work we discuss two extensions to lift these restrictions and provide theoretical guarantees as well as empirical examples for them. In particular, we can lift the assumption of independence by extending the theory to much more realistic settings of conditional independence without any need to change the algorithm. Moreover, we introduce a new loss function, which allows us to deal with noisy observations and explain why the previously used loss function did not lead to a consistent estimator. △ Less

Submitted 5 February, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

Journal ref: Transactions on Machine Learning Research (TMLR) 2024

arXiv:2306.03303 [pdf, other]

Global universal approximation of functional input maps on weighted spaces

Authors: Christa Cuchiero, Philipp Schmocker, Josef Teichmann

Abstract: We introduce so-called functional input neural networks defined on a possibly infinite dimensional weighted space with values also in a possibly infinite dimensional output space. To this end, we use an additive family to map the input weighted space to the hidden layer, on which a non-linear scalar activation function is applied to each neuron, and finally return the output via some linear readou… ▽ More We introduce so-called functional input neural networks defined on a possibly infinite dimensional weighted space with values also in a possibly infinite dimensional output space. To this end, we use an additive family to map the input weighted space to the hidden layer, on which a non-linear scalar activation function is applied to each neuron, and finally return the output via some linear readouts. Relying on Stone-Weierstrass theorems on weighted spaces, we can prove a global universal approximation result on weighted spaces for continuous functions going beyond the usual approximation on compact sets. This then applies in particular to approximation of (non-anticipative) path space functionals via functional input neural networks. As a further application of the weighted Stone-Weierstrass theorem we prove a global universal approximation result for linear functions of the signature. We also introduce the viewpoint of Gaussian process regression in this setting and emphasize that the reproducing kernel Hilbert space of the signature kernels are Cameron-Martin spaces of certain Gaussian processes. This paves a way towards uncertainty quantification for signature kernel regression. △ Less

Submitted 2 February, 2025; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: 67 pages, 4 figures

MSC Class: 26A16; 26E20; 41A65; 41A81; 46E40; 60L10; 68T07

arXiv:2303.11454 [pdf, other]

How (Implicit) Regularization of ReLU Neural Networks Characterizes the Learned Function -- Part II: the Multi-D Case of Two Layers with Random First Layer

Authors: Jakob Heiss, Josef Teichmann, Hanna Wutte

Abstract: Randomized neural networks (randomized NNs), where only the terminal layer's weights are optimized constitute a powerful model class to reduce computational time in training the neural network model. At the same time, these models generalize surprisingly well in various regression and classification tasks. In this paper, we give an exact macroscopic characterization (i.e., a characterization in fu… ▽ More Randomized neural networks (randomized NNs), where only the terminal layer's weights are optimized constitute a powerful model class to reduce computational time in training the neural network model. At the same time, these models generalize surprisingly well in various regression and classification tasks. In this paper, we give an exact macroscopic characterization (i.e., a characterization in function space) of the generalization behavior of randomized, shallow NNs with ReLU activation (RSNs). We show that RSNs correspond to a generalized additive model (GAM)-typed regression in which infinitely many directions are considered: the infinite generalized additive model (IGAM). The IGAM is formalized as solution to an optimization problem in function space for a specific regularization functional and a fairly general loss. This work is an extension to multivariate NNs of prior work, where we showed how wide RSNs with ReLU activation behave like spline regression under certain conditions and if the input is one-dimensional. △ Less

Submitted 20 March, 2023; originally announced March 2023.

Comments: 16 pages + appendix

arXiv:2206.14284 [pdf, other]

Optimal Estimation of Generic Dynamics by Path-Dependent Neural Jump ODEs

Authors: Florian Krach, Marc Nübel, Josef Teichmann

Abstract: This paper studies the problem of forecasting general stochastic processes using a path-dependent extension of the Neural Jump ODE (NJ-ODE) framework \citep{herrera2021neural}. While NJ-ODE was the first framework to establish convergence guarantees for the prediction of irregularly observed time series, these results were limited to data stemming from Itô-diffusions with complete observations, in… ▽ More This paper studies the problem of forecasting general stochastic processes using a path-dependent extension of the Neural Jump ODE (NJ-ODE) framework \citep{herrera2021neural}. While NJ-ODE was the first framework to establish convergence guarantees for the prediction of irregularly observed time series, these results were limited to data stemming from Itô-diffusions with complete observations, in particular Markov processes, where all coordinates are observed simultaneously. In this work, we generalise these results to generic, possibly non-Markovian or discontinuous, stochastic processes with incomplete observations, by utilising the reconstruction properties of the signature transform. These theoretical results are supported by empirical studies, where it is shown that the path-dependent NJ-ODE outperforms the original NJ-ODE framework in the case of non-Markovian data. Moreover, we show that PD-NJ-ODE can be applied successfully to classical stochastic filtering problems and to limit order book (LOB) data. △ Less

Submitted 4 July, 2024; v1 submitted 28 June, 2022; originally announced June 2022.

arXiv:2201.02441 [pdf, other]

Applications of Signature Methods to Market Anomaly Detection

Authors: Erdinc Akyildirim, Matteo Gambara, Josef Teichmann, Syang Zhou

Abstract: Anomaly detection is the process of identifying abnormal instances or events in data sets which deviate from the norm significantly. In this study, we propose a signatures based machine learning algorithm to detect rare or unexpected items in a given data set of time series type. We present applications of signature or randomized signature as feature extractors for anomaly detection algorithms; ad… ▽ More Anomaly detection is the process of identifying abnormal instances or events in data sets which deviate from the norm significantly. In this study, we propose a signatures based machine learning algorithm to detect rare or unexpected items in a given data set of time series type. We present applications of signature or randomized signature as feature extractors for anomaly detection algorithms; additionally we provide an easy, representation theoretic justification for the construction of randomized signatures. Our first application is based on synthetic data and aims at distinguishing between real and fake trajectories of stock prices, which are indistinguishable by visual inspection. We also show a real life application by using transaction data from the cryptocurrency market. In this case, we are able to identify pump and dump attempts organized on social networks with F1 scores up to 88% by means of our unsupervised learning algorithm, thus achieving results that are close to the state-of-the-art in the field based on supervised learning. △ Less

Submitted 8 February, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

arXiv:2112.15577 [pdf, other]

doi 10.3929/ethz-b-000550890

How Infinitely Wide Neural Networks Can Benefit from Multi-task Learning -- an Exact Macroscopic Characterization

Authors: Jakob Heiss, Josef Teichmann, Hanna Wutte

Abstract: In practice, multi-task learning (through learning features shared among tasks) is an essential property of deep neural networks (NNs). While infinite-width limits of NNs can provide good intuition for their generalization behavior, the well-known infinite-width limits of NNs in the literature (e.g., neural tangent kernels) assume specific settings in which wide ReLU-NNs behave like shallow Gaussi… ▽ More In practice, multi-task learning (through learning features shared among tasks) is an essential property of deep neural networks (NNs). While infinite-width limits of NNs can provide good intuition for their generalization behavior, the well-known infinite-width limits of NNs in the literature (e.g., neural tangent kernels) assume specific settings in which wide ReLU-NNs behave like shallow Gaussian Processes with a fixed kernel. Consequently, in such settings, these NNs lose their ability to benefit from multi-task learning in the infinite-width limit. In contrast, we prove that optimizing wide ReLU neural networks with at least one hidden layer using L2-regularization on the parameters promotes multi-task learning due to representation-learning - also in the limiting regime where the network width tends to infinity. We present an exact quantitative characterization of this infinite width limit in an appropriate function space that neatly describes multi-task learning. △ Less

Submitted 20 October, 2022; v1 submitted 31 December, 2021; originally announced December 2021.

Comments: 13 pages + appendix

MSC Class: 68T07; 68Q32 ACM Class: I.2

arXiv:2104.13669 [pdf, other]

Optimal Stopping via Randomized Neural Networks

Authors: Calypso Herrera, Florian Krach, Pierre Ruyssen, Josef Teichmann

Abstract: This paper presents the benefits of using randomized neural networks instead of standard basis functions or deep neural networks to approximate the solutions of optimal stopping problems. The key idea is to use neural networks, where the parameters of the hidden layers are generated randomly and only the last layer is trained, in order to approximate the continuation value. Our approaches are appl… ▽ More This paper presents the benefits of using randomized neural networks instead of standard basis functions or deep neural networks to approximate the solutions of optimal stopping problems. The key idea is to use neural networks, where the parameters of the hidden layers are generated randomly and only the last layer is trained, in order to approximate the continuation value. Our approaches are applicable to high dimensional problems where the existing approaches become increasingly impractical. In addition, since our approaches can be optimized using simple linear regression, they are easy to implement and theoretical guarantees can be provided. We test our approaches for American option pricing on Black--Scholes, Heston and rough Heston models and for optimally stopping a fractional Brownian motion. In all cases, our algorithms outperform the state-of-the-art and other relevant machine learning approaches in terms of computation time while achieving comparable results. Moreover, we show that they can also be used to efficiently compute Greeks of American options. △ Less

Submitted 1 December, 2023; v1 submitted 28 April, 2021; originally announced April 2021.

MSC Class: 60G40 (Primary); 68T07 (Secondary)

arXiv:2102.13640 [pdf, other]

NOMU: Neural Optimization-based Model Uncertainty

Authors: Jakob Heiss, Jakob Weissteiner, Hanna Wutte, Sven Seuken, Josef Teichmann

Abstract: We study methods for estimating model uncertainty for neural networks (NNs) in regression. To isolate the effect of model uncertainty, we focus on a noiseless setting with scarce training data. We introduce five important desiderata regarding model uncertainty that any method should satisfy. However, we find that established benchmarks often fail to reliably capture some of these desiderata, even… ▽ More We study methods for estimating model uncertainty for neural networks (NNs) in regression. To isolate the effect of model uncertainty, we focus on a noiseless setting with scarce training data. We introduce five important desiderata regarding model uncertainty that any method should satisfy. However, we find that established benchmarks often fail to reliably capture some of these desiderata, even those that are required by Bayesian theory. To address this, we introduce a new approach for capturing model uncertainty for NNs, which we call Neural Optimization-based Model Uncertainty (NOMU). The main idea of NOMU is to design a network architecture consisting of two connected sub-NNs, one for model prediction and one for model uncertainty, and to train it using a carefully-designed loss function. Importantly, our design enforces that NOMU satisfies our five desiderata. Due to its modular architecture, NOMU can provide model uncertainty for any given (previously trained) NN if given access to its training data. We evaluate NOMU in various regressions tasks and noiseless Bayesian optimization (BO) with costly evaluations. In regression, NOMU performs at least as well as state-of-the-art methods. In BO, NOMU even outperforms all considered benchmarks. △ Less

Submitted 11 March, 2023; v1 submitted 26 February, 2021; originally announced February 2021.

Comments: 9 pages + appendix

Journal ref: Proceedings of the 39th International Conference on Machine Learning, PMLR 162:8708-8758, 2022

arXiv:2010.14615 [pdf, ps, other]

Discrete-time signatures and randomness in reservoir computing

Authors: Christa Cuchiero, Lukas Gonon, Lyudmila Grigoryeva, Juan-Pablo Ortega, Josef Teichmann

Abstract: A new explanation of geometric nature of the reservoir computing phenomenon is presented. Reservoir computing is understood in the literature as the possibility of approximating input/output systems with randomly chosen recurrent neural systems and a trained linear readout layer. Light is shed on this phenomenon by constructing what is called strongly universal reservoir systems as random projecti… ▽ More A new explanation of geometric nature of the reservoir computing phenomenon is presented. Reservoir computing is understood in the literature as the possibility of approximating input/output systems with randomly chosen recurrent neural systems and a trained linear readout layer. Light is shed on this phenomenon by constructing what is called strongly universal reservoir systems as random projections of a family of state-space systems that generate Volterra series expansions. This procedure yields a state-affine reservoir system with randomly generated coefficients in a dimension that is logarithmically reduced with respect to the original system. This reservoir system is able to approximate any element in the fading memory filters class just by training a different linear readout for each different filter. Explicit expressions for the probability distributions needed in the generation of the projected reservoir system are stated and bounds for the committed approximation error are provided. △ Less

Submitted 17 September, 2020; originally announced October 2020.

Comments: 14 pages

arXiv:2006.09455 [pdf, other]

Consistent Recalibration Models and Deep Calibration

Authors: Matteo Gambara, Josef Teichmann

Abstract: Consistent Recalibration models (CRC) have been introduced to capture in necessary generality the dynamic features of term structures of derivatives' prices. Several approaches have been suggested to tackle this problem, but all of them, including CRC models, suffered from numerical intractabilities mainly due to the presence of complicated drift terms or consistency conditions. We overcome this p… ▽ More Consistent Recalibration models (CRC) have been introduced to capture in necessary generality the dynamic features of term structures of derivatives' prices. Several approaches have been suggested to tackle this problem, but all of them, including CRC models, suffered from numerical intractabilities mainly due to the presence of complicated drift terms or consistency conditions. We overcome this problem by machine learning techniques, which allow to store the crucial drift term's information in neural network type functions. This yields first time dynamic term structure models which can be efficiently simulated. △ Less

Submitted 1 July, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

arXiv:2006.04727 [pdf, other]

Neural Jump Ordinary Differential Equations: Consistent Continuous-Time Prediction and Filtering

Authors: Calypso Herrera, Florian Krach, Josef Teichmann

Abstract: Combinations of neural ODEs with recurrent neural networks (RNN), like GRU-ODE-Bayes or ODE-RNN are well suited to model irregularly observed time series. While those models outperform existing discrete-time approaches, no theoretical guarantees for their predictive capabilities are available. Assuming that the irregularly-sampled time series data originates from a continuous stochastic process, t… ▽ More Combinations of neural ODEs with recurrent neural networks (RNN), like GRU-ODE-Bayes or ODE-RNN are well suited to model irregularly observed time series. While those models outperform existing discrete-time approaches, no theoretical guarantees for their predictive capabilities are available. Assuming that the irregularly-sampled time series data originates from a continuous stochastic process, the $L^2$-optimal online prediction is the conditional expectation given the currently available information. We introduce the Neural Jump ODE (NJ-ODE) that provides a data-driven approach to learn, continuously in time, the conditional expectation of a stochastic process. Our approach models the conditional expectation between two observations with a neural ODE and jumps whenever a new observation is made. We define a novel training framework, which allows us to prove theoretical guarantees for the first time. In particular, we show that the output of our model converges to the $L^2$-optimal prediction. This can be interpreted as solution to a special filtering problem. We provide experiments showing that the theoretical results also hold empirically. Moreover, we experimentally show that our model outperforms the baselines in more complex learning tasks and give comparisons on real-world datasets. △ Less

Submitted 16 April, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

Journal ref: International Conference on Learning Representations (2021)

arXiv:2005.02505 [pdf, other]

doi 10.3390/risks8040101

A generative adversarial network approach to calibration of local stochastic volatility models

Authors: Christa Cuchiero, Wahid Khosrawi, Josef Teichmann

Abstract: We propose a fully data-driven approach to calibrate local stochastic volatility (LSV) models, circumventing in particular the ad hoc interpolation of the volatility surface. To achieve this, we parametrize the leverage function by a family of feed-forward neural networks and learn their parameters directly from the available market option prices. This should be seen in the context of neural SDEs… ▽ More We propose a fully data-driven approach to calibrate local stochastic volatility (LSV) models, circumventing in particular the ad hoc interpolation of the volatility surface. To achieve this, we parametrize the leverage function by a family of feed-forward neural networks and learn their parameters directly from the available market option prices. This should be seen in the context of neural SDEs and (causal) generative adversarial networks: we generate volatility surfaces by specific neural SDEs, whose quality is assessed by quantifying, possibly in an adversarial manner, distances to market prices. The minimization of the calibration functional relies strongly on a variance reduction technique based on hedging and deep hedging, which is interesting in its own right: it allows the calculation of model prices and model implied volatilities in an accurate way using only small sets of sample paths. For numerical illustration we implement a SABR-type LSV model and conduct a thorough statistical performance analysis on many samples of implied volatility smiles, showing the accuracy and stability of the method. △ Less

Submitted 29 September, 2020; v1 submitted 5 May, 2020; originally announced May 2020.

Comments: Replacement for previous version: Major update of previous version to match the content of the published version

Journal ref: Risks 2020, 8, 101

arXiv:2004.13612 [pdf, other]

Denise: Deep Robust Principal Component Analysis for Positive Semidefinite Matrices

Authors: Calypso Herrera, Florian Krach, Anastasis Kratsios, Pierre Ruyssen, Josef Teichmann

Abstract: The robust PCA of covariance matrices plays an essential role when isolating key explanatory features. The currently available methods for performing such a low-rank plus sparse decomposition are matrix specific, meaning, those algorithms must re-run for every new matrix. Since these algorithms are computationally expensive, it is preferable to learn and store a function that nearly instantaneousl… ▽ More The robust PCA of covariance matrices plays an essential role when isolating key explanatory features. The currently available methods for performing such a low-rank plus sparse decomposition are matrix specific, meaning, those algorithms must re-run for every new matrix. Since these algorithms are computationally expensive, it is preferable to learn and store a function that nearly instantaneously performs this decomposition when evaluated. Therefore, we introduce Denise, a deep learning-based algorithm for robust PCA of covariance matrices, or more generally, of symmetric positive semidefinite matrices, which learns precisely such a function. Theoretical guarantees for Denise are provided. These include a novel universal approximation theorem adapted to our geometric deep learning problem and convergence to an optimal solution to the learning problem. Our experiments show that Denise matches state-of-the-art performance in terms of decomposition quality, while being approximately $2000\times$ faster than the state-of-the-art, principal component pursuit (PCP), and $200 \times$ faster than the current speed-optimized method, fast PCP. △ Less

Submitted 6 June, 2023; v1 submitted 28 April, 2020; originally announced April 2020.

Journal ref: Transactions on Machine Learning Research (2023)

arXiv:2004.13135 [pdf, other]

Local Lipschitz Bounds of Deep Neural Networks

Authors: Calypso Herrera, Florian Krach, Josef Teichmann

Abstract: The Lipschitz constant is an important quantity that arises in analysing the convergence of gradient-based optimization methods. It is generally unclear how to estimate the Lipschitz constant of a complex model. Thus, this paper studies an important problem that may be useful to the broader area of non-convex optimization. The main result provides a local upper bound on the Lipschitz constants of… ▽ More The Lipschitz constant is an important quantity that arises in analysing the convergence of gradient-based optimization methods. It is generally unclear how to estimate the Lipschitz constant of a complex model. Thus, this paper studies an important problem that may be useful to the broader area of non-convex optimization. The main result provides a local upper bound on the Lipschitz constants of a multi-layer feed-forward neural network and its gradient. Moreover, lower bounds are established as well, which are used to show that it is impossible to derive global upper bounds for the Lipschitz constants. In contrast to previous works, we compute the Lipschitz constants with respect to the network parameters and not with respect to the inputs. These constants are needed for the theoretical description of many step size schedulers of gradient based optimization schemes and their convergence analysis. The idea is both simple and effective. The results are extended to a generalization of neural networks, continuously deep neural networks, which are described by controlled ODEs. △ Less

Submitted 9 February, 2023; v1 submitted 27 April, 2020; originally announced April 2020.

arXiv:1911.02903 [pdf, other]

How Implicit Regularization of ReLU Neural Networks Characterizes the Learned Function -- Part I: the 1-D Case of Two Layers with Random First Layer

Authors: Jakob Heiss, Josef Teichmann, Hanna Wutte

Abstract: In this paper, we consider one dimensional (shallow) ReLU neural networks in which weights are chosen randomly and only the terminal layer is trained. First, we mathematically show that for such networks L2-regularized regression corresponds in function space to regularizing the estimate's second derivative for fairly general loss functionals. For least squares regression, we show that the trained… ▽ More In this paper, we consider one dimensional (shallow) ReLU neural networks in which weights are chosen randomly and only the terminal layer is trained. First, we mathematically show that for such networks L2-regularized regression corresponds in function space to regularizing the estimate's second derivative for fairly general loss functionals. For least squares regression, we show that the trained network converges to the smooth spline interpolation of the training data as the number of hidden nodes tends to infinity. Moreover, we derive a novel correspondence between the early stopped gradient descent (without any explicit regularization of the weights) and the smoothing spline regression. △ Less

Submitted 4 October, 2023; v1 submitted 7 November, 2019; originally announced November 2019.

Comments: adding Appendix C for more intuition, fixing typos, improving formulations, (moving end of Section 3.1 into Appendix B)

MSC Class: 41Axx; 93Exx; 68T05; 68Q32 ACM Class: I.2.6; G.3

arXiv:1209.2566 [pdf, other]

Generalizations of Matérn's hard-core point processes

Authors: Jakob Teichmann, Felix Ballani, Karl Gerald van den Boogaart

Abstract: Matérn's hard-core processes are valuable point process models in spatial statistics. In order to extend their field of application, Matérn's original models are generalized here, both as point processes and particle processes. The thinning rule uses a distance-dependent probability function, which controls deletion of points close together. For this general setting, explicit formulas for first- a… ▽ More Matérn's hard-core processes are valuable point process models in spatial statistics. In order to extend their field of application, Matérn's original models are generalized here, both as point processes and particle processes. The thinning rule uses a distance-dependent probability function, which controls deletion of points close together. For this general setting, explicit formulas for first- and second-order characteristics can be given. Two examples from materials science illustrate the application of the models. △ Less

Submitted 12 September, 2012; originally announced September 2012.

Comments: 21 pages, 17 figures

MSC Class: 60D05; 60G55

Showing 1–20 of 20 results for author: Teichmann, J