-
CoRe Optimizer: An All-in-One Solution for Machine Learning
Authors:
Marco Eckhoff,
Markus Reiher
Abstract:
The optimization algorithm and its hyperparameters can significantly affect the training speed and resulting model accuracy in machine learning applications. The wish list for an ideal optimizer includes fast and smooth convergence to low error, low computational demand, and general applicability. Our recently introduced continual resilient (CoRe) optimizer has shown superior performance compared to other state-of-the-art first-order gradient-based optimizers for training lifelong machine learning potentials. In this work we provide an extensive performance comparison of the CoRe optimizer and nine other optimization algorithms including the Adam optimizer and resilient backpropagation (RPROP) for diverse machine learning tasks. We analyze the influence of different hyperparameters and provide generally applicable values. The CoRe optimizer yields best or competitive performance in every investigated application, while only one hyperparameter needs to be changed depending on mini-batch or batch learning.
Submitted 17 February, 2024; v1 submitted 28 July, 2023;
originally announced July 2023.
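The abstract above names resilient backpropagation (RPROP) as one of the baseline optimizers. As a rough illustration of that family only, and not of the CoRe optimizer itself, the following sketch implements a sign-based update with per-parameter step sizes in the iRprop- style. The function name `rprop_minimize`, the quadratic test objective, and all default values are assumptions chosen for this example and are not taken from the paper.

```python
def rprop_minimize(grad, w0, steps=200, step_init=0.1,
                   eta_plus=1.2, eta_minus=0.5,
                   step_min=1e-6, step_max=50.0):
    """Minimize a function from its gradient using an iRprop- style update.

    Only the sign of each partial derivative is used; the magnitude of the
    move comes from a per-parameter step size that grows while the sign is
    stable and shrinks when it flips.
    """
    w = list(w0)
    step = [step_init] * len(w)
    prev = [0.0] * len(w)  # gradient from the previous iteration
    for _ in range(steps):
        g = list(grad(w))
        for i in range(len(w)):
            agreement = g[i] * prev[i]
            if agreement > 0:
                # Same sign as before: accelerate.
                step[i] = min(step[i] * eta_plus, step_max)
            elif agreement < 0:
                # Sign flip: we overshot, so shrink and skip this update.
                step[i] = max(step[i] * eta_minus, step_min)
                g[i] = 0.0
            if g[i] > 0:
                w[i] -= step[i]
            elif g[i] < 0:
                w[i] += step[i]
            prev[i] = g[i]
    return w


# Hypothetical objective: f(w) = sum((w_i - t_i)^2) with minimum at t.
target = [1.0, 2.0]


def quadratic_grad(w):
    return [2.0 * (wi - ti) for wi, ti in zip(w, target)]


result = rprop_minimize(quadratic_grad, [5.0, -3.0])
```

Because the update ignores gradient magnitudes, the same defaults behave identically if the objective is rescaled by a positive constant, which is one reason sign-based methods are often considered for full-batch training.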
-
Near critical preferential attachment networks have small giant components
Authors:
Maren Eckhoff,
Peter Morters,
Marcel Ortgiese
Abstract:
Preferential attachment networks with power law exponent $τ>3$ are known to exhibit a phase transition. There is a value $ρ_{\rm c}>0$ such that, for small edge densities $ρ\leq ρ_c$ every component of the graph comprises an asymptotically vanishing proportion of vertices, while for large edge densities $ρ>ρ_c$ there is a unique giant component comprising an asymptotically positive proportion of vertices. In this paper we study the decay in the size of the giant component as the critical edge density is approached from above. We show that the size decays very rapidly, like $\exp(-c/ \sqrt{ρ-ρ_c})$ for an explicit constant $c>0$ depending on the model implementation. This result is in contrast to the behaviour of the class of rank-one models of scale-free networks, including the configuration model, where the decay is polynomial. Our proofs rely on the local neighbourhood approximations of [Dereich, Morters, 2013] and recent progress in the theory of branching random walks [Gantert, Hu, Shi, 2011].
Submitted 1 December, 2017;
originally announced December 2017.
-
Long paths in first passage percolation on the complete graph I. Local PWIT dynamics
Authors:
M. Eckhoff,
J. Goodman,
R. van der Hofstad,
F. R. Nardi
Abstract:
We study the random geometry of first passage percolation on the complete graph equipped with independent and identically distributed edge weights, continuing the program initiated by Bhamidi and van der Hofstad [9]. We describe our results in terms of a sequence of parameters $(s_n)_{n\geq 1}$ that quantifies the extreme-value behavior of small weights, and that describes different universality classes for first passage percolation on the complete graph. We consider both $n$-independent as well as $n$-dependent edge weights. The simplest example consists of edge weights of the form $E^{s_n}$, where $E$ is an exponential random variable with mean 1.
In this paper, we investigate the case where $s_n\rightarrow \infty$, and focus on the local neighborhood of a vertex. We establish that the smallest-weight tree of a vertex locally converges to the invasion percolation cluster on the Poisson weighted infinite tree. In addition, we identify the scaling limit of the weight of the smallest-weight path between two uniform vertices.
Submitted 22 December, 2015; v1 submitted 18 December, 2015;
originally announced December 2015.
-
Long paths in first passage percolation on the complete graph II. Global branching dynamics
Authors:
M. Eckhoff,
J. Goodman,
R. van der Hofstad,
F. R. Nardi
Abstract:
We study the random geometry of first passage percolation on the complete graph equipped with independent and identically distributed edge weights, continuing the program initiated by Bhamidi and van der Hofstad [6]. We describe our results in terms of a sequence of parameters $(s_n)_{n\geq 1}$ that quantifies the extreme-value behavior of small weights, and that describes different universality classes for first passage percolation on the complete graph. We consider both $n$-independent as well as $n$-dependent edge weights. The simplest example consists of edge weights of the form $E^{s_n}$, where $E$ is an exponential random variable with mean 1.
In this paper, we focus on the case where $s_n\rightarrow \infty$ with $s_n=o(n^{1/3})$. Under mild regularity conditions, we identify the scaling limit of the weight of the smallest-weight path between two uniform vertices, and we prove that the number of edges in this path obeys a central limit theorem with asymptotic mean $s_n\log{(n/s_n^3)}$ and variance $s_n^2\log{(n/s_n^3)}$. This settles a conjecture of Bhamidi and van der Hofstad [6]. The proof relies on a decomposition of the smallest-weight tree into an initial part following invasion percolation dynamics, and a main part following branching process dynamics. The initial part has been studied in [14]; the current article focuses on the global branching dynamics.
Submitted 22 December, 2015; v1 submitted 18 December, 2015;
originally announced December 2015.
-
Spines, skeletons and the Strong Law of Large Numbers for superdiffusions
Authors:
Maren Eckhoff,
Andreas E. Kyprianou,
Matthias Winkel
Abstract:
Consider a supercritical superdiffusion $(X_t)$ on a domain $D\subset \mathbb{R}^d$ with branching mechanism
$-β(x) z+α(x) z^2 + \int_{(0,\infty)} (e^{-yz}-1+yz)\, Π(x,dy)$.
The skeleton decomposition provides a pathwise description of the process in terms of immigration along a branching particle diffusion. We use this decomposition to derive the Strong Law of Large Numbers (SLLN) for a wide class of superdiffusions from the corresponding result for branching particle diffusions. That is, we show that for suitable test functions $f$ and starting measures $μ$,
$\langle f,X_t\rangle/P_{μ}[\langle f,X_t\rangle] \to W_{\infty}$, $P_{μ}$-almost surely as $t\to\infty$, where $W_{\infty}$ is a finite, non-deterministic random variable characterised as a martingale limit. Our method is based on skeleton and spine techniques and offers structural insights into the driving force behind the SLLN for superdiffusions. The result covers many of the key examples of interest and, in particular, proves a conjecture by Fleischmann and Swart for the super-Wright-Fisher diffusion.
Submitted 24 September, 2013;
originally announced September 2013.
-
Vulnerability of robust preferential attachment networks
Authors:
Maren Eckhoff,
Peter Mörters
Abstract:
Scale-free networks with small power law exponent are known to be robust, meaning that their qualitative topological structure cannot be altered by random removal of even a large proportion of nodes. By contrast, it has been argued in the science literature that such networks are highly vulnerable to a targeted attack, and removing a small number of key nodes in the network will dramatically change the topological structure. Here we analyse a class of preferential attachment networks in the robust regime and prove four main results supporting this claim: After removal of an arbitrarily small proportion $ε>0$ of the oldest nodes (1) the asymptotic degree distribution has exponential instead of power law tails; (2) the largest degree in the network drops from being of the order of a power of the network size $n$ to being just logarithmic in $n$; (3) the typical distances in the network increase from order $\log\log n$ to order $\log n$; and (4) the network becomes vulnerable to random removal of nodes. Importantly, all our results explicitly quantify the dependence on the proportion $ε$ of removed vertices. For example, we show that the critical proportion of nodes that have to be retained for survival of the giant component undergoes a steep increase as $ε$ moves away from zero, and a comparison of this result with similar ones for other networks reveals the existence of two different universality classes of robust network models. The key technique in our proofs is a local approximation of the network by a branching random walk with two killing boundaries, and an understanding of the particle genealogies in this process, which enters into estimates for the spectral radius of an associated operator.
Submitted 22 August, 2013; v1 submitted 14 June, 2013;
originally announced June 2013.
-
Short paths for first passage percolation on the complete graph
Authors:
Maren Eckhoff,
Jesse Goodman,
Remco van der Hofstad,
Francesca R. Nardi
Abstract:
We study the complete graph equipped with a topology induced by independent and identically distributed edge weights. The focus of our analysis is on the weight $W_n$ and the number of edges $H_n$ of the minimal weight path between two distinct vertices in the weak disorder regime. We establish novel and simple first and second moment methods using path counting to derive first order asymptotics for the considered quantities. Our results are stated in terms of a sequence of parameters $(s_n)$ that quantifies the extreme-value behaviour of the edge weights, and that describes different universality classes for first passage percolation on the complete graph. These classes contain both $n$-independent and $n$-dependent edge weight distributions. The method is most effective for the universality class containing the edge weights $E^{s_n}$, where $E$ is an exponential(1) random variable and $s_n \log n \to \infty$, $s_n^2 \log n \to 0$. We discuss two types of examples from this class in detail. In addition, the class where $s_n \log n$ stays finite is studied. This article is a contribution to the program initiated in \cite{BhaHof12}.
Submitted 19 November, 2012;
originally announced November 2012.
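The quantities in the abstract above, the weight and the number of edges of the minimal weight path, can be explored numerically for small graphs. The sketch below samples i.i.d. edge weights of the form E^s with E exponential(1) on the complete graph and runs Dijkstra's algorithm between two fixed vertices. The function name `smallest_weight_path`, the choice of vertices 0 and 1, and the parameter values are assumptions made for illustration; a single fixed exponent `s` stands in for the sequence of parameters, and no asymptotic regime from the paper is reproduced.

```python
import heapq
import random


def smallest_weight_path(n, s, seed=0):
    """Return (weight, hops, direct) for the minimal weight path 0 -> 1.

    The complete graph on n vertices gets i.i.d. weights E**s, where E is
    exponential with mean 1. `direct` is the weight of the single edge
    {0, 1}, returned for comparison.
    """
    rng = random.Random(seed)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = rng.expovariate(1.0) ** s

    dist = [float("inf")] * n
    hops = [0] * n
    done = [False] * n
    dist[0] = 0.0
    heap = [(0.0, 0)]
    while heap:
        d, u = heapq.heappop(heap)
        if done[u]:
            continue  # stale heap entry
        done[u] = True
        if u == 1:
            break
        for v in range(n):
            if not done[v] and d + w[u][v] < dist[v]:
                dist[v] = d + w[u][v]
                hops[v] = hops[u] + 1
                heapq.heappush(heap, (dist[v], v))
    return dist[1], hops[1], w[0][1]


weight, hop_count, direct = smallest_weight_path(n=50, s=3.0, seed=1)
```

Larger values of `s` make small weights relatively much cheaper than typical ones, so the optimal path tends to use more edges; repeating the call over several seeds gives a crude empirical picture of this effect.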
-
Brownian motion on R trees
Authors:
Siva Athreya,
Michael Eckhoff,
Anita Winter
Abstract:
The real trees form a class of metric spaces that extends the class of trees with edge lengths by allowing behavior such as infinite total edge length and vertices with infinite branching degree. We use Dirichlet form methods to construct Brownian motion on any given locally compact $R$-tree $(T,r)$ equipped with a Radon measure $ν$ on $(T,{\mathcal B}(T))$. We specify a criterion under which the Brownian motion is recurrent or transient. For compact recurrent $R$-trees we provide bounds on the mixing time.
In this revised version, assumption (A3) for an $R$-tree has been removed.
Submitted 11 October, 2011; v1 submitted 15 February, 2011;
originally announced February 2011.
-
Precise asymptotics of small eigenvalues of reversible diffusions in the metastable regime
Authors:
Michael Eckhoff
Abstract:
We investigate the close connection between metastability of the reversible diffusion process $X$ defined by the stochastic differential equation $dX_t=-\nabla F(X_t)\,dt+\sqrt{2ε}\,dW_t$, $ε>0$, and the spectrum near zero of its generator $-L_ε\equiv εΔ-\nabla F\cdot\nabla$, where $F:\mathbb{R}^d\to \mathbb{R}$ and $W$ denotes Brownian motion on $\mathbb{R}^d$. For generic $F$, to each local minimum of $F$ there corresponds a metastable state. We prove that the distribution of its rescaled relaxation time converges to the exponential distribution as $ε\downarrow 0$ with optimal and uniform error estimates. Each metastable state can be viewed as an eigenstate of $L_ε$ with an eigenvalue that converges to zero exponentially fast in $1/ε$. Modulo errors of exponentially small order in $1/ε$, this eigenvalue is given as the inverse of the expected metastable relaxation time. The eigenstate is highly concentrated in the basin of attraction of the corresponding trap.
Submitted 25 March, 2005;
originally announced March 2005.
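The stochastic differential equation in the abstract above can be simulated with a basic Euler-Maruyama scheme. The sketch below does this in one dimension for a double-well potential F(x) = (x^2 - 1)^2 / 4, which is a hypothetical choice made for illustration and not taken from the paper; the function names, time step, and noise levels are likewise assumptions. It is meant only to show the dynamics, not to estimate eigenvalues or relaxation times with any accuracy.

```python
import math
import random


def grad_double_well(x):
    """Derivative of the assumed potential F(x) = (x**2 - 1)**2 / 4."""
    return x ** 3 - x


def euler_maruyama(eps, x0=-1.0, dt=0.01, steps=1000, seed=0):
    """Simulate dX = -F'(X) dt + sqrt(2 * eps) dW with Euler-Maruyama.

    Returns the list of positions, including the starting point.
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * eps * dt)  # std of one noise increment
    x = x0
    path = [x]
    for _ in range(steps):
        x += -grad_double_well(x) * dt + noise_scale * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


# Without noise the scheme is plain gradient descent into the nearest well.
noiseless = euler_maruyama(eps=0.0, x0=-0.5, steps=5000)
# With small noise the path fluctuates around a well and only rarely crosses.
noisy = euler_maruyama(eps=0.1, x0=-1.0, steps=1000, seed=42)
```

In this toy setting the two minima at -1 and +1 play the role of the traps, and lowering `eps` makes crossings of the barrier at 0 increasingly rare, which is the qualitative picture behind the metastable regime.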
-
Metastability and low lying spectra in reversible Markov chains
Authors:
A. Bovier,
M. Eckhoff,
V. Gayrard,
M. Klein
Abstract:
We study a large class of reversible Markov chains with discrete state space and transition matrix $P_N$. We define the notion of a set of metastable points as a subset of the state space $\Gamma_N$ such that (i) this set is reached from any point $x\in \Gamma_N$ without return to $x$ with probability at least $b_N$, while (ii) for any two points $x,y$ in the metastable set, the probability $T^{-1}_{x,y}$ to reach $y$ from $x$ without return to $x$ is smaller than $a_N^{-1}\ll b_N$. Under some additional non-degeneracy assumption, we show that in such a situation: (i) To each metastable point corresponds a metastable state, whose mean exit time can be computed precisely. (ii) To each metastable point corresponds one simple eigenvalue of $1-P_N$ which is essentially equal to the inverse mean exit time from this state. The corresponding eigenfunctions are close to the indicator function of the support of the metastable state. Moreover, these results imply very sharp uniform control of the deviation of the probability distribution of metastable exit times from the exponential distribution.
Submitted 26 July, 2000;
originally announced July 2000.