-
Efficient Line Search Method Based on Regression and Uncertainty Quantification
Authors:
Sören Laue,
Tomislav Prusina
Abstract:
Unconstrained optimization problems are typically solved using iterative methods, which often depend on line search techniques to determine optimal step lengths in each iteration. This paper introduces a novel line search approach. Traditional line search methods, aimed at determining optimal step lengths, often discard valuable data from the search process and focus on refining step length intervals. This paper proposes a more efficient method using Bayesian optimization, which utilizes all available data points, i.e., function values and gradients, to guide the search towards a potential global minimum. This new approach more effectively explores the search space, leading to better solution quality. It is also easy to implement and integrate into existing frameworks. Tested on the challenging CUTEst test set, it demonstrates superior performance compared to existing state-of-the-art methods, solving more problems to optimality with equivalent resource usage.
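For context, the classical approach that the abstract contrasts against can be illustrated by Armijo backtracking: it shrinks a trial step until a sufficient-decrease condition holds and keeps only the accepted step, discarding every function value it evaluated along the way. The following is a minimal Python sketch of that baseline. The function names and the defaults for alpha0, rho, and c are our own assumptions, and this is not the regression-based method proposed in the paper.

```python
from typing import Callable, List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def armijo_backtracking(f: Callable[[Vector], float],
                        grad: Callable[[Vector], Vector],
                        x: Vector, d: Vector,
                        alpha0: float = 1.0, rho: float = 0.5,
                        c: float = 1e-4, max_iter: int = 50) -> float:
    """Shrink the step alpha until the Armijo sufficient-decrease condition
    f(x + alpha*d) <= f(x) + c * alpha * grad(x)^T d holds."""
    fx = f(x)
    slope = dot(grad(x), d)  # directional derivative along d
    if slope >= 0.0:
        raise ValueError("d must be a descent direction")
    alpha = alpha0
    for _ in range(max_iter):
        trial = [xi + alpha * di for xi, di in zip(x, d)]
        if f(trial) <= fx + c * alpha * slope:
            return alpha  # accepted; the rejected trial values are simply discarded
        alpha *= rho
    return alpha  # give up after max_iter reductions and return the last step
```

On f(x) = (x0 - 3)^2 + 10 x1^2, starting at x = (0, 1) with d = -grad f(x), this sketch rejects four trial steps and returns alpha = 0.0625. Those four rejected evaluations are exactly the kind of information a data-driven line search could reuse.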
Submitted 17 May, 2024;
originally announced May 2024.
-
Detecting Conceptual Abstraction in LLMs
Authors:
Michaela Regneri,
Alhassan Abdelhalim,
Sören Laue
Abstract:
We present a novel approach to detecting noun abstraction within a large language model (LLM). Starting from a psychologically motivated set of noun pairs in taxonomic relationships, we instantiate surface patterns indicating hypernymy and analyze the attention matrices produced by BERT. We compare the results to two sets of counterfactuals and show that we can detect hypernymy in the abstraction mechanism, which cannot solely be related to the distributional similarity of noun pairs. Our findings are a first step towards the explainability of conceptual abstraction in LLMs.
Submitted 25 April, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
Why Capsule Neural Networks Do Not Scale: Challenging the Dynamic Parse-Tree Assumption
Authors:
Matthias Mitterreiter,
Marcel Koch,
Joachim Giesen,
Sören Laue
Abstract:
Capsule neural networks replace simple, scalar-valued neurons with vector-valued capsules. They are motivated by the pattern recognition system in the human brain, where complex objects are decomposed into a hierarchy of simpler object parts. Such a hierarchy is referred to as a parse-tree. Conceptually, capsule neural networks have been defined to realize such parse-trees. The capsule neural network (CapsNet), by Sabour, Frosst, and Hinton, is the first actual implementation of the conceptual idea of capsule neural networks. CapsNets achieved state-of-the-art performance on simple image recognition tasks with fewer parameters and greater robustness to affine transformations than comparable approaches. This sparked extensive follow-up research. However, despite major efforts, no work was able to scale the CapsNet architecture to reasonably sized datasets. Here, we provide a reason for this failure and argue that it is most likely not possible to scale CapsNets beyond toy examples. In particular, we show that the concept of a parse-tree, the main idea behind capsule neural networks, is not present in CapsNets. We also show theoretically and experimentally that CapsNets suffer from a vanishing gradient problem that results in the starvation of many capsules during training.
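To make the mechanism under analysis concrete, here is a minimal pure-Python sketch of dynamic routing (routing by agreement) as we understand it from the CapsNet paper by Sabour, Frosst, and Hinton. Assumptions: the prediction vectors u_hat[i][j] are supplied directly instead of being computed from learned transformation matrices, and three routing iterations are used by default. The sketch shows only the routing step; it does not reproduce the paper's parse-tree or vanishing-gradient analysis.

```python
import math
from typing import List

Vec = List[float]


def squash(s: Vec, eps: float = 1e-9) -> Vec:
    """Scale s to a length below 1 while keeping its direction."""
    sq = sum(x * x for x in s)
    scale = sq / (1.0 + sq) / math.sqrt(sq + eps)
    return [scale * x for x in s]


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in z]
    total = sum(e)
    return [x / total for x in e]


def route(u_hat: List[List[Vec]], iterations: int = 3) -> List[Vec]:
    """u_hat[i][j] is lower capsule i's prediction for upper capsule j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v: List[Vec] = []
    for _ in range(iterations):
        # Coupling coefficients: for each lower capsule, a softmax over upper capsules.
        c = [softmax(row) for row in b]
        # Weighted sum of predictions for each upper capsule, then squash.
        s = [[sum(c[i][j] * u_hat[i][j][k] for i in range(n_in)) for k in range(dim)]
             for j in range(n_out)]
        v = [squash(sj) for sj in s]
        # Increase the logit where a prediction agrees (dot product) with the output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v
```

With two lower capsules that agree on upper capsule 0 (both predict (1, 0)) but cancel out on upper capsule 1 (predictions (0, 1) and (0, -1)), routing shifts the coupling toward capsule 0, whose output vector ends up longer than that of capsule 1.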
Submitted 4 January, 2023;
originally announced January 2023.
-
Convexity Certificates from Hessians
Authors:
Julien Klaus,
Niklas Merk,
Konstantin Wiedom,
Sören Laue,
Joachim Giesen
Abstract:
The Hessian of a differentiable convex function is positive semidefinite. Therefore, checking the Hessian of a given function is a natural approach to certify convexity. However, implementing this approach is not straightforward since it requires a representation of the Hessian that allows its analysis. Here, we implement this approach for a class of functions that is rich enough to support classical machine learning. For this class of functions, it was recently shown how to compute computational graphs of their Hessians. We show how to check these graphs for positive semidefiniteness. We compare our implementation of the Hessian approach with the well-established disciplined convex programming (DCP) approach and prove that the Hessian approach is at least as powerful as the DCP approach for differentiable functions. Furthermore, we show for a state-of-the-art implementation of the DCP approach that, for differentiable functions, the Hessian approach is actually more powerful. That is, it can certify the convexity of a larger class of differentiable functions.
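A numerical spot check illustrates the idea behind the Hessian approach, although it is not a certificate. The paper works symbolically on computational graphs of Hessians, whereas the hypothetical helpers below only estimate the Hessian by central finite differences at chosen points and then test whether H + tol·I admits a Cholesky factorization, which holds exactly when every eigenvalue of H exceeds -tol. Passing at a few points proves nothing about convexity; failing at one point indicates (up to discretization error) that the function is not convex near that point. The step h and tolerance tol are our own assumptions.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def numerical_hessian(f: Callable[[Vector], float], x: Vector, h: float = 1e-4) -> Matrix:
    """Central finite-difference estimate of the Hessian of f at x."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di: float, dj: float) -> float:
                y = list(x)
                y[i] += di
                y[j] += dj  # for i == j this shifts the same coordinate twice
                return f(y)
            H[i][j] = (shifted(h, h) - shifted(h, -h)
                       - shifted(-h, h) + shifted(-h, -h)) / (4.0 * h * h)
    return H


def is_psd(H: Matrix, tol: float = 1e-5) -> bool:
    """Return True if H + tol*I has a Cholesky factorization (uses the lower triangle)."""
    n = len(H)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = H[i][j] + (tol if i == j else 0.0) - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False  # a pivot is non-positive: some eigenvalue is <= -tol
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True
```

For example, log-sum-exp, log(exp(x0) + exp(x1)), passes the check at several points, while the bilinear function x0·x1, whose Hessian has eigenvalues 1 and -1, fails it.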
Submitted 19 October, 2022;
originally announced October 2022.
-
Optimization for Classical Machine Learning Problems on the GPU
Authors:
Sören Laue,
Mark Blacher,
Joachim Giesen
Abstract:
Constrained optimization problems arise frequently in classical machine learning. There exist frameworks addressing constrained optimization, for instance, CVXPY and GENO. However, in contrast to deep learning frameworks, their GPU support is limited. Here, we extend the GENO framework to also solve constrained optimization problems on the GPU. The framework allows the user to specify constrained optimization problems in an easy-to-read modeling language. A solver is then automatically generated from this specification. When run on the GPU, the solver outperforms state-of-the-art approaches like CVXPY combined with a GPU-accelerated solver such as cuOSQP or SCS by a few orders of magnitude.
Submitted 30 March, 2022;
originally announced March 2022.
-
A Simple and Efficient Tensor Calculus for Machine Learning
Authors:
Sören Laue,
Matthias Mitterreiter,
Joachim Giesen
Abstract:
Computing derivatives of tensor expressions, also known as tensor calculus, is a fundamental task in machine learning. A key concern is the efficiency of evaluating the expressions and their derivatives, which hinges on the representation of these expressions. Recently, an algorithm for computing higher-order derivatives of tensor expressions like Jacobians or Hessians has been introduced that is a few orders of magnitude faster than previous state-of-the-art approaches. Unfortunately, the approach is based on Ricci notation and hence cannot be incorporated into automatic differentiation frameworks from deep learning like TensorFlow, PyTorch, autograd, or JAX that use the simpler Einstein notation. This leaves two options: either change the underlying tensor representation in these frameworks or develop a new, provably correct algorithm based on Einstein notation. Obviously, the first option is impractical. Hence, we pursue the second option. Here, we show that using Ricci notation is not necessary for an efficient tensor calculus and develop an equally efficient method for the simpler Einstein notation. It turns out that switching to Einstein notation enables further improvements that lead to even better efficiency.
The methods that are described in this paper have been implemented in the online tool www.MatrixCalculus.org for computing derivatives of matrix and tensor expressions.
An extended abstract of this paper appeared as "A Simple and Efficient Tensor Calculus", AAAI 2020.
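As a small worked example of differentiating in Einstein notation (repeated indices are summed), consider $f(x) = x^\top A x$ for a square matrix $A$. The gradient and Hessian follow index by index using $\partial x_i / \partial x_k = \delta_{ik}$. This illustrates the notation only; it is not the paper's algorithm.

```latex
\begin{aligned}
f &= x_i A_{ij} x_j \\
\frac{\partial f}{\partial x_k}
  &= \delta_{ik} A_{ij} x_j + x_i A_{ij} \delta_{jk}
   = A_{kj} x_j + x_i A_{ik}
   = \left(A x + A^\top x\right)_k \\
\frac{\partial^2 f}{\partial x_k \, \partial x_l}
  &= A_{kj} \delta_{jl} + \delta_{il} A_{ik}
   = A_{kl} + A_{lk}
   = \left(A + A^\top\right)_{kl}
\end{aligned}
```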
Submitted 7 October, 2020;
originally announced October 2020.
-
GENO -- GENeric Optimization for Classical Machine Learning
Authors:
Sören Laue,
Matthias Mitterreiter,
Joachim Giesen
Abstract:
Although optimization is the longstanding algorithmic backbone of machine learning, new models still require the time-consuming implementation of new solvers. As a result, there are thousands of implementations of optimization algorithms for machine learning problems. A natural question is whether it is always necessary to implement a new solver or whether there is one algorithm that suffices for most models. Common belief suggests that such a one-algorithm-fits-all approach cannot work, because this algorithm cannot exploit model-specific structure and thus cannot be efficient and robust on a wide variety of problems. Here, we challenge this common belief. We have designed and implemented the optimization framework GENO (GENeric Optimization) that combines a modeling language with a generic solver. GENO generates a solver from the declarative specification of an optimization problem class. The framework is flexible enough to encompass most of the classical machine learning problems. We show on a wide variety of classical but also some recently suggested problems that the automatically generated solvers are (1) as efficient as well-engineered specialized solvers, (2) more efficient by a decent margin than recent state-of-the-art solvers, and (3) orders of magnitude more efficient than classical modeling language plus solver approaches.
Submitted 31 May, 2019;
originally announced May 2019.
-
On the Equivalence of Automatic and Symbolic Differentiation
Authors:
Soeren Laue
Abstract:
We show that reverse mode automatic differentiation and symbolic differentiation are equivalent in the sense that they both perform the same operations when computing derivatives. This is in stark contrast to the common claim that they are substantially different. The difference is often illustrated by claiming that symbolic differentiation suffers from "expression swell" whereas automatic differentiation does not. Here, we show that this statement is not true. "Expression swell" refers to the phenomenon of a much larger representation of the derivative as opposed to the representation of the original function.
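For readers less familiar with reverse mode, the following minimal sketch uses a hypothetical Var class (not any framework's API). It records local derivatives during the forward evaluation and then accumulates adjoints in a single backward sweep over a topological order of the computation graph. The abstract's claim is that symbolic differentiation, done carefully, performs these same operations; the sketch shows only the reverse-mode side.

```python
import math
from typing import List, Tuple


class Var:
    """A scalar node in a computation graph for reverse-mode AD."""

    def __init__(self, value: float, parents: Tuple[Tuple["Var", float], ...] = ()):
        self.value = value
        self.parents = parents  # pairs (parent node, d self / d parent)
        self.grad = 0.0         # adjoint, filled in by backward()

    def __add__(self, other: "Var") -> "Var":
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other: "Var") -> "Var":
        # d(a*b)/da = b and d(a*b)/db = a, using the stored forward values
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))


def sin(x: Var) -> Var:
    return Var(math.sin(x.value), ((x, math.cos(x.value)),))


def backward(output: Var) -> None:
    """Topologically order the graph, then sweep it in reverse, accumulating adjoints."""
    order: List[Var] = []
    seen = set()

    def visit(node: Var) -> None:
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)  # appended only after all of its parents

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad  # chain rule
```

For f(x, y) = x·y + sin(x) at x = 2, y = 3, the backward sweep yields df/dx = 3 + cos(2) and df/dy = 2; the shared input x receives contributions from both branches.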
Submitted 5 December, 2022; v1 submitted 5 April, 2019;
originally announced April 2019.
-
Distributed Convex Optimization with Many Convex Constraints
Authors:
Joachim Giesen,
Sören Laue
Abstract:
We address the problem of solving convex optimization problems with many convex constraints in a distributed setting. Our approach is based on an extension of the alternating direction method of multipliers (ADMM) that recently gained a lot of attention in the Big Data context. Although it was invented decades ago, ADMM could so far be applied only to unconstrained problems and problems with linear equality or inequality constraints. Our extension can handle arbitrary inequality constraints directly. It combines the ability of ADMM to solve convex optimization problems in a distributed setting with the ability of the Augmented Lagrangian method to solve constrained optimization problems, and, as we show, it inherits the convergence guarantees of both ADMM and the Augmented Lagrangian method.
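The Augmented Lagrangian half of this combination can be sketched on its own. The minimal Python sketch below handles a single inequality constraint g(x) <= 0 with the textbook update lam <- max(0, lam + rho * g(x)), and it solves each inner subproblem approximately by plain gradient descent. The function name and the parameters rho, outer, inner, and lr are our own assumptions; the ADMM splitting and the distributed setting are omitted, and the paper's exact variant may differ.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def augmented_lagrangian(f_grad: Callable[[Vector], Vector],
                         g: Callable[[Vector], float],
                         g_grad: Callable[[Vector], Vector],
                         x0: Vector, rho: float = 10.0, outer: int = 50,
                         inner: int = 500, lr: float = 0.01) -> Tuple[Vector, float]:
    """Minimize f(x) subject to g(x) <= 0; returns (x, multiplier)."""
    x, lam = list(x0), 0.0
    for _ in range(outer):
        for _ in range(inner):
            # Gradient of f(x) + (1/(2 rho)) * (max(0, lam + rho g(x))^2 - lam^2)
            # is grad f(x) + max(0, lam + rho g(x)) * grad g(x).
            mult = max(0.0, lam + rho * g(x))
            gf, gg = f_grad(x), g_grad(x)
            x = [xi - lr * (a + mult * b) for xi, a, b in zip(x, gf, gg)]
        lam = max(0.0, lam + rho * g(x))  # multiplier update
    return x, lam
```

For minimizing (x - 2)^2 subject to x - 1 <= 0, the sketch converges to x = 1 with multiplier 2, which matches the KKT condition 2(x - 2) + lam = 0.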
Submitted 6 April, 2018; v1 submitted 7 October, 2016;
originally announced October 2016.
-
Generating massive complex networks with hyperbolic geometry faster in practice
Authors:
Moritz von Looz,
Mustafa Özdayi,
Sören Laue,
Henning Meyerhenke
Abstract:
Generative network models play an important role in algorithm development, scaling studies, network analysis, and realistic system benchmarks for graph data sets. The commonly used graph-based benchmark model R-MAT has some drawbacks concerning realism and the scaling behavior of network properties. A complex network model gaining considerable popularity builds random hyperbolic graphs, generated by distributing points within a disk in the hyperbolic plane and then adding edges between points whose hyperbolic distance is below a threshold.
We present in this paper a fast generation algorithm for such graphs. Our experiments show that our new generator achieves speedup factors of 3-60 over the best previous implementation. One billion edges can now be generated in under one minute on a shared-memory workstation. Furthermore, we present a dynamic extension to model gradual network change, while preserving at each step the point position probabilities.
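The model described above can be sketched as a naive quadratic-time generator, which is the all-pairs baseline that a fast generator avoids. Assumptions in this Python sketch: angles are uniform on [0, 2π), radii follow the density α·sinh(αr)/(cosh(αR) - 1) on [0, R] (sampled by inverting its CDF), and an edge is added when the hyperbolic distance is at most R. The function name and parameters are hypothetical, and the paper's fast algorithm and dynamic extension are not reproduced.

```python
import math
import random
from typing import List, Tuple


def random_hyperbolic_graph(n: int, R: float, alpha: float = 1.0,
                            seed: int = 0) -> Tuple[List[Tuple[float, float]],
                                                    List[Tuple[int, int]]]:
    """Naive O(n^2) threshold random hyperbolic graph; returns (points, edges)."""
    rng = random.Random(seed)
    pts: List[Tuple[float, float]] = []
    for _ in range(n):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        u = rng.random()
        # Inverse CDF of the radial density alpha*sinh(alpha r) / (cosh(alpha R) - 1)
        r = math.acosh(1.0 + u * (math.cosh(alpha * R) - 1.0)) / alpha
        pts.append((r, theta))
    cosh_R = math.cosh(R)
    edges: List[Tuple[int, int]] = []
    for i in range(n):
        ri, ti = pts[i]
        for j in range(i + 1, n):
            rj, tj = pts[j]
            # Hyperbolic law of cosines gives cosh of the distance; comparing
            # cosh values avoids acosh because cosh is increasing on [0, inf).
            c = math.cosh(ri) * math.cosh(rj) - math.sinh(ri) * math.sinh(rj) * math.cos(ti - tj)
            if c <= cosh_R:
                edges.append((i, j))
    return pts, edges
```

The inner double loop is what makes this approach impractical for billions of edges. For large R, cosh also overflows in double precision, so this sketch is only suitable for small experiments.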
Submitted 30 June, 2016;
originally announced June 2016.
-
A Hybrid Algorithm for Convex Semidefinite Optimization
Authors:
Soeren Laue
Abstract:
We present a hybrid algorithm for optimizing a convex, smooth function over the cone of positive semidefinite matrices. Our algorithm converges to the globally optimal solution and can be used to solve general large-scale semidefinite programs; hence, it can be readily applied to a variety of machine learning problems. We show experimental results on three machine learning problems (matrix completion, metric learning, and sparse PCA). Our approach outperforms state-of-the-art algorithms.
Submitted 18 June, 2012;
originally announced June 2012.
-
Geometric Set Cover and Hitting Sets for Polytopes in $\mathbb{R}^3$
Authors:
Sören Laue
Abstract:
Suppose we are given a finite set of points $P$ in $\mathbb{R}^3$ and a collection of polytopes $\mathcal{T}$ that are all translates of the same polytope $T$. We consider two problems in this paper. The first is the set cover problem, where we want to select a minimum number of polytopes from the collection $\mathcal{T}$ such that their union covers all input points $P$. The second problem is finding a hitting set for the set of polytopes $\mathcal{T}$; that is, we want to select a minimum number of points from the input points $P$ such that every given polytope is hit by at least one point. We give the first constant-factor approximation algorithms for both problems. We achieve this by providing an $\varepsilon$-net for translates of a polytope in $\mathbb{R}^3$ of size $O(\frac{1}{\varepsilon})$.
Submitted 20 February, 2008;
originally announced February 2008.
-
Power Assignment Problems in Wireless Communication
Authors:
Stefan Funke,
Soeren Laue,
Zvi Lotker,
Rouven Naujoks
Abstract:
A fundamental class of problems in wireless communication is concerned with the assignment of suitable transmission powers to wireless devices/stations such that the resulting communication graph satisfies certain desired properties and the overall energy consumed is minimized. Many concrete communication tasks in a wireless network like broadcast, multicast, point-to-point routing, creation of a communication backbone, etc. can be regarded as such a power assignment problem.
This paper considers several problems of that kind. For example, one problem studied before in [Carrots, Bilo] aims to select and assign powers to $k$ of the stations such that all other stations are within reach of at least one of the selected stations. We improve the running time for obtaining a $(1+\varepsilon)$-approximate solution for this problem from $n^{(\alpha/\varepsilon)^{O(d)}}$, as reported by Bilo et al. [Bilo], to $O\left(n + \left(\frac{k^{2d+1}}{\varepsilon^d}\right)^{\min\{2k,\, (\alpha/\varepsilon)^{O(d)}\}}\right)$; that is, we obtain a running time that is linear in the network size. Further results include a constant-factor approximation algorithm for the TSP under squared (non-metric!) edge costs, which can be employed to implement a novel data aggregation protocol, as well as efficient schemes to perform $k$-hop multicasts.
Submitted 22 December, 2006;
originally announced December 2006.