-
Non-local Optimization: Imposing Structure on Optimization Problems by Relaxation
Authors:
Nils Müller,
Tobias Glasmachers
Abstract:
In stochastic optimization, particularly in evolutionary computation and reinforcement learning, the optimization of a function $f: Ω\to \mathbb{R}$ is often addressed through optimizing a so-called relaxation $θ\in Θ\mapsto \mathbb{E}_θ(f)$ of $f$, where $Θ$ resembles the parameters of a family of probability measures on $Ω$. We investigate the structure of such relaxations by means of measure theory and Fourier analysis, enabling us to shed light on the success of many associated stochastic optimization methods. The main structural traits we derive and that allow fast and reliable optimization of relaxations are the consistency of optimal values of $f$, Lipschitzness of gradients, and convexity. We emphasize settings where $f$ itself is not differentiable or convex, e.g., in the presence of (stochastic) disturbance.
Submitted 24 July, 2021; v1 submitted 11 November, 2020;
originally announced November 2020.
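A minimal sketch of the relaxation idea from the abstract above, under assumptions that are not taken from the paper: the objective `f` is a unit step (non-differentiable), the family of measures is Gaussian with mean `theta` and a fixed standard deviation `sigma`, and the function names are illustrative. For this choice the relaxation $θ \mapsto \mathbb{E}_θ(f)$ has the closed form $Φ(θ/σ)$, which is smooth even though `f` is not.

```python
import math
import random


def f(x):
    """Non-differentiable toy objective (an assumption): a unit step at zero."""
    return 1.0 if x > 0 else 0.0


def relaxation_mc(theta, sigma=1.0, n=20000, seed=0):
    """Monte Carlo estimate of E[f(X)] for X ~ N(theta, sigma^2)."""
    rng = random.Random(seed)
    return sum(f(rng.gauss(theta, sigma)) for _ in range(n)) / n


def relaxation_exact(theta, sigma=1.0):
    """Closed form for the step function: the Gaussian CDF at theta / sigma."""
    return 0.5 * (1.0 + math.erf(theta / (sigma * math.sqrt(2.0))))


for theta in (-1.0, 0.0, 1.0):
    print(theta, round(relaxation_mc(theta), 3), round(relaxation_exact(theta), 3))
```

The relaxed objective increases smoothly in `theta`, so gradient-based updates on `theta` are well defined even though `f` itself has no useful gradient.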
-
Latent Representation Prediction Networks
Authors:
Hlynur Davíð Hlynsson,
Merlin Schüler,
Robin Schiewer,
Tobias Glasmachers,
Laurenz Wiskott
Abstract:
Deeply-learned planning methods are often based on learning representations that are optimized for unrelated tasks. For example, they might be trained on reconstructing the environment. These representations are then combined with predictor functions for simulating rollouts to navigate the environment. We find this principle of learning representations unsatisfying and propose to learn them such that they are directly optimized for the task at hand: to be maximally predictable for the predictor function. This results in representations that are by design optimal for the downstream task of planning, where the learned predictor function is used as a forward model.
To this end, we propose a new way of jointly learning this representation along with the prediction function, a system we dub Latent Representation Prediction Network (LARP). The prediction function is used as a forward model for search on a graph in a viewpoint-matching task and the representation learned to maximize predictability is found to outperform a pre-trained representation. Our approach is shown to be more sample-efficient than standard reinforcement learning methods and our learned representation transfers successfully to dissimilar objects.
Submitted 17 March, 2021; v1 submitted 20 September, 2020;
originally announced September 2020.
-
Analyzing Reinforcement Learning Benchmarks with Random Weight Guessing
Authors:
Declan Oller,
Tobias Glasmachers,
Giuseppe Cuccu
Abstract:
We propose a novel method for analyzing and visualizing the complexity of standard reinforcement learning (RL) benchmarks based on score distributions. A large number of policy networks are generated by randomly guessing their parameters, and then evaluated on the benchmark task; the study of their aggregated results provides insights into the benchmark complexity. Our method guarantees objectivity of evaluation by sidestepping learning altogether: the policy network parameters are generated using Random Weight Guessing (RWG), making our method agnostic to (i) the classic RL setup, (ii) any learning algorithm, and (iii) hyperparameter tuning. We show that this approach isolates the environment complexity, highlights specific types of challenges, and provides a proper foundation for the statistical analysis of the task's difficulty. We test our approach on a variety of classic control benchmarks from the OpenAI Gym, where we show that small untrained networks can provide a robust baseline for a variety of tasks. The networks generated often show good performance even without gradual learning, incidentally highlighting the triviality of a few popular benchmarks.
Submitted 16 April, 2020;
originally announced April 2020.
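The sampling loop described in the abstract above can be sketched as follows. Everything concrete here is an assumption made for illustration: the one-dimensional toy task stands in for a Gym benchmark, the "network" is a single linear unit with weight `w` and bias `b`, and the function names are hypothetical. No learning takes place; the output is just the distribution of scores over random guesses.

```python
import random


def episode_return(w, b, max_steps=100):
    """Toy task (hypothetical): keep x inside [-1, 1] using actions of +/-0.1.

    Returns the number of steps survived, capped at max_steps.
    """
    x = 0.5
    for t in range(max_steps):
        action = 1.0 if w * x + b > 0 else -1.0  # linear threshold policy
        x += 0.1 * action
        if abs(x) > 1.0:
            return t
    return max_steps


def random_weight_guessing(n_samples=200, seed=0):
    """Score n_samples policies whose parameters are drawn from N(0, 1)."""
    rng = random.Random(seed)
    return [
        episode_return(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        for _ in range(n_samples)
    ]


scores = random_weight_guessing()
at_max = sum(s == 100 for s in scores) / len(scores)
print(f"mean score {sum(scores) / len(scores):.1f}, fraction at max score {at_max:.2f}")
```

A large fraction of guesses reaching the maximum score would suggest, in the spirit of the abstract, that the task is easy for this policy class.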
-
Vehicle Shape and Color Classification Using Convolutional Neural Network
Authors:
Mohamed Nafzi,
Michael Brauckmann,
Tobias Glasmachers
Abstract:
This paper presents a vehicle re-identification module based on make/model and color classification. It could be used in Automated Vehicular Surveillance (AVS) or for the fast analysis of video data. Many problems related to this topic had to be addressed. To facilitate and accelerate progress on this subject, we present our approach to collecting and labeling a large-scale data set. We trained deeper neural networks, which showed good classification accuracy. We show the results of make/model and color classification on controlled and video data sets. Using an application we developed, we demonstrate the re-identification of vehicles in video images based on make/model and color classification. This work was partially funded under the grant.
Submitted 15 May, 2019;
originally announced May 2019.
-
Dual SVM Training on a Budget
Authors:
Sahar Qaadan,
Merlin Schüler,
Tobias Glasmachers
Abstract:
We present a dual subspace ascent algorithm for support vector machine training that respects a budget constraint limiting the number of support vectors. Budget methods are effective for reducing the training time of kernel SVM while retaining high accuracy. To date, budget training is available only for primal (SGD-based) solvers. Dual subspace ascent methods like sequential minimal optimization are attractive for their good adaptation to the problem structure, their fast convergence rate, and their practical speed. By incorporating a budget constraint into a dual algorithm, our method enjoys the best of both worlds. We demonstrate considerable speed-ups over primal budget training methods.
Submitted 26 June, 2018;
originally announced June 2018.
-
Speeding Up Budgeted Stochastic Gradient Descent SVM Training with Precomputed Golden Section Search
Authors:
Tobias Glasmachers,
Sahar Qaadan
Abstract:
Limiting the model size of a kernel support vector machine to a pre-defined budget is a well-established technique that allows SVM learning and prediction to scale to large-scale data. Its core addition to simple stochastic gradient training is budget maintenance through merging of support vectors. This requires solving an inner optimization problem with an iterative method many times per gradient step. In this paper we replace the iterative procedure with a fast lookup. We manage to reduce the merging time by up to 65% and the total training time by 44% without any loss of accuracy.
Submitted 26 June, 2018;
originally announced June 2018.
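For orientation, here is a generic golden section search of the kind named in the title above; it is a sketch of the iterative procedure, not the paper's precomputed lookup. The example objective `merge_objective` is an assumption: it follows the form commonly quoted for merging two support vectors under a Gaussian kernel (`m` is the relative coefficient weight, `k` the kernel value between the two points), and is not taken from this abstract.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio, about 0.618


def golden_section_max(g, lo, hi, tol=1e-6):
    """Maximize a unimodal function g on [lo, hi] by golden section search."""
    c = hi - INV_PHI * (hi - lo)
    d = lo + INV_PHI * (hi - lo)
    gc, gd = g(c), g(d)
    while hi - lo > tol:
        if gc > gd:
            # Maximum lies in [lo, d]; the old c becomes the new d.
            hi, d, gd = d, c, gc
            c = hi - INV_PHI * (hi - lo)
            gc = g(c)
        else:
            # Maximum lies in [c, hi]; the old d becomes the new c.
            lo, c, gc = c, d, gd
            d = lo + INV_PHI * (hi - lo)
            gd = g(d)
    return 0.5 * (lo + hi)


def merge_objective(h, m=0.5, k=0.5):
    """Assumed merge objective for a Gaussian kernel (illustrative only)."""
    return m * k ** ((1.0 - h) ** 2) + (1.0 - m) * k ** (h ** 2)


h_best = golden_section_max(merge_objective, 0.0, 1.0)
print(round(h_best, 4))
```

Each iteration costs one new function evaluation and shrinks the interval by a constant factor, which is why running it many times per gradient step adds up and why replacing it with a table lookup can pay off.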
-
Multi-Merge Budget Maintenance for Stochastic Gradient Descent SVM Training
Authors:
Sahar Qaadan,
Tobias Glasmachers
Abstract:
Budgeted Stochastic Gradient Descent (BSGD) is a state-of-the-art technique for training large-scale kernelized support vector machines. The budget constraint is maintained incrementally by merging two points whenever the pre-defined budget is exceeded. The process of finding suitable merge partners is costly; it can account for up to 45% of the total training time. In this paper we investigate computationally more efficient schemes that merge more than two points at once. We obtain significant speed-ups without sacrificing accuracy.
Submitted 26 June, 2018;
originally announced June 2018.
-
Limits of End-to-End Learning
Authors:
Tobias Glasmachers
Abstract:
End-to-end learning refers to training a possibly complex learning system by applying gradient-based learning to the system as a whole. An end-to-end learning system is specifically designed so that all modules are differentiable. In effect, not only a central learning machine, but also all "peripheral" modules like representation learning and memory formation are covered by a holistic learning process. The power of end-to-end learning has been demonstrated on many tasks, like playing a whole array of Atari video games with a single architecture. While pushing for solutions to more challenging tasks, network architectures keep growing more and more complex.
In this paper we ask the question whether and to what extent end-to-end learning is a future-proof technique in the sense of scaling to complex and diverse data processing architectures. We point out potential inefficiencies, and we argue in particular that end-to-end learning does not make optimal use of the modular design of present neural networks. Our surprisingly simple experiments demonstrate these inefficiencies, up to the complete breakdown of learning.
Submitted 26 April, 2017;
originally announced April 2017.
-
Fast model selection by limiting SVM training times
Authors:
Aydin Demircioglu,
Daniel Horn,
Tobias Glasmachers,
Bernd Bischl,
Claus Weihs
Abstract:
Kernelized Support Vector Machines (SVMs) are among the best performing supervised learning methods. But for optimal predictive performance, time-consuming parameter tuning is crucial, which impedes application. To tackle this problem, the classic model selection procedure based on grid-search and cross-validation was refined, e.g. by data subsampling and direct search heuristics. Here we focus on a different aspect, the stopping criterion for SVM training. We show that by limiting the training time given to the SVM solver during parameter tuning we can reduce model selection times by an order of magnitude.
Submitted 10 February, 2016;
originally announced February 2016.
-
Coordinate Descent with Online Adaptation of Coordinate Frequencies
Authors:
Tobias Glasmachers,
Ürün Dogan
Abstract:
Coordinate descent (CD) algorithms have become the method of choice for solving a number of optimization problems in machine learning. They are particularly popular for training linear models, including linear support vector machine classification, LASSO regression, and logistic regression.
We consider general CD with non-uniform selection of coordinates. Instead of fixing selection frequencies beforehand we propose an online adaptation mechanism for this important parameter, called the adaptive coordinate frequencies (ACF) method. This mechanism removes the need to estimate optimal coordinate frequencies beforehand, and it automatically reacts to changing requirements during an optimization run.
We demonstrate the usefulness of our ACF-CD approach for a variety of optimization problems arising in machine learning contexts. Our algorithm offers significant speed-ups over state-of-the-art training methods.
Submitted 15 January, 2014;
originally announced January 2014.
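To make "CD with non-uniform selection of coordinates" from the abstract above concrete, the following sketch runs coordinate descent on a small convex quadratic with fixed, hand-chosen selection weights. It does not implement the ACF adaptation rule, whose details are not given in the abstract; the problem data, the `weights` argument and the function name are all illustrative assumptions.

```python
import random


def coordinate_descent(A, b, weights, steps=500, seed=0):
    """Minimize 0.5 * x^T A x - b^T x for a symmetric positive definite A.

    Coordinates are drawn with probability proportional to `weights`
    (fixed here; an adaptive scheme would update them during the run).
    """
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    for _ in range(steps):
        i = rng.choices(range(n), weights=weights)[0]
        # Exact minimization along coordinate i: solve dF/dx_i = 0.
        rest = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - rest) / A[i][i]
    return x


A = [[2.0, 0.5], [0.5, 1.0]]
b = [1.0, 1.0]
x = coordinate_descent(A, b, weights=[1.0, 3.0])
print([round(v, 4) for v in x])
```

With fixed weights the user has to guess good frequencies in advance; the abstract's point is that an online mechanism can remove that guesswork.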
-
Testing Hypotheses by Regularized Maximum Mean Discrepancy
Authors:
Somayeh Danafar,
Paola M. V. Rancoita,
Tobias Glasmachers,
Kevin Whittingstall,
Juergen Schmidhuber
Abstract:
Do two data samples come from different distributions? Recent studies of this fundamental problem focused on embedding probability distributions into sufficiently rich characteristic Reproducing Kernel Hilbert Spaces (RKHSs), to compare distributions by the distance between their embeddings. We show that Regularized Maximum Mean Discrepancy (RMMD), our novel measure for kernel-based hypothesis testing, yields substantial improvements even when sample sizes are small, and excels at hypothesis tests involving multiple comparisons with power control. We derive asymptotic distributions under the null and alternative hypotheses, and assess power control. Outstanding results are obtained on: challenging EEG data, MNIST, the Berkley Covertype, and the Flare-Solar dataset.
Submitted 2 May, 2013;
originally announced May 2013.
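As background for the abstract above, this sketch computes the standard biased estimate of the squared Maximum Mean Discrepancy with a Gaussian kernel on one-dimensional samples. It is plain MMD, not the regularized RMMD statistic proposed in the paper; the bandwidth, sample sizes and function names are assumptions chosen for illustration.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased estimate of MMD^2: mean k(x,x') + mean k(y,y') - 2 mean k(x,y)."""

    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


rng = random.Random(0)
same_a = [rng.gauss(0.0, 1.0) for _ in range(200)]
same_b = [rng.gauss(0.0, 1.0) for _ in range(200)]
shifted = [rng.gauss(2.0, 1.0) for _ in range(200)]

print("same distribution:   ", round(mmd2_biased(same_a, same_b), 4))
print("shifted distribution:", round(mmd2_biased(same_a, shifted), 4))
```

A hypothesis test would compare such a statistic against its distribution under the null hypothesis; the abstract's asymptotic results concern that step for RMMD.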
-
Accelerated Linear SVM Training with Adaptive Variable Selection Frequencies
Authors:
Tobias Glasmachers,
Ürün Dogan
Abstract:
Support vector machine (SVM) training has been an active research area since the dawn of the method. In recent years there has been increasing interest in specialized solvers for the important case of linear models. The algorithm presented by Hsieh et al., probably best known under the name of the "liblinear" implementation, marks a major breakthrough. The method is analogous to established dual decomposition algorithms for training of non-linear SVMs, but with greatly reduced computational complexity per update step. This comes at the cost of not keeping track of the gradient of the objective any more, which excludes the application of highly developed working set selection algorithms. We present an algorithmic improvement to this method. We replace uniform working set selection with an online adaptation of selection frequencies. The adaptation criterion is inspired by modern second order working set selection methods. The same mechanism replaces the shrinking heuristic. This novel technique speeds up training in some cases by more than an order of magnitude.
Submitted 22 February, 2013;
originally announced February 2013.
-
Natural Evolution Strategies
Authors:
Daan Wierstra,
Tom Schaul,
Tobias Glasmachers,
Yi Sun,
Jürgen Schmidhuber
Abstract:
This paper presents Natural Evolution Strategies (NES), a recent family of algorithms that constitute a more principled approach to black-box optimization than established evolutionary algorithms. NES maintains a parameterized distribution on the set of solution candidates, and the natural gradient is used to update the distribution's parameters in the direction of higher expected fitness. We introduce a collection of techniques that address issues of convergence, robustness, sample complexity, computational complexity and sensitivity to hyperparameters. This paper explores a number of implementations of the NES family, ranging from general-purpose multi-variate normal distributions to heavy-tailed and separable distributions tailored towards global optimization and search in high dimensional spaces, respectively. Experimental results show best published performance on various standard benchmarks, as well as competitive performance on others.
Submitted 22 June, 2011;
originally announced June 2011.
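The core loop described in the abstract above (sample from a parameterized distribution, then move its parameters toward higher expected fitness) can be sketched in a deliberately reduced form. This is not one of the NES variants from the paper: it assumes an isotropic Gaussian with a fixed step size `sigma`, updates only the mean, and uses simple rank-based utilities. For this special case the natural gradient with respect to the mean reduces to `sigma` times the utility-weighted average of the sampled noise. All names and constants are illustrative.

```python
import random


def sphere(x):
    """Toy fitness to minimize (an assumption, not a benchmark from the paper)."""
    return sum(v * v for v in x)


def simple_nes(f, mu, sigma=0.5, pop=20, lr=1.0, iters=200, seed=0):
    """Mean-only search-gradient update with a fixed isotropic Gaussian."""
    rng = random.Random(seed)
    mu = list(mu)
    dim = len(mu)
    # Rank-based utilities: best sample gets +0.5, worst gets -0.5.
    utilities = [0.5 - i / (pop - 1) for i in range(pop)]
    for _ in range(iters):
        noise = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop)]
        fitness = [f([m + sigma * e for m, e in zip(mu, eps)]) for eps in noise]
        order = sorted(range(pop), key=lambda i: fitness[i])  # best (lowest) first
        for j in range(dim):
            grad = sum(utilities[r] * noise[i][j] for r, i in enumerate(order)) / pop
            mu[j] += lr * sigma * grad
    return mu


result = simple_nes(sphere, [3.0, -2.0])
print([round(v, 3) for v in result], round(sphere(result), 4))
```

Because `sigma` is fixed, the mean only settles into a neighborhood of the optimum; adapting the full distribution, as the abstract describes, is what allows precise convergence.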