-
Optimal Tracking in Prediction with Expert Advice
Authors:
Hakan Gokcesu,
Suleyman S. Kozat
Abstract:
We study the prediction with expert advice setting, where the aim is to produce a decision by combining the decisions generated by a set of experts, e.g., independently running algorithms. We achieve the min-max optimal dynamic regret under the prediction with expert advice setting, i.e., we can compete against time-varying (not necessarily fixed) combinations of expert decisions in an optimal man…
▽ More
We study the prediction with expert advice setting, where the aim is to produce a decision by combining the decisions generated by a set of experts, e.g., independently running algorithms. We achieve the min-max optimal dynamic regret under the prediction with expert advice setting, i.e., we can compete against time-varying (not necessarily fixed) combinations of expert decisions in an optimal manner. Our end-algorithm is truly online with no prior information, such as the time horizon or loss ranges, which are commonly used by different algorithms in the literature. Both our regret guarantees and the min-max lower bounds are derived with the general consideration that the expert losses can have time-varying properties and are possibly unbounded. Our algorithm can be adapted for restrictive scenarios regarding both loss feedback and decision making. Our guarantees are universal, i.e., our end-algorithm can provide regret guarantee against any competitor sequence in a min-max optimal manner with logarithmic complexity. Note that, to our knowledge, for the prediction with expert advice problem, our algorithms are the first to produce such universally optimal, adaptive and truly online guarantees with no prior knowledge.
△ Less
Submitted 7 August, 2022;
originally announced August 2022.
-
Universal Online Convex Optimization with Minimax Optimal Second-Order Dynamic Regret
Authors:
Hakan Gokcesu,
Suleyman S. Kozat
Abstract:
We introduce an online convex optimization algorithm which utilizes projected subgradient descent with optimal adaptive learning rates. Our method provides second-order minimax-optimal dynamic regret guarantee (i.e. dependent on the sum of squared subgradient norms) for a sequence of general convex functions, which may not have strong convexity, smoothness, exp-concavity or even Lipschitz-continui…
▽ More
We introduce an online convex optimization algorithm which utilizes projected subgradient descent with optimal adaptive learning rates. Our method provides second-order minimax-optimal dynamic regret guarantee (i.e. dependent on the sum of squared subgradient norms) for a sequence of general convex functions, which may not have strong convexity, smoothness, exp-concavity or even Lipschitz-continuity. The regret guarantee is against any comparator decision sequence with bounded path variation (i.e. sum of the distances between successive decisions). We generate the lower bound of the worst-case second-order dynamic regret by incorporating actual subgradient norms. We show that this lower bound matches with our regret guarantee within a constant factor, which makes our algorithm minimax optimal. We also derive the extension for learning in each decision coordinate individually. We demonstrate how to best preserve our regret guarantee in a truly online manner, when the bound on path variation of the comparator sequence grows in time or the feedback regarding such bound arrives partially as time goes on. We further build on our algorithm to eliminate the need of any knowledge on the comparator path variation, and provide minimax optimal second-order regret guarantees with no a priori information. Our approach can compete against all comparator sequences simultaneously (universally) in a minimax optimal manner, i.e. each regret guarantee depends on the respective comparator path variation. We discuss modifications to our approach which address complexity reductions for time, computation and memory. We further improve our results by making the regret guarantees also dependent on comparator sets' diameters in addition to the respective path variations.
△ Less
Submitted 13 September, 2022; v1 submitted 30 June, 2019;
originally announced July 2019.
-
Accelerating Min-Max Optimization with Application to Minimal Bounding Sphere
Authors:
Hakan Gokcesu,
Kaan Gokcesu,
Suleyman Serdar Kozat
Abstract:
We study the min-max optimization problem where each function contributing to the max operation is strongly-convex and smooth with bounded gradient in the search domain. By smoothing the max operator, we show the ability to achieve an arbitrarily small positive optimality gap of $δ$ in $\tilde{O}(1/\sqrtδ)$ computational complexity (up to logarithmic factors) as opposed to the state-of-the-art str…
▽ More
We study the min-max optimization problem where each function contributing to the max operation is strongly-convex and smooth with bounded gradient in the search domain. By smoothing the max operator, we show the ability to achieve an arbitrarily small positive optimality gap of $δ$ in $\tilde{O}(1/\sqrtδ)$ computational complexity (up to logarithmic factors) as opposed to the state-of-the-art strong-convexity computational requirement of $O(1/δ)$. We apply this important result to the well-known minimal bounding sphere problem and demonstrate that we can achieve a $(1+\varepsilon)$-approximation of the minimal bounding sphere, i.e. identify an hypersphere enclosing a total of $n$ given points in the $d$ dimensional unbounded space $\mathbb{R}^d$ with a radius at most $(1+\varepsilon)$ times the actual minimal bounding sphere radius for an arbitrarily small positive $\varepsilon$, in $\tilde{O}(n d /\sqrt{\varepsilon})$ computational time as opposed to the state-of-the-art approach of core-set methodology, which needs $O(n d /\varepsilon)$ computational time.
△ Less
Submitted 29 May, 2019;
originally announced May 2019.
-
Minimax Optimal Online Stochastic Learning for Sequences of Convex Functions under Sub-Gradient Observation Failures
Authors:
Hakan Gokcesu,
Suleyman S. Kozat
Abstract:
We study online convex optimization under stochastic sub-gradient observation faults, where we introduce adaptive algorithms with minimax optimal regret guarantees. We specifically study scenarios where our sub-gradient observations can be noisy or even completely missing in a stochastic manner. To this end, we propose algorithms based on sub-gradient descent method, which achieve tight minimax op…
▽ More
We study online convex optimization under stochastic sub-gradient observation faults, where we introduce adaptive algorithms with minimax optimal regret guarantees. We specifically study scenarios where our sub-gradient observations can be noisy or even completely missing in a stochastic manner. To this end, we propose algorithms based on sub-gradient descent method, which achieve tight minimax optimal regret bounds. When necessary, these algorithms utilize properties of the underlying stochastic settings to optimize their learning rates (step sizes). These optimizations are the main factor in providing the minimax optimal performance guarantees, especially when observations are stochastically missing. However, in real world scenarios, these properties of the underlying stochastic settings may not be revealed to the optimizer. For such a scenario, we propose a blind algorithm that estimates these properties empirically in a generally applicable manner. Through extensive experiments, we show that this empirical approach is a natural combination of regular stochastic gradient descent and the minimax optimal algorithms (which work best for randomized and adversarial function sequences, respectively).
△ Less
Submitted 19 April, 2019;
originally announced April 2019.
-
Efficient Implementation Of Newton-Raphson Methods For Sequential Data Prediction
Authors:
Burak C. Civek,
Suleyman S. Kozat
Abstract:
We investigate the problem of sequential linear data prediction for real life big data applications. The second order algorithms, i.e., Newton-Raphson Methods, asymptotically achieve the performance of the "best" possible linear data predictor much faster compared to the first order algorithms, e.g., Online Gradient Descent. However, implementation of these methods is not usually feasible in big d…
▽ More
We investigate the problem of sequential linear data prediction for real life big data applications. The second order algorithms, i.e., Newton-Raphson Methods, asymptotically achieve the performance of the "best" possible linear data predictor much faster compared to the first order algorithms, e.g., Online Gradient Descent. However, implementation of these methods is not usually feasible in big data applications because of the extremely high computational needs. Regular implementation of the Newton-Raphson Methods requires a computational complexity in the order of $O(M^2)$ for an $M$ dimensional feature vector, while the first order algorithms need only $O(M)$. To this end, in order to eliminate this gap, we introduce a highly efficient implementation reducing the computational complexity of the Newton-Raphson Methods from quadratic to linear scale. The presented algorithm provides the well-known merits of the second order methods while offering the computational complexity of $O(M)$. We utilize the shifted nature of the consecutive feature vectors and do not rely on any statistical assumptions. Therefore, both regular and fast implementations achieve the same performance in the sense of mean square error. We demonstrate the computational efficiency of our algorithm on real life sequential big datasets. We also illustrate that the presented algorithm is numerically stable.
△ Less
Submitted 19 January, 2017;
originally announced January 2017.
-
A Novel Family of Boosted Online Regression Algorithms with Strong Theoretical Bounds
Authors:
Dariush Kari,
Farhan Khan,
Selami Ciftci,
Suleyman Serdar Kozat
Abstract:
We investigate boosted online regression and propose a novel family of regression algorithms with strong theoretical bounds. In addition, we implement several variants of the proposed generic algorithm. We specifically provide theoretical bounds for the performance of our proposed algorithms that hold in a strong mathematical sense. We achieve guaranteed performance improvement over the convention…
▽ More
We investigate boosted online regression and propose a novel family of regression algorithms with strong theoretical bounds. In addition, we implement several variants of the proposed generic algorithm. We specifically provide theoretical bounds for the performance of our proposed algorithms that hold in a strong mathematical sense. We achieve guaranteed performance improvement over the conventional online regression methods without any statistical assumptions on the desired data or feature vectors. We demonstrate an intrinsic relationship, in terms of boosting, between the adaptive mixture-of-experts and data reuse algorithms. Furthermore, we introduce a boosting algorithm based on random updates that is significantly faster than the conventional boosting methods and other variants of our proposed algorithms while achieving an enhanced performance gain. Hence, the random updates method is specifically applicable to the fast and high dimensional streaming data. Specifically, we investigate Newton Method-based and Stochastic Gradient Descent-based linear regression algorithms in a mixture-of-experts setting and provide several variants of these well-known adaptation methods. However, the proposed algorithms can be extended to other base learners, e.g., nonlinear, tree-based piecewise linear. Furthermore, we provide theoretical bounds for the computational complexity of our proposed algorithms. We demonstrate substantial performance gains in terms of mean square error over the base learners through an extensive set of benchmark real data sets and simulated examples.
△ Less
Submitted 6 December, 2016; v1 submitted 4 January, 2016;
originally announced January 2016.
-
Stochastic Subgradient Algorithms for Strongly Convex Optimization over Distributed Networks
Authors:
N. Denizcan Vanli,
Muhammed O. Sayin,
Suleyman S. Kozat
Abstract:
We study diffusion and consensus based optimization of a sum of unknown convex objective functions over distributed networks. The only access to these functions is through stochastic gradient oracles, each of which is only available at a different node, and a limited number of gradient oracle calls is allowed at each node. In this framework, we introduce a convex optimization algorithm based on th…
▽ More
We study diffusion and consensus based optimization of a sum of unknown convex objective functions over distributed networks. The only access to these functions is through stochastic gradient oracles, each of which is only available at a different node, and a limited number of gradient oracle calls is allowed at each node. In this framework, we introduce a convex optimization algorithm based on the stochastic gradient descent (SGD) updates. Particularly, we use a carefully designed time-dependent weighted averaging of the SGD iterates, which yields a convergence rate of $O\left(\frac{N\sqrt{N}}{T}\right)$ after $T$ gradient updates for each node on a network of $N$ nodes. We then show that after $T$ gradient oracle calls, the average SGD iterate achieves a mean square deviation (MSD) of $O\left(\frac{\sqrt{N}}{T}\right)$. This rate of convergence is optimal as it matches the performance lower bound up to constant terms. Similar to the SGD algorithm, the computational complexity of the proposed algorithm also scales linearly with the dimensionality of the data. Furthermore, the communication load of the proposed method is the same as the communication load of the SGD algorithm. Thus, the proposed algorithm is highly efficient in terms of complexity and communication load. We illustrate the merits of the algorithm with respect to the state-of-art methods over benchmark real life data sets and widely studied network topologies.
△ Less
Submitted 31 August, 2015; v1 submitted 29 September, 2014;
originally announced September 2014.