-
Modified K-means Algorithm with Local Optimality Guarantees
Authors:
Mingyi Li,
Michael R. Metel,
Akiko Takeda
Abstract:
The K-means algorithm is one of the most widely studied clustering algorithms in machine learning. While extensive research has focused on its ability to achieve a globally optimal solution, a rigorous analysis of its local optimality guarantees is still lacking. In this paper, we first present conditions under which the K-means algorithm converges to a locally optimal solution. Based on this, we propose simple modifications to the K-means algorithm which ensure local optimality in both the continuous and discrete sense, with the same computational complexity as the original K-means algorithm. As the dissimilarity measure, we consider a general Bregman divergence, which is an extension of the squared Euclidean distance often used in the K-means algorithm. Numerical experiments confirm that the K-means algorithm does not always find a locally optimal solution in practice, while our proposed methods provide improved locally optimal solutions with reduced clustering loss. Our code is available at https://github.com/lmingyi/LO-K-means.
Submitted 11 June, 2025; v1 submitted 8 June, 2025;
originally announced June 2025.
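For orientation, the baseline that this abstract refers to can be sketched as the standard K-means (Lloyd) iteration with squared Euclidean distance, the special case of a Bregman divergence mentioned above. This is only an illustrative sketch of the unmodified algorithm, not the authors' proposed method (their implementation is at the repository linked in the abstract); the function names `squared_euclidean` and `lloyd_kmeans` are hypothetical.

```python
def squared_euclidean(x, c):
    """Squared Euclidean distance between a point x and a center c."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def lloyd_kmeans(points, centers, max_iter=100):
    """Plain Lloyd iteration (assumes max_iter >= 1).

    Alternates an assignment step and a center-update step until the
    assignment stops changing. Returns (centers, assignment, loss).
    """
    centers = [list(c) for c in centers]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_assignment = [
            min(range(len(centers)), key=lambda k: squared_euclidean(p, centers[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the iteration has converged
        assignment = new_assignment
        # Update step: each center becomes the mean of its assigned points.
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # empty clusters keep their previous center
                dim = len(members[0])
                centers[k] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    loss = sum(squared_euclidean(p, centers[a]) for p, a in zip(points, assignment))
    return centers, assignment, loss


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, assignment, loss = lloyd_kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

The stopping rule here only checks that assignments no longer change; according to the abstract, that alone does not always yield a locally optimal solution, which is the gap the paper addresses.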
-
Mathematical Challenges in Deep Learning
Authors:
Vahid Partovi Nia,
Guojun Zhang,
Ivan Kobyzev,
Michael R. Metel,
Xinlin Li,
Ke Sun,
Sobhan Hemati,
Masoud Asgharian,
Linglong Kong,
Wulong Liu,
Boxing Chen
Abstract:
Deep models have dominated the artificial intelligence (AI) industry since the ImageNet challenge in 2012. The size of deep models has been increasing ever since, which brings new challenges to this field with applications in cell phones, personal computers, autonomous cars, and wireless base stations. Here we list a set of problems, ranging over training, inference, generalization bounds, and optimization, with some formalism to communicate these challenges to mathematicians, statisticians, and theoretical computer scientists. This is a subjective view of the research questions in deep learning that benefits the tech industry in the long run.
Submitted 24 March, 2023;
originally announced March 2023.
-
Variants of SGD for Lipschitz Continuous Loss Functions in Low-Precision Environments
Authors:
Michael R. Metel
Abstract:
Motivated by neural network training in low-precision arithmetic environments, this work studies the convergence of variants of SGD using adaptive step sizes with computational error. Considering a general stochastic Lipschitz continuous loss function, an asymptotic convergence result to a Clarke stationary point is proven as well as the non-asymptotic convergence to an approximate stationary point. It is assumed that only an approximation of the loss function's stochastic gradient can be computed in addition to error in computing the SGD step itself. Different variants of SGD are tested empirically, where improved test set accuracy is observed compared to SGD for two image recognition tasks.
Submitted 24 April, 2024; v1 submitted 8 November, 2022;
originally announced November 2022.
-
Sparse Training with Lipschitz Continuous Loss Functions and a Weighted Group L0-norm Constraint
Authors:
Michael R. Metel
Abstract:
This paper is motivated by structured sparsity for deep neural network training. We study a weighted group L0-norm constraint, and present the projection and normal cone of this set. Using randomized smoothing, we develop zeroth and first-order algorithms for minimizing a Lipschitz continuous function constrained by any closed set which can be projected onto. Non-asymptotic convergence guarantees are proven in expectation for the proposed algorithms for two related convergence criteria which can be considered as approximate stationary points. Two further methods are given using the proposed algorithms: one with non-asymptotic convergence guarantees in high probability, and the other with asymptotic guarantees to a stationary point almost surely. We believe in particular that these are the first such non-asymptotic convergence results for constrained Lipschitz continuous loss functions.
Submitted 20 December, 2022; v1 submitted 12 February, 2022;
originally announced February 2022.
-
Primal-dual subgradient method for constrained convex optimization problems
Authors:
Michael R. Metel,
Akiko Takeda
Abstract:
This paper considers a general convex constrained problem setting where functions are not assumed to be differentiable nor Lipschitz continuous. Our motivation is in finding a simple first-order method for solving a wide range of convex optimization problems with minimal requirements. We study the method of weighted dual averages (Nesterov, 2009) in this setting and prove that it is an optimal method.
Submitted 18 March, 2021; v1 submitted 27 September, 2020;
originally announced September 2020.
-
Perturbed Iterate SGD for Lipschitz Continuous Loss Functions
Authors:
Michael R. Metel,
Akiko Takeda
Abstract:
This paper presents an extension of stochastic gradient descent for the minimization of Lipschitz continuous loss functions. Our motivation is for use in non-smooth non-convex stochastic optimization problems, which are frequently encountered in applications such as machine learning. Using the Clarke $ε$-subdifferential, we prove the non-asymptotic convergence to an approximate stationary point in expectation for the proposed method. From this result, a method with non-asymptotic convergence with high probability, as well as a method with asymptotic convergence to a Clarke stationary point almost surely are developed. Our results hold under the assumption that the stochastic loss function is a Carathéodory function which is almost everywhere Lipschitz continuous in the decision variables. To the best of our knowledge this is the first non-asymptotic convergence analysis under these minimal assumptions.
Submitted 3 October, 2022; v1 submitted 17 March, 2020;
originally announced March 2020.
-
Stochastic Proximal Methods for Non-Smooth Non-Convex Constrained Sparse Optimization
Authors:
Michael R. Metel,
Akiko Takeda
Abstract:
This paper focuses on stochastic proximal gradient methods for optimizing a smooth non-convex loss function with a non-smooth non-convex regularizer and convex constraints. To the best of our knowledge we present the first non-asymptotic convergence results for this class of problem. We present two simple stochastic proximal gradient algorithms, for general stochastic and finite-sum optimization problems, which have the same or superior convergence complexities compared to the current best results for the unconstrained problem setting. In a numerical experiment we compare our algorithms with the current state-of-the-art deterministic algorithm and find our algorithms to exhibit superior convergence.
Submitted 23 May, 2019;
originally announced May 2019.
-
Simple Stochastic Gradient Methods for Non-Smooth Non-Convex Regularized Optimization
Authors:
Michael R. Metel,
Akiko Takeda
Abstract:
Our work focuses on stochastic gradient methods for optimizing a smooth non-convex loss function with a non-smooth non-convex regularizer. Research on this class of problem is quite limited, and until recently no non-asymptotic convergence results have been reported. We present two simple stochastic gradient algorithms, for finite-sum and general stochastic optimization problems, which have superior convergence complexities compared to the current state-of-the-art. We also compare our algorithms' performance in practice for empirical risk minimization.
Submitted 14 May, 2019; v1 submitted 24 January, 2019;
originally announced January 2019.
-
Charging station optimization for balanced electric car sharing
Authors:
Antoine Deza,
Kai Huang,
Michael R. Metel
Abstract:
This work focuses on finding optimal locations for charging stations for one-way electric car sharing programs. The relocation of vehicles by a service staff is generally required in vehicle sharing programs in order to correct imbalances in the network. We seek to limit the need for vehicle relocation by strategically locating charging stations given estimates of traffic flow. A mixed-integer linear programming formulation is presented with a large number of potential charging station locations. A column generation approach is used which finds an optimal set of locations for the continuous relaxation of our problem. Results of a numerical experiment using real traffic and geographic information system location data show that our formulation significantly increases the balanced flow across the network, while our column generation technique was found to produce a superior solution in much shorter computation time compared to solving the original formulation with all possible station locations.
Submitted 29 November, 2018;
originally announced November 2018.
-
Mini-batch stochastic gradient descent with dynamic sample sizes
Authors:
Michael R. Metel
Abstract:
We focus on solving constrained convex optimization problems using mini-batch stochastic gradient descent. Dynamic sample size rules are presented which ensure a descent direction with high probability. Empirical results from two applications show superior convergence compared to fixed sample implementations.
Submitted 1 August, 2017;
originally announced August 2017.
-
Kelly betting on horse races with uncertainty in probability estimates
Authors:
Michael R. Metel
Abstract:
We investigate the problem of gambling with uncertainty in outcome probabilities. Stochastic optimization models are proposed for optimal investing on events with mutually exclusive outcomes when probabilities are estimated using multinomial logistic regression. Special attention is given to the case of there being two outcomes, and the general case of many outcomes. An empirical study using simulated data was conducted where the loss of return from probability estimation error is observed, and superior returns are achieved taking it into consideration.
Submitted 1 August, 2017; v1 submitted 10 January, 2017;
originally announced January 2017.
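As a rough illustration of why probability estimation error matters in the two-outcome case, the following sketch uses the classical Kelly formula, not the stochastic optimization models proposed in the paper. It assumes a single bet with win probability p and net odds b; the function names and the numeric values are hypothetical.

```python
import math


def kelly_fraction(p, b):
    """Classical Kelly stake for a two-outcome bet.

    p: probability of winning, b: net odds (profit per unit staked).
    Returns the fraction of wealth to stake, floored at zero (no bet).
    """
    return max(0.0, (b * p - (1.0 - p)) / b)


def expected_log_growth(f, p, b):
    """Expected log growth of wealth when staking fraction f, with true win probability p."""
    return p * math.log(1.0 + b * f) + (1.0 - p) * math.log(1.0 - f)


# Hypothetical numbers: the true win probability is 0.55 but it is estimated as 0.65.
true_p, estimated_p, odds = 0.55, 0.65, 1.0
f_true = kelly_fraction(true_p, odds)       # stake based on the true probability
f_est = kelly_fraction(estimated_p, odds)   # stake based on the overestimate
growth_true = expected_log_growth(f_true, true_p, odds)
growth_est = expected_log_growth(f_est, true_p, odds)
```

In this example the overestimated probability leads to a larger stake whose expected log growth under the true probability is negative, while the stake computed from the true probability has positive expected growth.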
-
Imperfect demand estimation for new product production planning
Authors:
Antoine Deza,
Kai Huang,
Michael R. Metel
Abstract:
We are interested in the effect of consumer demand estimation error for new products in the context of production planning. An inventory model is proposed, whereby demand is influenced by price and advertising. The effect of parameter misspecification of the demand model is empirically examined in relation to profit and service level feasibility. Faced with an uncertain consumer reaction to price and advertising, we find that it is safer to overestimate rather than underestimate the effect of price on demand. Moreover, under a service level constraint it is safer to overestimate the effect of advertising, whereas for strict profit maximization, underestimating the effect of advertising is the conservative approach.
Submitted 22 October, 2015;
originally announced October 2015.
-
Risk management under Omega measure
Authors:
Michael R. Metel,
Traian A. Pirvu,
Julian Wong
Abstract:
We prove that the Omega measure, which considers all moments when assessing portfolio performance, is equivalent to the widely used Sharpe ratio under jointly elliptic distributions of returns. Portfolio optimization of the Sharpe ratio is then explored, with an active-set algorithm presented for markets prohibiting short sales. When asymmetric returns are considered we show that the Omega measure and Sharpe ratio lead to different optimal portfolios.
Submitted 11 April, 2017; v1 submitted 20 October, 2015;
originally announced October 2015.
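For readers unfamiliar with the two performance measures compared in this abstract, a minimal sample-based sketch is given below. It assumes the common definitions (Sharpe ratio as mean excess return over its standard deviation, Omega as the ratio of expected gains above a threshold to expected losses below it); the function names, the threshold, and the return series are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free=0.0):
    """Sample Sharpe ratio: mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)


def omega_measure(returns, threshold=0.0):
    """Sample Omega measure: total gains above the threshold over total losses below it.

    The 1/n factors of the two sample expectations cancel, so sums are used directly.
    """
    gains = sum(max(r - threshold, 0.0) for r in returns)
    losses = sum(max(threshold - r, 0.0) for r in returns)
    if losses == 0.0:
        return float("inf")  # no observations fall below the threshold
    return gains / losses


# Hypothetical return series, for illustration only.
sample_returns = [0.10, -0.05, 0.02, 0.03]
sharpe = sharpe_ratio(sample_returns)
omega = omega_measure(sample_returns, threshold=0.0)
```

The Sharpe ratio uses only the first two moments of the sample, whereas the Omega measure depends on the whole distribution around the threshold.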
-
Managing losses in exotic horse race wagering
Authors:
Antoine Deza,
Kai Huang,
Michael R. Metel
Abstract:
We consider a specialized form of risk management for betting opportunities with low payout frequency, presented in particular for exotic horse race wagering. An optimization problem is developed which limits losing streaks with high probability to the given time horizon of a gambler, which is formulated as a globally solvable mixed integer non-linear program. A case study is conducted using one season of historical horse racing data.
Submitted 1 August, 2017; v1 submitted 23 March, 2015;
originally announced March 2015.
-
Chance Constrained Optimization for Targeted Internet Advertising
Authors:
Antoine Deza,
Kai Huang,
Michael R. Metel
Abstract:
We introduce a chance constrained optimization model for the fulfillment of guaranteed display Internet advertising campaigns. The proposed formulation for the allocation of display inventory takes into account the uncertainty of the supply of Internet viewers. We discuss and present theoretical and computational features of the model via Monte Carlo sampling and convex approximations. Theoretical upper and lower bounds are presented along with a numerical substantiation.
Submitted 29 July, 2014;
originally announced July 2014.