Regularized Operator Extrapolation Method For Stochastic Bilevel Variational Inequality Problems

Authors: Mohammad Khalafi, Digvijay Boob

Abstract: The bilevel variational inequality (BVI) problem is a general model that captures various optimization problems, including VI-constrained optimization and equilibrium problems with equilibrium constraints (EPECs). This paper introduces a first-order method for smooth or nonsmooth BVI with stochastic monotone operators at inner and outer levels. Our novel method, called Regularized Operator Extrapolation $(\texttt{R-OpEx})$, is a single-loop algorithm that combines Tikhonov's regularization with operator extrapolation. This method needs only one operator evaluation for each operator per iteration and tracks one sequence of iterates. We show that $\texttt{R-OpEx}$ gives $\mathcal{O}(\epsilon^{-4})$ complexity in nonsmooth stochastic monotone BVI, where $\epsilon$ is the error in the inner and outer levels. Using a mini-batching scheme, we improve the outer level complexity to $\mathcal{O}(\epsilon^{-2})$ while maintaining the $\mathcal{O}(\epsilon^{-4})$ complexity in the inner level when the inner level is smooth and stochastic. Moreover, if the inner level is smooth and deterministic, we show complexity of $\mathcal{O}(\epsilon^{-2})$. Finally, in case the outer level is strongly monotone, we improve to $\mathcal{O}(\epsilon^{-4/5})$ for general BVI and $\mathcal{O}(\epsilon^{-2/3})$ when the inner level is smooth and deterministic. To our knowledge, this is the first work that investigates nonsmooth stochastic BVI with the best-known convergence guarantees. We verify our theoretical results with numerical experiments.

Submitted 14 May, 2025; originally announced May 2025.
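
The abstract above describes a single-loop scheme that combines Tikhonov regularization with operator extrapolation and uses one evaluation of each operator per iteration. The following is a minimal deterministic sketch of that general idea on a toy problem, not the paper's R-OpEx method: the regularization schedule `eta = 1/sqrt(t+1)`, the constant step size `gamma`, the extrapolation weight of 1, and the unconstrained Euclidean update are all assumptions made for illustration. The inner operator is the gradient of an underdetermined least-squares problem (so its solution set is an affine subspace) and the outer operator is the identity, so the bilevel solution is the minimum-norm least-squares solution.

```python
import numpy as np

def regularized_operator_extrapolation(inner_op, outer_op, x0, steps=5000, gamma=0.1):
    """Toy sketch: operator extrapolation applied to inner_op + eta_t * outer_op.

    All parameter choices here are illustrative assumptions, not taken from the paper.
    """
    x = np.asarray(x0, dtype=float)
    prev_g = None
    for t in range(steps):
        eta = 1.0 / np.sqrt(t + 1.0)             # assumed vanishing Tikhonov weight
        g = inner_op(x) + eta * outer_op(x)      # one evaluation of each operator
        if prev_g is None:
            prev_g = g                           # no extrapolation on the first step
        x = x - gamma * (g + (g - prev_g))       # step along g plus the correction g - prev_g
        prev_g = g
    return x

# Toy data: two equations, three unknowns, so the inner problem has many solutions.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 1.0])

def inner(x):
    # Gradient of 0.5 * ||A x - b||^2 (monotone, not strongly monotone).
    return A.T @ (A @ x - b)

def outer(x):
    # Gradient of 0.5 * ||x||^2, which selects the minimum-norm inner solution.
    return x

# Start in the null space of A so the outer level has real work to do.
x_hat = regularized_operator_extrapolation(inner, outer, x0=np.array([1.0, -1.0, 1.0]))
x_star = np.linalg.pinv(A) @ b                   # minimum-norm reference solution
print(x_hat, x_star)
```

In this toy setup the regularization weight must vanish slowly enough that the iterates still forget the null-space component of the starting point, which is why the sketch uses a schedule whose sum diverges.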

arXiv:2410.18513 [pdf, ps, other]

Optimal Primal-Dual Algorithm with Last iterate Convergence Guarantees for Stochastic Convex Optimization Problems

Authors: Digvijay Boob, Mohammad Khalafi

Abstract: This paper proposes a novel first-order algorithm that solves composite nonsmooth and stochastic convex optimization problems with function constraints. Most of the works in the literature provide convergence rate guarantees on the average-iterate solution. There is growing interest in the convergence guarantees of the last iterate solution due to its favorable structural properties, such as sparsity or privacy guarantees and good performance in practice. We provide the first method that obtains the best-known convergence rate guarantees on the last iterate for stochastic composite nonsmooth convex function-constrained optimization problems. Our novel and easy-to-implement algorithm is based on the augmented Lagrangian technique and uses a new linearized approximation of constraint functions, leading to its name, the Augmented Constraint Extrapolation (Aug-ConEx) method. We show that Aug-ConEx achieves $\mathcal{O}(1/\sqrt{K})$ convergence rate in the nonsmooth stochastic setting without any strong convexity assumption and $\mathcal{O}(1/K)$ for the same problem with strongly convex objective function. While optimal for nonsmooth and stochastic problems, the Aug-ConEx method also accelerates convergence in terms of Lipschitz smoothness constants to $\mathcal{O}(1/K)$ and $\mathcal{O}(1/K^2)$ in the aforementioned cases, respectively. To the best of our knowledge, this is the first method to obtain such differentiated convergence rate guarantees on the last iterate for a composite nonsmooth stochastic setting without additional $\log{K}$ factors. We validate the efficiency of our algorithm by comparing it with a state-of-the-art algorithm through numerical experiments.

Submitted 24 October, 2024; originally announced October 2024.

Comments: 26 Pages, 3 Figures

arXiv:2304.04778 [pdf, ps, other]

First-order methods for Stochastic Variational Inequality problems with Function Constraints

Authors: Digvijay Boob, Qi Deng, Mohammad Khalafi

Abstract: The monotone Variational Inequality (VI) is a general model with important applications in various engineering and scientific domains. In numerous instances, the VI problems are accompanied by function constraints that can be data-driven, making the usual projection operator challenging to compute. This paper presents novel first-order methods for the function-constrained Variational Inequality (FCVI) problem in smooth or nonsmooth settings with possibly stochastic operators and constraints. We introduce the AdOpEx method, which employs an operator extrapolation on the KKT operator of the FCVI in a smooth deterministic setting. Since this operator is not uniformly Lipschitz continuous in the Lagrange multipliers, we employ an adaptive two-timescale algorithm leading to bounded multipliers and achieving the optimal $O(1/T)$ convergence rate. For the nonsmooth and stochastic VIs, we introduce design changes to the AdOpEx method and propose a novel P-OpEx method that takes partial extrapolation. It converges at the rate of $O(1/\sqrt{T})$ when both the operator and constraints are stochastic or nonsmooth. This method has suboptimal dependence on the noise and Lipschitz constants of function constraints. We propose a constraint extrapolation approach leading to the OpConEx method that improves this dependence by an order of magnitude. All our algorithms easily extend to saddle point problems with function constraints that couple the primal and dual variables while maintaining the same complexity results. To the best of our knowledge, all our complexity results are new in the literature.

Submitted 24 May, 2024; v1 submitted 10 April, 2023; originally announced April 2023.
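
For readers unfamiliar with the setting of the entry above, the following display writes out a standard form of a function-constrained VI and a KKT-type operator for it. The notation ($X$, $F$, $g_i$, $\lambda$, $\mathcal{F}$) is assumed here for illustration and may differ from the paper's. It also shows where the multiplier dependence mentioned in the abstract comes from: the term $\sum_i \lambda_i \nabla g_i(x)$ scales with $\lambda$, so any Lipschitz constant of the operator in $x$ grows with the size of the multipliers.

```latex
% Function-constrained VI (assumed notation): X closed convex, F monotone, g_i convex.
\text{Find } x^* \in \widetilde{X} := \{x \in X : g_i(x) \le 0,\ i = 1,\dots,m\}
\quad \text{such that} \quad
\langle F(x^*),\, x - x^* \rangle \ge 0 \quad \forall x \in \widetilde{X}.

% A KKT-type operator in the joint variable (x, \lambda), with \lambda \ge 0:
\mathcal{F}(x, \lambda) :=
\begin{pmatrix}
  F(x) + \sum_{i=1}^{m} \lambda_i \nabla g_i(x) \\[2pt]
  -\,g(x)
\end{pmatrix},
\qquad g(x) := \big(g_1(x), \dots, g_m(x)\big)^{\top}.
```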

arXiv:2209.04604 [pdf, ps, other]

Accelerated Primal-Dual Methods for Convex-Strongly-Concave Saddle Point Problems

Authors: Mohammad Khalafi, Digvijay Boob

Abstract: We investigate a primal-dual (PD) method for the saddle point problem (SPP) that uses a linear approximation of the primal function instead of the standard proximal step, resulting in a linearized PD (LPD) method. For convex-strongly concave SPP, we observe that the LPD method has a suboptimal dependence on the Lipschitz constant of the primal function. To fix this issue, we combine features of Accelerated Gradient Descent with the LPD method, resulting in a single-loop Accelerated Linearized Primal-Dual (ALPD) method. The ALPD method achieves the optimal gradient complexity when the SPP has a semi-linear coupling function. We also present an inexact ALPD method for SPPs with a general nonlinear coupling function that maintains the optimal gradient evaluations of the primal parts and significantly improves the gradient evaluations of the coupling term compared to the ALPD method. We verify our findings with numerical experiments.

Submitted 18 May, 2023; v1 submitted 10 September, 2022; originally announced September 2022.

arXiv:2205.08011 [pdf, other]

Level Constrained First Order Methods for Function Constrained Optimization

Authors: Digvijay Boob, Qi Deng, Guanghui Lan

Abstract: We present a new feasible proximal gradient method for constrained optimization where both the objective and constraint functions are given by the summation of a smooth, possibly nonconvex function and a convex simple function. The algorithm converts the original problem into a sequence of convex subproblems. Formulating those subproblems requires the evaluation of at most one gradient value of the original objective and constraint functions. Either exact or approximate subproblem solutions can be computed efficiently in many cases. An important feature of the algorithm is the constraint level parameter. By carefully increasing this level for each subproblem, we provide a simple solution to overcome the challenge of bounding the Lagrangian multipliers and show that the algorithm follows a strictly feasible solution path till convergence to the stationary point. We develop a simple, proximal gradient descent type analysis, showing that the complexity bound of this new algorithm is comparable to gradient descent for the unconstrained setting, which is new in the literature. Exploiting this new design and analysis technique, we extend our algorithms to some more challenging constrained optimization problems where 1) the objective is a stochastic or finite-sum function, and 2) structured nonsmooth functions replace smooth components of both objective and constraint functions. Complexity results for these problems also seem to be new in the literature. Finally, our method can also be applied to convex function-constrained problems where we show complexities similar to the proximal gradient method.

Submitted 31 January, 2024; v1 submitted 16 May, 2022; originally announced May 2022.

Comments: Accepted at Mathematical Programming

MSC Class: 90C26; 90C30; 90C06; 90C51; 49M37

arXiv:2104.02988 [pdf, ps, other]

Optimal Algorithms for Differentially Private Stochastic Monotone Variational Inequalities and Saddle-Point Problems

Authors: Digvijay Boob, Cristóbal Guzmán

Abstract: In this work, we conduct the first systematic study of stochastic variational inequality (SVI) and stochastic saddle point (SSP) problems under the constraint of differential privacy (DP). We propose two algorithms: Noisy Stochastic Extragradient (NSEG) and Noisy Inexact Stochastic Proximal Point (NISPP). We show that a stochastic approximation variant of these algorithms attains risk bounds vanishing as a function of the dataset size, with respect to the strong gap function; and a sampling with replacement variant achieves optimal risk bounds with respect to a weak gap function. We also show lower bounds of the same order on the weak gap function. Hence, our algorithms are optimal. Key to our analysis is the investigation of algorithmic stability bounds, both of which are new even in the nonprivate case. The dependence of the running time of the sampling with replacement algorithms, with respect to the dataset size $n$, is $n^2$ for NSEG and $\tilde{O}(n^{3/2})$ for NISPP.

Submitted 1 April, 2022; v1 submitted 7 April, 2021; originally announced April 2021.

Comments: A proof of stability and generalization in SVI/SSP has been found to contain a major bug. The result has been removed from the paper, and both upper and lower bounds for multipass algorithms have been established for a weak gap function. The results in this sense are still optimal

MSC Class: 65K10; 49M37; 68T09
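
The entry above names a Noisy Stochastic Extragradient (NSEG) algorithm but the abstract gives no algorithmic detail. The sketch below only illustrates the generic idea of an extragradient step whose operator evaluations are perturbed by Gaussian noise; it is not the paper's NSEG. The function name, the unconstrained update (no projection), the step size, and the `noise_std` parameter are assumptions, and the noise scale is a placeholder that is not calibrated to any privacy budget. The toy operator $F(x, y) = (y, -x)$ corresponds to the bilinear saddle point $\min_x \max_y xy$, whose solution is the origin.

```python
import numpy as np

def noisy_extragradient(op, z0, steps=200, gamma=0.5, noise_std=0.0, seed=0):
    """Toy extragradient loop with Gaussian-perturbed operator evaluations.

    noise_std is a hypothetical placeholder, not a privacy-calibrated noise scale.
    """
    rng = np.random.default_rng(seed)
    z = np.asarray(z0, dtype=float)
    avg = np.zeros_like(z)
    for k in range(steps):
        # Extrapolation step using a perturbed operator value at the current point.
        z_half = z - gamma * (op(z) + noise_std * rng.standard_normal(z.shape))
        # Update step using a freshly perturbed operator value at the extrapolated point.
        z = z - gamma * (op(z_half) + noise_std * rng.standard_normal(z.shape))
        # Running average of the extrapolated points.
        avg += (z_half - avg) / (k + 1)
    return z, avg

def bilinear(z):
    # Monotone operator of min_x max_y x*y, with z = (x, y).
    return np.array([z[1], -z[0]])

z_last, z_avg = noisy_extragradient(bilinear, z0=[1.0, 1.0])
z_noisy, z_noisy_avg = noisy_extragradient(bilinear, z0=[1.0, 1.0], noise_std=0.1)
print(z_last, z_noisy_avg)
```

With `noise_std=0.0` this reduces to the plain extragradient method, which contracts toward the origin on this bilinear problem; with noise, the last iterate only hovers near the solution, which is one reason averaged iterates are commonly reported in stochastic settings.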

arXiv:2010.12169 [pdf, other]

A Feasible Level Proximal Point Method for Nonconvex Sparse Constrained Optimization

Authors: Digvijay Boob, Qi Deng, Guanghui Lan, Yilin Wang

Abstract: Nonconvex sparse models have received significant attention in high-dimensional machine learning. In this paper, we study a new model consisting of general convex or nonconvex objectives and a variety of continuous nonconvex sparsity-inducing constraints. For this constrained model, we propose a novel proximal point algorithm that solves a sequence of convex subproblems with gradually relaxed constraint levels. Each subproblem, having a proximal point objective and a convex surrogate constraint, can be efficiently solved based on a fast routine for projection onto the surrogate constraint. We establish the asymptotic convergence of the proposed algorithm to the Karush-Kuhn-Tucker (KKT) solutions. We also establish new convergence complexities to achieve an approximate KKT solution when the objective can be smooth/nonsmooth, deterministic/stochastic and convex/nonconvex with complexity that is on a par with gradient descent for unconstrained optimization problems in respective cases. To the best of our knowledge, this is the first study of the first-order methods with complexity guarantee for nonconvex sparse-constrained problems. We perform numerical experiments to demonstrate the effectiveness of our new model and the efficiency of the proposed algorithm for large scale problems.

Submitted 23 October, 2020; originally announced October 2020.

Comments: Accepted at NeurIPS 2020

arXiv:1909.12387 [pdf, ps, other]

Faster width-dependent algorithm for mixed packing and covering LPs

Authors: Digvijay Boob, Saurabh Sawlani, Di Wang

Abstract: In this paper, we give a faster width-dependent algorithm for mixed packing-covering LPs. Mixed packing-covering LPs are fundamental to combinatorial optimization in computer science and operations research. Our algorithm finds a $1+\epsilon$ approximate solution in time $O(Nw/\epsilon)$, where $N$ is the number of nonzero entries in the constraint matrix and $w$ is the maximum number of nonzeros in any constraint. This run-time is better than Nesterov's smoothing algorithm which requires $O(N\sqrt{n}w/\epsilon)$ where $n$ is the dimension of the problem. Our work utilizes the framework of area convexity introduced in [Sherman-FOCS'17] to obtain the best dependence on $\epsilon$ while breaking the infamous $\ell_{\infty}$ barrier to eliminate the factor of $\sqrt{n}$. The current best width-independent algorithm for this problem runs in time $O(N/\epsilon^2)$ [Young-arXiv-14] and hence has worse running time dependence on $\epsilon$. Many real life instances of the mixed packing-covering problems exhibit small width and for such cases, our algorithm can report higher precision results when compared to width-independent algorithms. As a special case of our result, we report a $1+\epsilon$ approximation algorithm for the densest subgraph problem which runs in time $O(md/\epsilon)$, where $m$ is the number of edges in the graph and $d$ is the maximum graph degree.

Submitted 26 September, 2019; originally announced September 2019.

Comments: Accepted for oral presentation at NeurIPS 2019

arXiv:1908.02734 [pdf, ps, other]

Stochastic First-order Methods for Convex and Nonconvex Functional Constrained Optimization

Authors: Digvijay Boob, Qi Deng, Guanghui Lan

Abstract: Functional constrained optimization is becoming more and more important in machine learning and operations research. Such problems have potential applications in risk-averse machine learning, semisupervised learning, and robust optimization among others. In this paper, we first present a novel Constraint Extrapolation (ConEx) method for solving convex functional constrained problems, which utilizes linear approximations of the constraint functions to define the extrapolation (or acceleration) step. We show that this method is a unified algorithm that achieves the best-known rate of convergence for solving different functional constrained convex composite problems, including convex or strongly convex, and smooth or nonsmooth problems with a stochastic objective and/or stochastic constraints. Many of these rates of convergence were in fact obtained for the first time in the literature. In addition, ConEx is a single-loop algorithm that does not involve any penalty subproblems. Contrary to existing primal-dual methods, it does not require the projection of Lagrangian multipliers into a (possibly unknown) bounded set. Second, for nonconvex functional constrained problems, we introduce a new proximal point method that transforms the initial nonconvex problem into a sequence of convex problems by adding quadratic terms to both the objective and constraints. Under a certain MFCQ-type assumption, we establish the convergence and rate of convergence of this method to KKT points when the convex subproblems are solved exactly or inexactly. For large-scale and stochastic problems, we present a more practical proximal point method in which the approximate solutions of the subproblems are computed by the aforementioned ConEx method. To the best of our knowledge, most of these convergence and complexity results of the proximal point method for nonconvex problems also seem to be new in the literature.

Submitted 26 January, 2022; v1 submitted 7 August, 2019; originally announced August 2019.

Comments: 36 pages, final version, accepted at Math Programming

Showing 1–9 of 9 results for author: Boob, D