-
Non-Convex Bilevel Optimization with Time-Varying Objective Functions
Authors:
Sen Lin,
Daouda Sow,
Kaiyi Ji,
Yingbin Liang,
Ness Shroff
Abstract:
Bilevel optimization has become a powerful tool in a wide variety of machine learning problems. However, the current nonconvex bilevel optimization considers an offline dataset and static functions, which may not work well in emerging online applications with streaming data and time-varying functions. In this work, we study online bilevel optimization (OBO) where the functions can be time-varying and the agent continuously updates the decisions with online streaming data. To deal with the function variations and the unavailability of the true hypergradients in OBO, we propose a single-loop online bilevel optimizer with window averaging (SOBOW), which updates the outer-level decision based on a window average of the most recent hypergradient estimations stored in the memory. Compared to existing algorithms, SOBOW is computationally efficient and does not need to know previous functions. To handle the unique technical difficulties rooted in single-loop update and function variations for OBO, we develop a novel analytical technique that disentangles the complex couplings between decision variables, and carefully controls the hypergradient estimation error. We show that SOBOW can achieve a sublinear bilevel local regret under mild conditions. Extensive experiments across multiple domains corroborate the effectiveness of SOBOW.
Submitted 8 November, 2023; v1 submitted 7 August, 2023;
originally announced August 2023.
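The window-averaging idea in the abstract above can be illustrated with a minimal sketch. It assumes uniform weights over the last K stored hypergradient estimates and a plain gradient step for the outer variable; the paper's actual weighting and step rule may differ, and the names `WindowAverager` and `outer_step` are hypothetical.

```python
from collections import deque


class WindowAverager:
    """Stores the K most recent estimates and returns their uniform mean.

    Hypothetical helper: uniform weights are an assumption, not the
    paper's stated scheme.
    """

    def __init__(self, window_size):
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        # deque with maxlen drops the oldest estimate automatically
        self.buffer = deque(maxlen=window_size)

    def push(self, estimate):
        self.buffer.append(list(estimate))

    def average(self):
        if not self.buffer:
            raise ValueError("no estimates stored yet")
        n = len(self.buffer)
        dim = len(self.buffer[0])
        return [sum(v[i] for v in self.buffer) / n for i in range(dim)]


def outer_step(x, averaged_grad, step_size):
    """One plain gradient step on the outer variable (assumed update rule)."""
    return [xi - step_size * gi for xi, gi in zip(x, averaged_grad)]


averager = WindowAverager(window_size=2)
for estimate in ([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]):
    averager.push(estimate)

# Only the last two estimates remain in the window.
window_mean = averager.average()
x_next = outer_step([1.0, 1.0], window_mean, step_size=0.1)
```

Only the stored estimates are needed here, which matches the abstract's point that previous functions do not have to be kept.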
-
A Primal-Dual Approach to Bilevel Optimization with Multiple Inner Minima
Authors:
Daouda Sow,
Kaiyi Ji,
Ziwei Guan,
Yingbin Liang
Abstract:
Bilevel optimization has found extensive applications in modern machine learning problems such as hyperparameter optimization, neural architecture search, meta-learning, etc. While bilevel problems with a unique inner minimal point (e.g., where the inner function is strongly convex) are well understood, such a problem with multiple inner minimal points remains challenging and open. Existing algorithms designed for such a problem are applicable only to restricted situations and do not come with a full guarantee of convergence. In this paper, we adopt a reformulation of bilevel optimization to constrained optimization, and solve the problem via a primal-dual bilevel optimization (PDBO) algorithm. PDBO not only addresses the multiple inner minima challenge, but also features fully first-order efficiency without involving second-order Hessian and Jacobian computations, as opposed to most existing gradient-based bilevel algorithms. We further characterize the convergence rate of PDBO, which serves as the first known non-asymptotic convergence guarantee for bilevel optimization with multiple inner minima. Our experiments demonstrate the desired performance of the proposed approach.
Submitted 8 June, 2022; v1 submitted 1 March, 2022;
originally announced March 2022.
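For orientation, one common way to recast a bilevel problem as a constrained problem is the value-function form sketched below. The symbols $f$, $g$, $x$, $y$ and $g^*$ are assumptions introduced here for illustration; the abstract does not state which reformulation the paper uses, and its exact form may differ.

```latex
% Bilevel problem with a possibly non-unique inner solution set
\min_{x,\,y}\; f(x, y)
\quad \text{s.t.} \quad y \in \arg\min_{y'} g(x, y')

% Value-function reformulation as a single-level constrained problem
\min_{x,\,y}\; f(x, y)
\quad \text{s.t.} \quad g(x, y) \le g^*(x),
\qquad g^*(x) := \min_{y'} g(x, y')
```

A primal-dual method would then attach a multiplier to the inequality constraint and alternate updates on the primal variables and the multiplier.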
-
On the Convergence Theory for Hessian-Free Bilevel Algorithms
Authors:
Daouda Sow,
Kaiyi Ji,
Yingbin Liang
Abstract:
Bilevel optimization has arisen as a powerful tool in modern machine learning. However, due to the nested structure of bilevel optimization, even gradient-based methods require second-order derivative approximations via Jacobian- and/or Hessian-vector computations, which can be costly and unscalable in practice. Recently, Hessian-free bilevel schemes have been proposed to resolve this issue, where the general idea is to use zeroth- or first-order methods to approximate the full hypergradient of the bilevel problem. However, we empirically observe that such approximation can lead to large variance and unstable training, but estimating only the response Jacobian matrix as a partial component of the hypergradient turns out to be extremely effective. To this end, we propose a new Hessian-free method, which adopts a zeroth-order-like method to approximate the response Jacobian matrix by taking the difference between two optimization paths. Theoretically, we provide the convergence rate analysis for the proposed algorithms, where our key challenge is to characterize the approximation and smoothness properties of the trajectory-dependent estimator, which can be of independent interest. This is the first known convergence rate result for this type of Hessian-free bilevel algorithm. Experimentally, we demonstrate that the proposed algorithms outperform baseline bilevel optimizers on various bilevel problems. Particularly, in our experiment on few-shot meta-learning with a ResNet-12 network over the miniImageNet dataset, we show that our algorithm outperforms baseline meta-learning algorithms, while other baseline bilevel optimizers do not solve such meta-learning problems within a comparable time frame.
Submitted 6 June, 2022; v1 submitted 13 October, 2021;
originally announced October 2021.
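The idea of approximating the response Jacobian from the difference between two optimization paths can be sketched on a scalar toy problem. Everything below is an illustrative assumption: the inner objective $g(x, y) = \tfrac{1}{2}(y - 3x)^2$, the step size, the perturbation size `mu`, and the function names are hypothetical. In higher dimensions one would perturb along a direction vector, a detail the abstract does not specify.

```python
def inner_path(x, y0, grad_y, lr, steps):
    """Run a fixed number of gradient-descent steps on the inner variable y."""
    y = y0
    for _ in range(steps):
        y = y - lr * grad_y(x, y)
    return y


def response_derivative(x, y0, grad_y, lr, steps, mu=1e-4):
    """Finite-difference estimate of dy/dx from two inner optimization paths.

    Both paths start from the same y0 and use the same step size; only the
    outer variable differs, by the small perturbation mu.
    """
    y_base = inner_path(x, y0, grad_y, lr, steps)
    y_perturbed = inner_path(x + mu, y0, grad_y, lr, steps)
    return (y_perturbed - y_base) / mu


def toy_grad_y(x, y):
    """Gradient in y of the toy inner objective 0.5 * (y - 3x)^2."""
    return y - 3.0 * x


# The exact inner solution is y*(x) = 3x, so the true response derivative is 3.
estimate = response_derivative(x=2.0, y0=0.0, grad_y=toy_grad_y, lr=0.5, steps=50)
```

No second-order derivative of the inner objective is evaluated; only inner gradients along the two paths are used.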
-
Pairings on Generalized Huff Curves
Authors:
Abdoul Aziz Ciss,
Djiby Sow
Abstract:
This paper presents the Tate pairing computation on generalized Huff curves proposed by Wu and Feng. In fact, we extend the results of the Tate pairing computation on the standard Huff elliptic curves done previously by Joye, Tibouchi and Vergnaud. We show that the addition step of the Miller loop can be performed in $1\mathbf{M}+(k+15)\mathbf{m}+2\mathbf{c}$ and the doubling one in $1\mathbf{M} + 1\mathbf{S} + (k + 12) \mathbf{m} + 5\mathbf{s} + 2\mathbf{c}$ on the generalized Huff curve.
Submitted 6 November, 2012;
originally announced November 2012.
-
Noncommutative Gröbner bases over rings
Authors:
André Mialebama Bouesso,
Djiby Sow
Abstract:
In this work, we propose a method for computing noncommutative Gröbner bases over a valuation Noetherian ring. We have generalized the fundamental theorem on normal forms over an arbitrary ring. The classical method of dynamical commutative Gröbner bases is generalized for Buchberger's algorithm over $R=\mathcal{V}\langle x_1,\ldots,x_m\rangle$, a free associative algebra with non-commuting variables, where $\mathcal{V}=\mathbb{Z}/n\mathbb{Z}$ or $\mathcal{V}=\mathbb{Z}$.
The proposed process generalizes previously known techniques for the computation of commutative Gröbner bases over a valuation Noetherian ring and/or noncommutative Gröbner bases over a field.
Submitted 12 August, 2012;
originally announced August 2012.
-
A Factoring and Discrete Logarithm based Cryptosystem
Authors:
Abdoul Aziz Ciss,
Ahmed Youssef Ould Cheikh,
Djiby Sow
Abstract:
This paper introduces a new public key cryptosystem based on two hard problems: cube root extraction modulo a composite modulus (which is equivalent to the factorisation of the modulus) and the discrete logarithm problem (DLP). These two hard problems are combined during the key generation, encryption and decryption phases. By combining the integer factorization problem (IFP) and the DLP we introduce a secure and efficient public key cryptosystem. To break the scheme, an adversary must solve both the IFP and the DLP, which is computationally infeasible. The key generation is a simple operation based on the discrete logarithm modulo a composite modulus. The encryption phase is based on both the cube root computation and the DLP. These operations are computationally efficient.
Submitted 23 September, 2012; v1 submitted 6 May, 2012;
originally announced May 2012.