-
Swapping objectives accelerates Davis-Yin splitting
Authors:
Edward Duc Hien Nguyen,
Jaewook J. Suh,
Xin Jiang,
Shiqian Ma
Abstract:
In this work, we investigate the application of Davis-Yin splitting (DYS) to convex optimization problems and demonstrate that swapping the roles of the two nonsmooth convex functions can result in a faster convergence rate. Such a swap typically yields a different sequence of iterates, but its impact on convergence behavior has been largely understudied or often overlooked. We address this gap by…
▽ More
In this work, we investigate the application of Davis-Yin splitting (DYS) to convex optimization problems and demonstrate that swapping the roles of the two nonsmooth convex functions can result in a faster convergence rate. Such a swap typically yields a different sequence of iterates, but its impact on convergence behavior has been largely understudied or often overlooked. We address this gap by establishing best-known convergence rates for DYS and its swapped counterpart, using the primal--dual gap function as the performance metric. Our results indicate that variants of the Douglas--Rachford splitting algorithm (a special case of DYS) share the same worst-case rate, whereas the convergence rates of the two DYS variants differ. This discrepancy is further illustrated through concrete examples.
△ Less
Submitted 29 June, 2025;
originally announced June 2025.
-
An Adaptive and Parameter-Free Nesterov's Accelerated Gradient Method for Convex Optimization
Authors:
Jaewook J. Suh,
Shiqian Ma
Abstract:
We propose AdaNAG, an adaptive accelerated gradient method based on Nesterov's accelerated gradient method. AdaNAG is line-search-free, parameter-free, and achieves the accelerated convergence rates $f(x_k) - f_\star = \mathcal{O}\left(1/k^2\right)$ and $\min_{i\in\left\{1,\dots, k\right\}} \|\nabla f(x_i)\|^2 = \mathcal{O}\left(1/k^3\right)$ for $L$-smooth convex function $f$. We provide a Lyapun…
▽ More
We propose AdaNAG, an adaptive accelerated gradient method based on Nesterov's accelerated gradient method. AdaNAG is line-search-free, parameter-free, and achieves the accelerated convergence rates $f(x_k) - f_\star = \mathcal{O}\left(1/k^2\right)$ and $\min_{i\in\left\{1,\dots, k\right\}} \|\nabla f(x_i)\|^2 = \mathcal{O}\left(1/k^3\right)$ for $L$-smooth convex function $f$. We provide a Lyapunov analysis for the convergence proof of AdaNAG, which additionally enables us to propose a novel adaptive gradient descent (GD) method, AdaGD. AdaGD achieves the non-ergodic convergence rate $f(x_k) - f_\star = \mathcal{O}\left(1/k\right)$, like the original GD. The analysis of AdaGD also motivated us to propose a generalized AdaNAG that includes practically useful variants of AdaNAG. Numerical results demonstrate that our methods outperform some other recent adaptive methods for representative applications.
△ Less
Submitted 16 May, 2025;
originally announced May 2025.
-
Numerical Analysis of HiPPO-LegS ODE for Deep State Space Models
Authors:
Jaesung R. Park,
Jaewook J. Suh,
Youngjoon Hong,
Ernest K. Ryu
Abstract:
In deep learning, the recently introduced state space models utilize HiPPO (High-order Polynomial Projection Operators) memory units to approximate continuous-time trajectories of input functions using ordinary differential equations (ODEs), and these techniques have shown empirical success in capturing long-range dependencies in long input sequences. However, the mathematical foundations of these…
▽ More
In deep learning, the recently introduced state space models utilize HiPPO (High-order Polynomial Projection Operators) memory units to approximate continuous-time trajectories of input functions using ordinary differential equations (ODEs), and these techniques have shown empirical success in capturing long-range dependencies in long input sequences. However, the mathematical foundations of these ODEs, particularly the singular HiPPO-LegS (Legendre Scaled) ODE, and their corresponding numerical discretizations remain unsettled. In this work, we fill this gap by establishing that HiPPO-LegS ODE is well-posed despite its singularity, albeit without the freedom of arbitrary initial conditions. Further, we establish convergence of the associated numerical discretization schemes for Riemann integrable input functions.
△ Less
Submitted 8 June, 2025; v1 submitted 11 December, 2024;
originally announced December 2024.
-
Optimization Algorithm Design via Electric Circuits
Authors:
Stephen P. Boyd,
Tetiana Parshakova,
Ernest K. Ryu,
Jaewook J. Suh
Abstract:
We present a novel methodology for convex optimization algorithm design using ideas from electric RLC circuits. Given an optimization problem, the first stage of the methodology is to design an appropriate electric circuit whose continuous-time dynamics converge to the solution of the optimization problem at hand. Then, the second stage is an automated, computer-assisted discretization of the cont…
▽ More
We present a novel methodology for convex optimization algorithm design using ideas from electric RLC circuits. Given an optimization problem, the first stage of the methodology is to design an appropriate electric circuit whose continuous-time dynamics converge to the solution of the optimization problem at hand. Then, the second stage is an automated, computer-assisted discretization of the continuous-time dynamics, yielding a provably convergent discrete-time algorithm. Our methodology recovers many classical (distributed) optimization algorithms and enables users to quickly design and explore a wide range of new algorithms with convergence guarantees.
△ Less
Submitted 20 January, 2025; v1 submitted 4 November, 2024;
originally announced November 2024.
-
Optimal Acceleration for Minimax and Fixed-Point Problems is Not Unique
Authors:
TaeHo Yoon,
Jaeyeon Kim,
Jaewook J. Suh,
Ernest K. Ryu
Abstract:
Recently, accelerated algorithms using the anchoring mechanism for minimax optimization and fixed-point problems have been proposed, and matching complexity lower bounds establish their optimality. In this work, we present the surprising observation that the optimal acceleration mechanism in minimax optimization and fixed-point problems is not unique. Our new algorithms achieve exactly the same wo…
▽ More
Recently, accelerated algorithms using the anchoring mechanism for minimax optimization and fixed-point problems have been proposed, and matching complexity lower bounds establish their optimality. In this work, we present the surprising observation that the optimal acceleration mechanism in minimax optimization and fixed-point problems is not unique. Our new algorithms achieve exactly the same worst-case convergence rates as existing anchor-based methods while using materially different acceleration mechanisms. Specifically, these new algorithms are dual to the prior anchor-based accelerated methods in the sense of H-duality. This finding opens a new avenue of research on accelerated algorithms since we now have a family of methods that empirically exhibit varied characteristics while having the same optimal worst-case guarantee.
△ Less
Submitted 23 April, 2024; v1 submitted 19 April, 2024;
originally announced April 2024.
-
Continuous-time Analysis of Anchor Acceleration
Authors:
Jaewook J. Suh,
Jisun Park,
Ernest K. Ryu
Abstract:
Recently, the anchor acceleration, an acceleration mechanism distinct from Nesterov's, has been discovered for minimax optimization and fixed-point problems, but its mechanism is not understood well, much less so than Nesterov acceleration. In this work, we analyze continuous-time models of anchor acceleration. We provide tight, unified analyses for characterizing the convergence rate as a functio…
▽ More
Recently, the anchor acceleration, an acceleration mechanism distinct from Nesterov's, has been discovered for minimax optimization and fixed-point problems, but its mechanism is not understood well, much less so than Nesterov acceleration. In this work, we analyze continuous-time models of anchor acceleration. We provide tight, unified analyses for characterizing the convergence rate as a function of the anchor coefficient $β(t)$, thereby providing insight into the anchor acceleration mechanism and its accelerated $\mathcal{O}(1/k^2)$-convergence rate. Finally, we present an adaptive method inspired by the continuous-time analyses and establish its effectiveness through theoretical analyses and experiments.
△ Less
Submitted 2 November, 2023; v1 submitted 3 April, 2023;
originally announced April 2023.
-
Continuous-Time Analysis of Accelerated Gradient Methods via Conservation Laws in Dilated Coordinate Systems
Authors:
Jaewook J. Suh,
Gyumin Roh,
Ernest K. Ryu
Abstract:
We analyze continuous-time models of accelerated gradient methods through deriving conservation laws in dilated coordinate systems. Namely, instead of analyzing the dynamics of $X(t)$, we analyze the dynamics of $W(t)=t^α(X(t)-X_c)$ for some $α$ and $X_c$ and derive a conserved quantity, analogous to physical energy, in this dilated coordinate system. Through this methodology, we recover many know…
▽ More
We analyze continuous-time models of accelerated gradient methods through deriving conservation laws in dilated coordinate systems. Namely, instead of analyzing the dynamics of $X(t)$, we analyze the dynamics of $W(t)=t^α(X(t)-X_c)$ for some $α$ and $X_c$ and derive a conserved quantity, analogous to physical energy, in this dilated coordinate system. Through this methodology, we recover many known continuous-time analyses in a streamlined manner and obtain novel continuous-time analyses for OGM-G, an acceleration mechanism for efficiently reducing gradient magnitude that is distinct from that of Nesterov. Finally, we show that a semi-second-order symplectic Euler discretization in the dilated coordinate system leads to an $\mathcal{O}(1/k^2)$ rate on the standard setup of smooth convex minimization, without any further assumptions such as infinite differentiability.
△ Less
Submitted 24 June, 2022; v1 submitted 11 February, 2022;
originally announced February 2022.