
Variance-Reduced Fast Operator Splitting Methods for Stochastic Generalized Equations

Abstract: We develop two classes of variance-reduced fast operator splitting methods to approximate solutions of both finite-sum and stochastic generalized equations. Our approach integrates recent advances in accelerated fixed-point methods, co-hypomonotonicity, and variance reduction. First, we introduce a class of variance-reduced estimators and establish their variance-reduction bounds. This class covers both unbiased and biased instances and comprises common estimators as special cases, including SVRG, SAGA, SARAH, and Hybrid-SGD. Next, we design a novel accelerated variance-reduced forward-backward splitting (FBS) algorithm using these estimators to solve finite-sum and stochastic generalized equations. Our method achieves both $\mathcal{O}(1/k^2)$ and $o(1/k^2)$ convergence rates on the expected squared norm $\mathbb{E}[\| G_{\lambda}x^k\|^2]$ of the FBS residual $G_{\lambda}$, where $k$ is the iteration counter. Additionally, we establish, for the first time, almost sure convergence rates and almost sure convergence of iterates to a solution in stochastic accelerated methods. Unlike existing stochastic fixed-point algorithms, our methods accommodate co-hypomonotone operators, which potentially include nonmonotone problems arising from recent applications. We further specialize our method to derive an appropriate variant for each stochastic estimator (SVRG, SAGA, SARAH, and Hybrid-SGD), demonstrating that they achieve the best-known complexity for each without relying on enhancement techniques. Alternatively, we propose an accelerated variance-reduced backward-forward splitting (BFS) method, which attains similar convergence rates and oracle complexity to our FBS method. Finally, we validate our results through several numerical experiments and compare their performance.

Submitted 6 June, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

Comments: 58 pages, 1 table, and 8 figures

Report number: STOR-UNC-03.2025 MSC Class: 90C25; 90C06; 90-08

arXiv:2501.04585 [pdf, other]

Accelerated Extragradient-Type Methods -- Part 2: Generalization and Sublinear Convergence Rates under Co-Hypomonotonicity

Authors: Quoc Tran-Dinh, Nghia Nguyen-Trung

Abstract: Following the first part of our project, this paper comprehensively studies two types of extragradient-based methods: anchored extragradient and Nesterov's accelerated extragradient for solving [non]linear inclusions (and, in particular, equations), primarily under the Lipschitz continuity and the co-hypomonotonicity assumptions. We unify and generalize a class of anchored extragradient methods for monotone inclusions to a wider range of schemes encompassing existing algorithms as special cases. We establish $\mathcal{O}(1/k)$ last-iterate convergence rates on the residual norm of the underlying mapping for this general framework and then specialize it to obtain convergence guarantees for specific instances, where $k$ denotes the iteration counter. We extend our approach to a class of anchored Tseng's forward-backward-forward splitting methods to obtain a broader class of algorithms for solving co-hypomonotone inclusions. Again, we analyze $\mathcal{O}(1/k)$ last-iterate convergence rates for this general scheme and specialize it to obtain convergence results for existing and new variants. We generalize and unify Nesterov's accelerated extragradient method into a new class of algorithms that covers existing schemes as special instances while generating new variants. For these schemes, we can prove $\mathcal{O}(1/k)$ last-iterate convergence rates for the residual norm under co-hypomonotonicity, covering a class of nonmonotone problems. We propose another novel class of Nesterov's accelerated extragradient methods to solve inclusions. Interestingly, these algorithms achieve both $\mathcal{O}(1/k)$ and $o(1/k)$ last-iterate convergence rates, and also the convergence of iterate sequences under co-hypomonotonicity and Lipschitz continuity. Finally, we provide a set of numerical experiments encompassing different scenarios to validate our algorithms and theoretical guarantees.

Submitted 9 March, 2025; v1 submitted 8 January, 2025; originally announced January 2025.

Comments: 75 pages, 7 figures, and 1 table

Report number: UNC-STOR-01.08.2025 MSC Class: 90C25; 90C06; 90-08

arXiv:2410.22297 [pdf, other]

Shuffling Gradient-Based Methods for Nonconvex-Concave Minimax Optimization

Authors: Quoc Tran-Dinh, Trang H. Tran, Lam M. Nguyen

Abstract: This paper aims at developing novel shuffling gradient-based methods for tackling two classes of minimax problems: nonconvex-linear and nonconvex-strongly concave settings. The first algorithm addresses the nonconvex-linear minimax model and achieves the state-of-the-art oracle complexity typically observed in nonconvex optimization. It also employs a new shuffling estimator for the "hyper-gradient", departing from standard shuffling techniques in optimization. The second method consists of two variants: semi-shuffling and full-shuffling schemes. These variants tackle the nonconvex-strongly concave minimax setting. We establish their oracle complexity bounds under standard assumptions, which, to the best of our knowledge, are the best-known for this specific setting. Numerical examples demonstrate the performance of our algorithms and compare them with two other methods. Our results show that the new methods achieve performance comparable to SGD, supporting the potential of incorporating shuffling strategies into minimax algorithms.

Submitted 29 October, 2024; originally announced October 2024.

Comments: 45 pages, 5 figures (38th Conference on Neural Information Processing Systems (NeurIPS 2024))

Report number: STOR-UNC-Oct2024

Journal ref: 38th Conference on Neural Information Processing Systems (NeurIPS 2024)

arXiv:2409.16859 [pdf, other]

Revisiting Extragradient-Type Methods -- Part 1: Generalizations and Sublinear Convergence Rates

Authors: Quoc Tran-Dinh, Nghia Nguyen-Trung

Abstract: This paper presents a comprehensive analysis of the well-known extragradient (EG) method for solving both equations and inclusions. First, we unify and generalize EG for [non]linear equations to a wider class of algorithms, encompassing various existing schemes and potentially new variants. Next, we analyze both sublinear "best-iterate" and "last-iterate" convergence rates for the entire class of algorithms, and derive new convergence results for two well-known instances. Second, we extend our EG framework above to "monotone" inclusions, introducing a new class of algorithms and its corresponding convergence results. Third, we also unify and generalize Tseng's forward-backward-forward splitting (FBFS) method to a broader class of algorithms to solve [non]linear inclusions when a weak-Minty solution exists, and establish its "best-iterate" convergence rate. Fourth, to complete our picture, we also investigate sublinear rates of two other common variants of EG using our EG analysis framework developed here: the reflected forward-backward splitting and the golden ratio methods. Finally, we conduct extensive numerical experiments to validate our theoretical findings. Our results demonstrate that several new variants of our proposed algorithms outperform existing schemes in the majority of examples.

Submitted 25 September, 2024; originally announced September 2024.

Comments: 59 pages, 1 table, 9 figures. arXiv admin note: text overlap with arXiv:2303.17192

Report number: STOR-UNC-09.2024 MSC Class: 90C25; 90C06; 90-08

arXiv:2406.02413 [pdf, ps, other]

Variance-Reduced Fast Krasnoselkii-Mann Methods for Finite-Sum Root-Finding Problems

Authors: Quoc Tran-Dinh

Abstract: We propose a new class of fast Krasnoselkii-Mann methods with variance reduction to solve a finite-sum co-coercive equation $Gx = 0$. Our algorithm is single-loop and leverages a new family of unbiased variance-reduced estimators specifically designed for a wider class of root-finding algorithms. Our method achieves both $\mathcal{O}(1/k^2)$ and $o(1/k^2)$ last-iterate convergence rates in terms of $\mathbb{E}[\| Gx^k\|^2]$, where $k$ is the iteration counter and $\mathbb{E}[\cdot]$ is the total expectation. We also establish almost sure $o(1/k^2)$ convergence rates and the almost sure convergence of iterates $\{x^k\}$ to a solution of $Gx=0$. We instantiate our framework for two prominent estimators: SVRG and SAGA. By an appropriate choice of parameters, both variants attain an oracle complexity of $\mathcal{O}(n + n^{2/3}\epsilon^{-1})$ to reach an $\epsilon$-solution, where $n$ represents the number of summands in the finite-sum operator $G$. Furthermore, under $\sigma$-strong quasi-monotonicity, our method achieves a linear convergence rate and an oracle complexity of $\mathcal{O}(n+ \max\{n, n^{2/3}\kappa\} \log(\frac{1}{\epsilon}))$, where $\kappa := L/\sigma$. We extend our approach to solve a class of finite-sum inclusions (possibly nonmonotone), demonstrating that our schemes retain the same theoretical guarantees as in the equation setting. Finally, numerical experiments validate our algorithms and demonstrate their promising performance compared to state-of-the-art methods.

Submitted 6 June, 2025; v1 submitted 4 June, 2024; originally announced June 2024.

Comments: 31 pages, 2 figures

Report number: UNC-STOR-06.2024.P2 (Version 2) MSC Class: 90C25; 90C06; 90-08

arXiv:2303.17192 [pdf, ps, other]

Sublinear Convergence Rates of Extragradient-Type Methods: A Survey on Classical and Recent Developments

Authors: Quoc Tran-Dinh

Abstract: The extragradient (EG) method, introduced by G. M. Korpelevich in 1976, is a well-known technique to approximate solutions of saddle-point problems and their extensions such as variational inequalities and monotone inclusions. Over the years, numerous variants of EG have been proposed and studied in the literature. Recently, these methods have gained popularity due to new applications in machine learning and robust optimization. In this work, we survey the latest developments in the EG method and its variants for approximating solutions of nonlinear equations and inclusions, with a focus on the monotonicity and co-hypomonotonicity settings. We provide a unified convergence analysis for different classes of algorithms, with an emphasis on sublinear best-iterate and last-iterate convergence rates. We also discuss recent accelerated variants of EG based on both Halpern fixed-point iteration and Nesterov's accelerated techniques. Our approach uses simple arguments and basic mathematical tools to make the proofs as elementary as possible, while maintaining generality to cover a broad range of problems.

Submitted 30 March, 2023; originally announced March 2023.

Comments: 47 pages, a survey paper

Report number: UNC-STOR-03.30.2023

arXiv:2302.04099 [pdf, ps, other]

Extragradient-Type Methods with $\mathcal{O} (1/k)$ Last-Iterate Convergence Rates for Co-Hypomonotone Inclusions

Authors: Quoc Tran-Dinh

Abstract: We develop two "Nesterov's accelerated" variants of the well-known extragradient method to approximate a solution of a co-hypomonotone inclusion constituted by the sum of two operators, where one is Lipschitz continuous and the other is possibly multivalued. The first scheme can be viewed as an accelerated variant of Tseng's forward-backward-forward splitting (FBFS) method, while the second one is a Nesterov's accelerated variant of the "past" FBFS scheme, which requires only one evaluation of the Lipschitz operator and one resolvent of the multivalued mapping. Under appropriate conditions on the parameters, we theoretically prove that both algorithms achieve $\mathcal{O}(1/k)$ last-iterate convergence rates on the residual norm, where $k$ is the iteration counter. Our results can be viewed as alternatives to a recent class of Halpern-type methods for root-finding problems. For comparison, we also provide a new convergence analysis of the two recent extra-anchored gradient-type methods for solving co-hypomonotone inclusions.

Submitted 14 October, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

Comments: 25 pages

Report number: UNC-STOR-02.08.2023.P3 MSC Class: 90C25; 90-08

Journal ref: Preprint (2023)

arXiv:2301.03113 [pdf, ps, other]

Randomized Block-Coordinate Optimistic Gradient Algorithms for Root-Finding Problems

Authors: Quoc Tran-Dinh, Yang Luo

Abstract: In this paper, we develop two new randomized block-coordinate optimistic gradient algorithms to approximate a solution of nonlinear equations in large-scale settings, which are called root-finding problems. Our first algorithm is non-accelerated with constant stepsizes, and achieves an $\mathcal{O}(1/k)$ best-iterate convergence rate on $\mathbb{E}[ \Vert Gx^k\Vert^2]$ when the underlying operator $G$ is Lipschitz continuous and satisfies a weak Minty solution condition, where $\mathbb{E}[\cdot]$ is the expectation and $k$ is the iteration counter. Our second method is a new accelerated randomized block-coordinate optimistic gradient algorithm. We establish both $\mathcal{O}(1/k^2)$ and $o(1/k^2)$ last-iterate convergence rates on both $\mathbb{E}[ \Vert Gx^k\Vert^2]$ and $\mathbb{E}[ \Vert x^{k+1} - x^{k}\Vert^2]$ for this algorithm under the co-coerciveness of $G$. In addition, we prove that the iterate sequence $\{x^k\}$ converges to a solution almost surely, and $k\Vert Gx^k\Vert$ attains an $o(1/k)$ almost sure convergence rate. Then, we apply our methods to a class of large-scale finite-sum inclusions, which covers prominent applications in machine learning, statistical learning, and network optimization, especially in federated learning. We obtain two new federated learning-type algorithms and their convergence rate guarantees for solving this problem class.

Submitted 11 June, 2025; v1 submitted 8 January, 2023; originally announced January 2023.

Comments: 38 pages and 6 figures

Report number: UNC-STOR-2022.07.RBCM

Journal ref: Mathematics of Operations Research, 2025

arXiv:2212.09413 [pdf, ps, other]

Gradient Descent-Type Methods: Background and Simple Unified Convergence Analysis

Authors: Quoc Tran-Dinh, Marten van Dijk

Abstract: In this book chapter, we briefly describe the main components that constitute the gradient descent method and its accelerated and stochastic variants. We aim at explaining these components from a mathematical point of view, including theoretical and practical aspects, but at an elementary level. We will focus on basic variants of the gradient descent method and then extend our view to recent variants, especially variance-reduced stochastic gradient schemes (SGD). Our approach relies on revealing the structures present in the problem and the assumptions imposed on the objective function. Our convergence analysis unifies several known results and relies on a general but elementary recursive expression. We illustrate this analysis on several common schemes.

Submitted 19 December, 2022; originally announced December 2022.

Comments: 24 pages, a book chapter

Report number: UNC-STOR-Dec.2022

arXiv:2212.00703 [pdf, other]

Data Integration Via Analysis of Subspaces (DIVAS)

Authors: Jack B. Prothero, Meilei Jiang, Jan Hannig, Quoc Tran-Dinh, Andrew Ackerman, J. S. Marron

Abstract: Modern data collection in many data paradigms, including bioinformatics, often incorporates multiple traits derived from different data types (i.e., platforms). We call this data multi-block, multi-view, or multi-omics data. The emergent field of data integration develops and applies new methods for studying multi-block data and identifying how different data types relate and differ. One major frontier in contemporary data integration research is methodology that can identify partially-shared structure between sub-collections of data types. This work presents a new approach: Data Integration Via Analysis of Subspaces (DIVAS). DIVAS combines new insights in angular subspace perturbation theory with recent developments in matrix signal processing and convex-concave optimization into one algorithm for exploring partially-shared structure. Based on principal angles between subspaces, DIVAS provides built-in inference on the results of the analysis, and is effective even in high-dimension-low-sample-size (HDLSS) situations.

Submitted 17 January, 2024; v1 submitted 1 December, 2022; originally announced December 2022.

arXiv:2110.08150 [pdf, ps, other]

Halpern-Type Accelerated and Splitting Algorithms For Monotone Inclusions

Authors: Quoc Tran-Dinh, Yang Luo

Abstract: In this paper, we develop a new type of accelerated algorithm to solve some classes of maximally monotone equations as well as monotone inclusions. Instead of using Nesterov's accelerating approach, our methods rely on a so-called Halpern-type fixed-point iteration introduced in [32] and recently exploited by a number of researchers, including [24, 70]. Firstly, we derive a new variant of the anchored extra-gradient scheme in [70] based on Popov's past extra-gradient method to solve a maximally monotone equation $G(x) = 0$. We show that our method achieves the same $\mathcal{O}(1/k)$ convergence rate (up to a constant factor) as in the anchored extra-gradient algorithm on the operator norm $\Vert G(x_k)\Vert$, but requires only one evaluation of $G$ at each iteration, where $k$ is the iteration counter. Next, we develop two splitting algorithms to approximate a zero point of the sum of two maximally monotone operators. The first algorithm originates from the anchored extra-gradient method combined with a splitting technique, while the second one is its Popov variant, which can reduce the per-iteration complexity. Both algorithms appear to be new and can be viewed as accelerated variants of the Douglas-Rachford (DR) splitting method. They both achieve $\mathcal{O}(1/k)$ rates on the norm $\Vert G_{\gamma}(x_k)\Vert$ of the forward-backward residual operator $G_{\gamma}(\cdot)$ associated with the problem. We also propose a new accelerated Douglas-Rachford splitting scheme for solving this problem which achieves an $\mathcal{O}(1/k)$ convergence rate on $\Vert G_{\gamma}(x_k)\Vert$ under only maximal monotonicity assumptions. Finally, we specialize our first algorithm to solve convex-concave minimax problems and apply our accelerated DR scheme to derive a new variant of the alternating direction method of multipliers (ADMM).

Submitted 6 December, 2021; v1 submitted 15 October, 2021; originally announced October 2021.

Comments: 33 pages

Report number: UNC-STOR-10.15.2021-v1

arXiv:2103.03452 [pdf, other]

FedDR -- Randomized Douglas-Rachford Splitting Algorithms for Nonconvex Federated Composite Optimization

Authors: Quoc Tran-Dinh, Nhan H. Pham, Dzung T. Phan, Lam M. Nguyen

Abstract: We develop two new algorithms, called FedDR and asyncFedDR, for solving a fundamental nonconvex composite optimization problem in federated learning. Our algorithms rely on a novel combination of a nonconvex Douglas-Rachford splitting method, randomized block-coordinate strategies, and asynchronous implementation. They can also handle convex regularizers. Unlike recent methods in the literature, e.g., FedSplit and FedPD, our algorithms update only a subset of users at each communication round, and possibly in an asynchronous manner, making them more practical. These new algorithms can handle statistical and system heterogeneity, which are the two main challenges in federated learning, while achieving the best known communication complexity. In fact, our new algorithms match the communication complexity lower bound up to a constant factor under standard assumptions. Our numerical experiments illustrate the advantages of our methods over existing algorithms on synthetic and real datasets.

Submitted 28 October, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

Comments: 39 pages and 12 figures

Report number: UNC-STOR-June 2021

Journal ref: NeurIPS 2021

arXiv:2011.11884 [pdf, other]

SMG: A Shuffling Gradient-Based Method with Momentum

Authors: Trang H. Tran, Lam M. Nguyen, Quoc Tran-Dinh

Abstract: We combine two advanced ideas widely used in optimization for machine learning: shuffling strategy and momentum technique to develop a novel shuffling gradient-based method with momentum, coined Shuffling Momentum Gradient (SMG), for non-convex finite-sum optimization problems. While our method is inspired by momentum techniques, its update is fundamentally different from existing momentum-based methods. We establish state-of-the-art convergence rates of SMG for any shuffling strategy using either constant or diminishing learning rates under standard assumptions (i.e., $L$-smoothness and bounded variance). When the shuffling strategy is fixed, we develop another new algorithm that is similar to existing momentum methods, and prove the same convergence rates for this algorithm under the $L$-smoothness and bounded gradient assumptions. We demonstrate our algorithms via numerical simulations on standard datasets and compare them with existing shuffling methods. Our tests have shown encouraging performance of the new algorithms.

Submitted 9 June, 2021; v1 submitted 23 November, 2020; originally announced November 2020.

Comments: The 38th International Conference on Machine Learning (ICML 2021)

arXiv:2010.14763 [pdf, other]

Hogwild! over Distributed Local Data Sets with Linearly Increasing Mini-Batch Sizes

Authors: Marten van Dijk, Nhuong V. Nguyen, Toan N. Nguyen, Lam M. Nguyen, Quoc Tran-Dinh, Phuong Ha Nguyen

Abstract: Hogwild! implements asynchronous Stochastic Gradient Descent (SGD) where multiple threads in parallel access a common repository containing training data, perform SGD iterations and update shared state that represents a jointly learned (global) model. We consider big data analysis where training data is distributed among local data sets in a heterogeneous way, and we wish to move SGD computations to local compute nodes where local data resides. The results of these local SGD computations are aggregated by a central "aggregator" which mimics Hogwild!. We show how local compute nodes can start choosing small mini-batch sizes which increase to larger ones in order to reduce communication cost (rounds of interaction with the aggregator). We improve on the state-of-the-art literature and show $O(\sqrt{K})$ communication rounds for heterogeneous data for strongly convex problems, where $K$ is the total number of gradient computations across all local compute nodes. For our scheme, we provide a tight and novel non-trivial convergence analysis for strongly convex problems with heterogeneous data that does not use the bounded gradient assumption seen in many existing publications. The tightness is a consequence of our proofs for lower and upper bounds of the convergence rate, which show a constant factor difference. We show experimental results for plain convex and non-convex problems for biased (i.e., heterogeneous) and unbiased local data sets.

Submitted 26 February, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

Comments: AISTATS 2021. arXiv admin note: substantial text overlap with arXiv:2007.09208

arXiv:2008.09055 [pdf, ps, other]

An Optimal Hybrid Variance-Reduced Algorithm for Stochastic Composite Nonconvex Optimization

Authors: Deyi Liu, Lam M. Nguyen, Quoc Tran-Dinh

Abstract: In this note we propose a new variant of the hybrid variance-reduced proximal gradient method in [7] to solve a common stochastic composite nonconvex optimization problem under standard assumptions. We simply replace the independent unbiased estimator in our hybrid-SARAH estimator introduced in [7] by the stochastic gradient evaluated at the same sample, leading to the identical momentum-SARAH estimator introduced in [2]. This allows us to save one stochastic gradient per iteration compared to [7], and only requires two samples per iteration. Our algorithm is very simple and achieves the optimal stochastic oracle complexity bound in terms of stochastic gradient evaluations (up to a constant factor). Our analysis is essentially inspired by [7], but we do not use two different step-sizes.

Submitted 20 August, 2020; originally announced August 2020.

Comments: 6 pages

Report number: STOR-UNC-08-20-P4

arXiv:2007.09208 [pdf, other]

Asynchronous Federated Learning with Reduced Number of Rounds and with Differential Privacy from Less Aggregated Gaussian Noise

Authors: Marten van Dijk, Nhuong V. Nguyen, Toan N. Nguyen, Lam M. Nguyen, Quoc Tran-Dinh, Phuong Ha Nguyen

Abstract: The feasibility of federated learning is highly constrained by the server-clients infrastructure in terms of network communication. Most newly launched smartphones and IoT devices are equipped with GPUs or sufficient computing hardware to run powerful AI models. However, in the case of the original synchronous federated learning, client devices suffer waiting times, and regular communication between clients and server is required. This implies more sensitivity to local model training times and to irregular or missed updates, hence limited scalability to large numbers of clients, and convergence rates measured in real time will suffer. We propose a new algorithm for asynchronous federated learning which eliminates waiting times and reduces overall network communication. We provide rigorous theoretical analysis for strongly convex objective functions and provide simulation results. By adding Gaussian noise, we show how our algorithm can be made differentially private; new theorems show how the aggregated added Gaussian noise is significantly reduced.

Submitted 17 July, 2020; originally announced July 2020.

arXiv:2006.15266 [pdf, other]

Hybrid Variance-Reduced SGD Algorithms For Nonconvex-Concave Minimax Problems

Authors: Quoc Tran-Dinh, Deyi Liu, Lam M. Nguyen

Abstract: We develop a novel single-loop variance-reduced algorithm to solve a class of stochastic nonconvex-convex minimax problems involving a nonconvex-linear objective function, which has various applications in different fields such as machine learning and robust optimization. This problem class has several computational challenges due to the nonsmoothness, nonconvexity, nonlinearity, and non-separability of the objective functions. Our approach relies on a new combination of recent ideas, including smoothing and hybrid biased variance-reduced techniques. Our algorithm and its variants can achieve an $\mathcal{O}(T^{-2/3})$ convergence rate and the best-known oracle complexity under standard assumptions, where $T$ is the iteration counter. They have several computational advantages compared to existing methods, such as being simple to implement and requiring less parameter tuning. They can also work with either single-sample or mini-batch derivative estimators, and with constant or diminishing step-sizes. We demonstrate the benefits of our algorithms over existing methods through two numerical examples, including a nonsmooth and nonconvex-non-strongly concave minimax model.

Submitted 26 October, 2020; v1 submitted 26 June, 2020; originally announced June 2020.

Comments: 30 pages and 6 figures

Report number: UNC-STOR-June 2020

arXiv:2003.01322 [pdf, other]

A New Randomized Primal-Dual Algorithm for Convex Optimization with Optimal Last Iterate Rates

Authors: Quoc Tran-Dinh, Deyi Liu

Abstract: We develop a novel unified randomized block-coordinate primal-dual algorithm to solve a class of nonsmooth constrained convex optimization problems, which covers different existing variants and model settings from the literature. We prove that our algorithm achieves optimal $\mathcal{O}(n/k)$ and $\mathcal{O}(n^2/k^2)$ convergence rates (up to a constant factor) in two cases: general convexity and strong convexity, respectively, where $k$ is the iteration counter and $n$ is the number of block-coordinates. Our convergence rates are obtained through three criteria: primal objective residual and primal feasibility violation, dual objective residual, and primal-dual expected gap. Moreover, our rates for the primal problem are on the last iterate sequence. Our dual convergence guarantee additionally requires a Lipschitz continuity assumption. We specify our algorithm to handle two important special cases, where our rates still apply. Finally, we verify our algorithm on two well-studied numerical examples and compare it with two existing methods. Our results show that the proposed method has encouraging performance on different experiments.

Submitted 28 October, 2021; v1 submitted 2 March, 2020; originally announced March 2020.

Comments: 29 pages, 5 figures

Report number: UNC-STOR 03.2020.W4 (Third version, October 2021)

arXiv:2002.08246 [pdf, other]

A Unified Convergence Analysis for Shuffling-Type Gradient Methods

Authors: Lam M. Nguyen, Quoc Tran-Dinh, Dzung T. Phan, Phuong Ha Nguyen, Marten van Dijk

Abstract: In this paper, we propose a unified convergence analysis for a class of generic shuffling-type gradient methods for solving finite-sum optimization problems. Our analysis works with any sampling without replacement strategy and covers many known variants such as randomized reshuffling, deterministic or randomized single permutation, and cyclic and incremental gradient schemes. We focus on two different settings: strongly convex and nonconvex problems, but also discuss the non-strongly convex case. Our main contribution consists of new non-asymptotic and asymptotic convergence rates for a wide class of shuffling-type gradient methods in both nonconvex and convex settings. We also study uniformly randomized shuffling variants with different learning rates and model assumptions. While our rate in the nonconvex case is new and significantly improved over existing works under standard assumptions, the rate on the strongly convex one matches the existing best-known rates prior to this paper up to a constant factor without imposing a bounded gradient condition. Finally, we empirically illustrate our theoretical results via two numerical examples: nonconvex logistic regression and neural network training examples. As byproducts, our results suggest some appropriate choices for diminishing learning rates in certain shuffling variants.

Submitted 19 September, 2021; v1 submitted 19 February, 2020; originally announced February 2020.

Comments: Journal of Machine Learning Research, 2021
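The shuffling schemes named in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `shuffling_gradient`, the per-step learning rate `lr0 / (t * n)`, and the toy least-squares components are all assumptions chosen for illustration. Each epoch visits every component gradient exactly once (sampling without replacement), either with a fresh permutation per epoch (randomized reshuffling) or with one fixed random permutation (single permutation).

```python
import numpy as np

def shuffling_gradient(grad_i, n, x0, epochs=100, lr0=1.0, reshuffle=True, seed=0):
    """Illustrative shuffling-type gradient method for (1/n) * sum_i f_i(x).

    grad_i(x, i) returns the gradient of the i-th component at x.
    The diminishing step lr0 / (t * n) is an assumed choice, not the paper's.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    perm = rng.permutation(n)              # used as-is when reshuffle=False
    for t in range(1, epochs + 1):
        if reshuffle:
            perm = rng.permutation(n)      # fresh permutation every epoch
        step = lr0 / (t * n)               # same step for all inner updates
        for i in perm:                     # one pass, without replacement
            x = x - step * grad_i(x, i)
    return x

# Toy finite sum: f_i(x) = 0.5 * ||x - B[i]||^2, minimized at the mean of B.
rng_data = np.random.default_rng(1)
B = rng_data.standard_normal((20, 2)) + np.array([2.0, -1.0])
grad_i = lambda x, i: x - B[i]

x_rr = shuffling_gradient(grad_i, n=20, x0=np.zeros(2), reshuffle=True)
x_sp = shuffling_gradient(grad_i, n=20, x0=np.zeros(2), reshuffle=False)
print(np.linalg.norm(x_rr - B.mean(axis=0)), np.linalg.norm(x_sp - B.mean(axis=0)))
```

On this toy problem both variants approach the mean of the rows of `B`; the sketch says nothing about the convergence rates established in the paper.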

arXiv:2002.07290 [pdf, other]

Stochastic Gauss-Newton Algorithms for Nonconvex Compositional Optimization

Authors: Quoc Tran-Dinh, Nhan H. Pham, Lam M. Nguyen

Abstract: We develop two new stochastic Gauss-Newton algorithms for solving a class of non-convex stochastic compositional optimization problems frequently arising in practice. We consider both the expectation and finite-sum settings under standard assumptions, and use both classical stochastic and SARAH estimators for approximating function values and Jacobians. In the expectation case, we establish $\mathcal{O}(\varepsilon^{-2})$ iteration-complexity to achieve a stationary point in expectation and estimate the total number of stochastic oracle calls for both the function value and its Jacobian, where $\varepsilon$ is a desired accuracy. In the finite-sum case, we also estimate $\mathcal{O}(\varepsilon^{-2})$ iteration-complexity and the total oracle calls with high probability. To the best of our knowledge, this is the first time such global stochastic oracle complexity is established for stochastic Gauss-Newton methods. Finally, we illustrate our theoretical results via two numerical examples on both synthetic and real datasets.

Submitted 2 July, 2020; v1 submitted 17 February, 2020; originally announced February 2020.

Comments: 32 pages and 8 figures

Report number: UNC-STOR-Feb 2019-02

Journal ref: ICML 2020

arXiv:2002.07003 [pdf, other]

A Newton Frank-Wolfe Method for Constrained Self-Concordant Minimization

Authors: Deyi Liu, Volkan Cevher, Quoc Tran-Dinh

Abstract: We demonstrate how to scalably solve a class of constrained self-concordant minimization problems using linear minimization oracles (LMO) over the constraint set. We prove that the number of LMO calls of our method is nearly the same as that of the Frank-Wolfe method in the L-smooth case. Specifically, our Newton Frank-Wolfe method uses $\mathcal{O}(ε^{-ν})$ LMO calls, where $ε$ is the desired accuracy and $ν:= 1 + o(1)$. In addition, we demonstrate how our algorithm can exploit the improved variants of the LMO-based schemes, including away-steps, to attain linear convergence rates. We also provide numerical evidence on portfolio design with the competitive ratio, D-optimal experimental design, and logistic regression with the elastic net, where Newton Frank-Wolfe outperforms the state-of-the-art.

Submitted 17 February, 2020; originally announced February 2020.

arXiv:1907.03793 [pdf, other]

A Hybrid Stochastic Optimization Framework for Stochastic Composite Nonconvex Optimization

Authors: Quoc Tran-Dinh, Nhan H. Pham, Dzung T. Phan, Lam M. Nguyen

Abstract: We introduce a new approach to develop stochastic optimization algorithms for a class of stochastic composite and possibly nonconvex optimization problems. The main idea is to combine two stochastic estimators to create a new hybrid one. We first introduce our hybrid estimator and then investigate its fundamental properties to form a foundational theory for algorithmic development. Next, we apply our theory to develop several variants of stochastic gradient methods to solve both expectation and finite-sum composite optimization problems. Our first algorithm can be viewed as a variant of proximal stochastic gradient methods with a single loop, but can achieve an $\mathcal{O}(σ^3\varepsilon^{-1} + σ\varepsilon^{-3})$ oracle complexity bound, matching the best-known ones from state-of-the-art double-loop algorithms in the literature, where $σ> 0$ is the variance and $\varepsilon$ is a desired accuracy. Then, we consider two different variants of our method: adaptive step-size and restarting schemes that have theoretical guarantees similar to those of our first algorithm. We also study two mini-batch variants of the proposed methods. In all cases, we achieve the best-known complexity bounds under standard assumptions. We test our methods on several numerical examples with real datasets and compare them with state-of-the-art methods. Our numerical experiments show that the new methods are comparable to and, in many cases, outperform their competitors.

Submitted 2 May, 2020; v1 submitted 8 July, 2019; originally announced July 2019.

Comments: 49 pages, 2 tables, 9 figures

Report number: UNC-STOR-2019.07.V1-03

arXiv:1905.05920 [pdf, other]

Hybrid Stochastic Gradient Descent Algorithms for Stochastic Nonconvex Optimization

Authors: Quoc Tran-Dinh, Nhan H. Pham, Dzung T. Phan, Lam M. Nguyen

Abstract: We introduce a hybrid stochastic estimator to design stochastic gradient algorithms for solving stochastic optimization problems. Such a hybrid estimator is a convex combination of two existing biased and unbiased estimators and leads to some useful properties of its variance. We limit our consideration to a hybrid SARAH-SGD for nonconvex expectation problems. However, our idea can be extended to handle a broader class of estimators in both convex and nonconvex settings. We propose a new single-loop stochastic gradient descent algorithm that can achieve an $O(\max\{σ^3\varepsilon^{-1},σ\varepsilon^{-3}\})$ complexity bound to obtain an $\varepsilon$-stationary point under smoothness and $σ^2$-bounded variance assumptions. This complexity is better than $O(σ^2\varepsilon^{-4})$ often obtained in state-of-the-art SGDs when $σ< O(\varepsilon^{-3})$. We also consider different extensions of our method, including constant and adaptive step-size with single-loop, double-loop, and mini-batch variants. We compare our algorithms with existing methods on several datasets using two nonconvex models.

Submitted 14 May, 2019; originally announced May 2019.

Comments: 41 pages and 18 figures

Report number: UNC-STOR-May-2019-03
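The "convex combination of a biased and an unbiased estimator" in the abstract above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the function name `hybrid_sarah_sgd`, the weight `beta`, the constant step size `lr`, and the toy quadratic objective are all hypothetical. The biased part is a SARAH-style recursive difference evaluated on one sample; the unbiased part is a plain stochastic gradient on an independent sample.

```python
import numpy as np

def hybrid_sarah_sgd(grad, sample, x0, steps=200, lr=0.05, beta=0.9, seed=0):
    """Single-loop SGD driven by a hybrid SARAH-SGD estimator (illustrative).

    grad(x, xi) is a stochastic gradient at x for sample xi;
    sample(rng) draws one sample. beta and lr are assumed constants.
    """
    rng = np.random.default_rng(seed)
    x_prev = np.asarray(x0, dtype=float)
    v = grad(x_prev, sample(rng))            # start from a plain stochastic gradient
    x = x_prev - lr * v
    for _ in range(steps):
        xi = sample(rng)                     # sample shared by the SARAH difference
        zeta = sample(rng)                   # independent sample for the SGD term
        sarah = v + grad(x, xi) - grad(x_prev, xi)   # biased recursive estimator
        sgd = grad(x, zeta)                          # unbiased estimator
        v = beta * sarah + (1.0 - beta) * sgd        # convex combination
        x_prev, x = x, x - lr * v
    return x

# Toy problem: f(x) = E[0.5 * ||x - a_true - xi||^2] with zero-mean noise xi,
# so the minimizer is a_true.
a_true = np.array([1.0, -2.0, 3.0])
grad = lambda x, xi: x - a_true - xi
sample = lambda rng: 0.1 * rng.standard_normal(3)

x_final = hybrid_sarah_sgd(grad, sample, x0=np.zeros(3))
print(np.linalg.norm(x_final - a_true))
```

Setting `beta = 0` reduces the update to plain SGD, while `beta = 1` gives a purely recursive SARAH-style estimator without restarts; intermediate values mix the two within a single loop.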

arXiv:1902.05679 [pdf, other]

ProxSARAH: An Efficient Algorithmic Framework for Stochastic Composite Nonconvex Optimization

Authors: Nhan H. Pham, Lam M. Nguyen, Dzung T. Phan, Quoc Tran-Dinh

Abstract: We propose a new stochastic first-order algorithmic framework to solve stochastic composite nonconvex optimization problems that covers both finite-sum and expectation settings. Our algorithms rely on the SARAH estimator introduced in (Nguyen et al, 2017) and consist of two steps: a proximal gradient step and an averaging step, making them different from existing nonconvex proximal-type algorithms. The algorithms only require an average smoothness assumption on the nonconvex objective term and an additional bounded variance assumption if applied to expectation problems. They work with both constant and adaptive step-sizes, while allowing single sample and mini-batches. In all these cases, we prove that our algorithms can achieve the best-known complexity bounds. One key step of our methods is new constant and adaptive step-sizes that help to achieve desired complexity bounds while improving practical performance. Our constant step-size is much larger than that of existing methods, including proximal SVRG schemes, in the single sample case. We also specify the algorithm to the non-composite case that covers existing state-of-the-art methods in terms of complexity bounds. Our update also allows one to trade off between step-sizes and mini-batch sizes to improve performance. We test the proposed algorithms on two composite nonconvex problems and neural networks using several well-known datasets.

Submitted 28 March, 2019; v1 submitted 14 February, 2019; originally announced February 2019.

Comments: 45 pages, 8 figures, and 2 tables

Report number: STOR-UNC-Feb14.2019

arXiv:1801.03765 [pdf, other]

Non-stationary Douglas-Rachford and alternating direction method of multipliers: adaptive stepsizes and convergence

Authors: Dirk A. Lorenz, Quoc Tran-Dinh

Abstract: We revisit the classical Douglas-Rachford (DR) method for finding a zero of the sum of two maximal monotone operators. Since the practical performance of the DR method crucially depends on the stepsizes, we aim at developing an adaptive stepsize rule. To that end, we take a closer look at a linear case of the problem and use our findings to develop a stepsize strategy that eliminates the need for stepsize tuning. We analyze a general non-stationary DR scheme and prove its convergence for a convergent sequence of stepsizes with summable increments. This, in turn, proves the convergence of the method with the new adaptive stepsize rule. We also derive the related non-stationary alternating direction method of multipliers (ADMM) from such a non-stationary DR method. We illustrate the efficiency of the proposed methods on several numerical examples.

Submitted 27 September, 2018; v1 submitted 11 January, 2018; originally announced January 2018.

MSC Class: 90C25; 65K05; 65J15; 47H05

arXiv:1711.03439 [pdf, other]

Smooth Primal-Dual Coordinate Descent Algorithms for Nonsmooth Convex Optimization

Authors: Ahmet Alacaoglu, Quoc Tran-Dinh, Olivier Fercoq, Volkan Cevher

Abstract: We propose a new randomized coordinate descent method for a convex optimization template with broad applications. Our analysis relies on a novel combination of four ideas applied to the primal-dual gap function: smoothing, acceleration, homotopy, and coordinate descent with non-uniform sampling. As a result, our method features the first convergence rate guarantees among the coordinate descent methods that are the best-known under a variety of common structure assumptions on the template. We provide numerical evidence to support the theoretical results with a comparison to state-of-the-art algorithms.

Submitted 9 November, 2017; originally announced November 2017.

Comments: NIPS 2017

arXiv:1703.04599 [pdf, other]

Generalized Self-Concordant Functions: A Recipe for Newton-Type Methods

Authors: Tianxiao Sun, Quoc Tran-Dinh

Abstract: We study the smooth structure of convex functions by generalizing a powerful concept, so-called self-concordance, introduced by Nesterov and Nemirovskii in the early 1990s, to a broader class of convex functions, which we call generalized self-concordant functions. This notion allows us to develop a unified framework for designing Newton-type methods to solve convex optimization problems. The proposed theory provides a mathematical tool to analyze both local and global convergence of Newton-type methods without imposing unverifiable assumptions as long as the underlying functionals fall into our generalized self-concordant function class. First, we introduce the class of generalized self-concordant functions, which covers standard self-concordant functions as a special case. Next, we establish several properties and key estimates of this function class, which can be used to design numerical methods. Then, we apply this theory to develop several Newton-type methods for solving a class of smooth convex optimization problems involving the generalized self-concordant functions. We provide an explicit step-size for the damped-step Newton-type scheme which can guarantee global convergence without performing any globalization strategy. We also prove local quadratic convergence of this method and its full-step variant without requiring the Lipschitz continuity of the objective Hessian. Then, we extend our result to develop proximal Newton-type methods for a class of composite convex minimization problems involving generalized self-concordant functions. We also achieve both global and local convergence without additional assumptions. Finally, we verify our theoretical results via several numerical examples, and compare them with existing methods.

Submitted 8 May, 2018; v1 submitted 14 March, 2017; originally announced March 2017.

Comments: 47 pages, 2 figures

Report number: UNC-STOR-March-2017

Journal ref: Mathematical Programming (2018)

arXiv:1606.03358 [pdf, other]

Extended Gauss-Newton and ADMM-Gauss-Newton Algorithms for Low-Rank Matrix Optimization

Authors: Quoc Tran-Dinh

Abstract: In this paper, we develop a variant of the well-known Gauss-Newton (GN) method to solve a class of nonconvex optimization problems involving low-rank matrix variables. As opposed to the standard GN method, our algorithm allows one to handle a general smooth convex objective function. We show, under mild conditions, that the proposed algorithm globally and locally converges to a stationary point of the original problem. We also show empirically that the GN algorithm achieves more accurate solutions than the alternating minimization algorithm (AMA). Then, we specify our GN scheme to handle the symmetric case and prove its convergence, where AMA is not applicable. Next, we incorporate our GN scheme into the alternating direction method of multipliers (ADMM) to develop an ADMM-GN algorithm. We prove that, under mild conditions and a proper choice of the penalty parameter, our ADMM-GN globally converges to a stationary point of the original problem. Finally, we provide several numerical experiments to illustrate the proposed algorithms. Our results show that the new algorithms have encouraging performance compared to existing methods.

Submitted 26 October, 2020; v1 submitted 10 June, 2016; originally announced June 2016.

Comments: 35 pages, 5 figures and 5 tables. The code can be found at http://www.trandinhquoc.com, UNC-STAT&OR - Tech. Report 2016.a

arXiv:1603.06313 [pdf, other]

Convex block-sparse linear regression with expanders -- provably

Authors: Anastasios Kyrillidis, Bubacarr Bah, Rouzbeh Hasheminezhad, Quoc Tran-Dinh, Luca Baldassarre, Volkan Cevher

Abstract: Sparse matrices are favorable objects in machine learning and optimization. When such matrices are used in place of dense ones, the overall complexity requirements in optimization can be significantly reduced in practice, both in terms of space and run-time. Prompted by this observation, we study a convex optimization scheme for block-sparse recovery from linear measurements. To obtain linear sketches, we use expander matrices, i.e., sparse matrices containing only a few non-zeros per column. Hitherto, to the best of our knowledge, such algorithmic solutions have only been studied from a non-convex perspective. Our aim here is to theoretically characterize the performance of convex approaches under such a setting. Our key novelty is the expression of the recovery error in terms of the model-based norm, while assuring that the solution lives in the model. To achieve this, we show that sparse model-based matrices satisfy a group version of the null-space property. Our experimental findings on synthetic and real applications support our claims for faster recovery in the convex setting, as opposed to using dense sensing matrices, while showing a competitive recovery performance.

Submitted 2 April, 2016; v1 submitted 20 March, 2016; originally announced March 2016.

Comments: 12 pages, 6 figures, to appear at AISTATS

arXiv:1603.01681 [pdf, other]

A single-phase, proximal path-following framework

Authors: Quoc Tran-Dinh, Anastasios Kyrillidis, Volkan Cevher

Abstract: We propose a new proximal, path-following framework for a class of constrained convex problems. We consider settings where the nonlinear, and possibly non-smooth, objective part is endowed with a proximity operator, and the constraint set is equipped with a self-concordant barrier. Our approach relies on the following two main ideas. First, we re-parameterize the optimality condition as an auxiliary problem, such that a good initial point is available; by doing so, a family of alternative paths towards the optimum is generated. Second, we combine the proximal operator with path-following ideas to design a single-phase, proximal, path-following algorithm. Our method has several advantages. First, it allows handling non-smooth objectives via proximal operators; this avoids lifting the problem dimension in order to accommodate non-smooth components in optimization. Second, it consists of only a single phase: While the overall convergence rate of classical path-following schemes for self-concordant objectives does not suffer from the initialization phase, proximal path-following schemes undergo slow convergence in order to obtain a good starting point \cite{TranDinh2013e}. In this work, we show how to overcome this limitation in the proximal setting and prove that our scheme has the same $\mathcal{O}(\sqrt{ν}\log(1/\varepsilon))$ worst-case iteration-complexity as standard approaches \cite{Nesterov2004,Nesterov1994} without requiring an initial phase, where $ν$ is the barrier parameter and $\varepsilon$ is a desired accuracy. Finally, our framework allows errors in the calculation of proximal-Newton directions, without sacrificing the worst-case iteration complexity. We demonstrate the merits of our algorithm via three numerical examples, where proximal operators play a key role.

Submitted 25 December, 2016; v1 submitted 5 March, 2016; originally announced March 2016.

Comments: 26 pages, 2 figures, 4 tables (This is the first revision. The original one was uploaded on arXiv on March 5, 2016.)

Report number: 90C06, 90C25, 90-08

MSC Class: 90C06; 90C25; 90-08

arXiv:1602.00724 [pdf, other]

Frank-Wolfe Works for Non-Lipschitz Continuous Gradient Objectives: Scalable Poisson Phase Retrieval

Authors: Gergely Odor, Yen-Huan Li, Alp Yurtsever, Ya-Ping Hsieh, Quoc Tran-Dinh, Marwa El Halabi, Volkan Cevher

Abstract: We study a phase retrieval problem in the Poisson noise model. Motivated by the PhaseLift approach, we approximate the maximum-likelihood estimator by solving a convex program with a nuclear norm constraint. While the Frank-Wolfe algorithm, together with the Lanczos method, can efficiently deal with nuclear norm constraints, our objective function does not have a Lipschitz continuous gradient, and hence existing convergence guarantees for the Frank-Wolfe algorithm do not apply. In this paper, we show that the Frank-Wolfe algorithm works for the Poisson phase retrieval problem, and has a global convergence rate of $O(1/t)$, where $t$ is the iteration counter. We provide a rigorous theoretical guarantee and illustrative numerical results.

Submitted 1 February, 2016; originally announced February 2016.

arXiv:1509.00106 [pdf, other]

Adaptive Smoothing Algorithms for Nonsmooth Composite Convex Minimization

Authors: Quoc Tran-Dinh

Abstract: We propose an adaptive smoothing algorithm based on Nesterov's smoothing technique in \cite{Nesterov2005c} for solving "fully" nonsmooth composite convex optimization problems. Our method combines both Nesterov's accelerated proximal gradient scheme and a new homotopy strategy for the smoothness parameter. By an appropriate choice of smoothing functions, we develop a new algorithm that has the $\mathcal{O}\left(\frac{1}{\varepsilon}\right)$ worst-case iteration-complexity while preserving the same complexity-per-iteration as in Nesterov's method and allowing one to automatically update the smoothness parameter at each iteration. Then, we customize our algorithm to solve four special cases that cover various applications. We also specify our algorithm to solve constrained convex optimization problems and show its convergence guarantee on a primal sequence of iterates. We demonstrate our algorithm through three numerical examples and compare it with other related algorithms.

Submitted 3 July, 2016; v1 submitted 31 August, 2015; originally announced September 2015.

Comments: This paper has 23 pages, 3 figures and 1 table

Report number: Tech. Report. STOR-2015-a

arXiv:1507.05367 [pdf, other]

Structured Sparsity: Discrete and Convex approaches

Authors: Anastasios Kyrillidis, Luca Baldassarre, Marwa El-Halabi, Quoc Tran-Dinh, Volkan Cevher

Abstract: Compressive sensing (CS) exploits sparsity to recover sparse or compressible signals from dimensionality-reducing, non-adaptive sensing mechanisms. Sparsity is also used to enhance interpretability in machine learning and statistics applications: While the ambient dimension is vast in modern data analysis problems, the relevant information therein typically resides in a much lower dimensional space. However, many solutions proposed nowadays do not leverage the true underlying structure. Recent results in CS extend the simple sparsity idea to more sophisticated structured sparsity models, which describe the interdependency between the nonzero components of a signal, increasing the interpretability of the results and leading to better recovery performance. In order to better understand the impact of structured sparsity, in this chapter we analyze the connections between the discrete models and their convex relaxations, highlighting their relative advantages. We start with the general group sparse model and then elaborate on two important special cases: the dispersive and the hierarchical models. For each, we present the models in their discrete nature, discuss how to solve the ensuing discrete problems and then describe convex relaxations. We also consider more general structures as defined by set functions and present their convex proxies. Further, we discuss efficient optimization solutions for structured sparsity problems and illustrate structured sparsity in action via three applications.

Submitted 19 July, 2015; originally announced July 2015.

Comments: 30 pages, 18 figures

arXiv:1507.03734 [pdf, other]

Smooth Alternating Direction Methods for Nonsmooth Constrained Convex Optimization

Authors: Quoc Tran-Dinh, Volkan Cevher

Abstract: We propose two new alternating direction methods to solve "fully" nonsmooth constrained convex problems. Our algorithms have the best known worst-case iteration-complexity guarantee under mild assumptions for both the objective residual and feasibility gap. Through theoretical analysis, we show how to update all the algorithmic parameters automatically with clear impact on the convergence performance. We also provide a representative numerical example showing the advantages of our methods over the classical alternating direction methods using a well-known feasibility problem.

Submitted 15 January, 2018; v1 submitted 14 July, 2015; originally announced July 2015.

Comments: 35 pages, 1 figure

Report number: UNC-STOR 2016

arXiv:1502.01068 [pdf, other]

Composite convex minimization involving self-concordant-like cost functions

Authors: Quoc Tran-Dinh, Yen-Huan Li, Volkan Cevher

Abstract: The self-concordant-like property of a smooth convex function is a new analytical structure that generalizes the self-concordant notion. While a wide variety of important applications feature the self-concordant-like property, this concept has heretofore remained unexploited in convex optimization. To this end, we develop a variable metric framework for minimizing the sum of a "simple" convex function and a self-concordant-like function. We introduce a new analytic step-size selection procedure and prove that the basic gradient algorithm has improved convergence guarantees as compared to "fast" algorithms that rely on the Lipschitz gradient property. Our numerical tests with real data sets show that the practice indeed follows the theory.

Submitted 20 January, 2018; v1 submitted 3 February, 2015; originally announced February 2015.

Comments: 19 pages, 5 figures

Report number: LIONS-EPFL-2015A

arXiv:1406.5403 [pdf, other]

A Primal-Dual Algorithmic Framework for Constrained Convex Minimization

Authors: Quoc Tran-Dinh, Volkan Cevher

Abstract: We present a primal-dual algorithmic framework to obtain approximate solutions to a prototypical constrained convex optimization problem, and rigorously characterize how common structural assumptions affect the numerical efficiency. Our main analysis technique provides a fresh perspective on Nesterov's excessive gap technique in a structured fashion and unifies it with smoothing and primal-dual methods. For instance, through the choices of a dual smoothing strategy and a center point, our framework subsumes decomposition algorithms, augmented Lagrangian methods, and the alternating direction method of multipliers as special cases, and provides optimal convergence rates on the primal objective residual as well as the primal feasibility gap of the iterates for all of them.

Submitted 3 March, 2015; v1 submitted 20 June, 2014; originally announced June 2014.

Comments: This paper consists of 54 pages with 7 tables and 12 figures

Report number: Technical Report LIONS-EPFL 2014

arXiv:1405.3263 [pdf, other]

Scalable sparse covariance estimation via self-concordance

Authors: Anastasios Kyrillidis, Rabeeh Karimi Mahabadi, Quoc Tran-Dinh, Volkan Cevher

Abstract: We consider the class of convex minimization problems composed of a self-concordant function, such as the $\log\det$ metric, a convex data fidelity term $h(\cdot)$, and a regularizing, possibly non-smooth, function $g(\cdot)$. This type of problem has recently attracted a great deal of interest, mainly due to its omnipresence in top-notch applications. Under this locally Lipschitz continuous gradient setting, we analyze the convergence behavior of proximal Newton schemes with the added twist of a probable presence of inexact evaluations. We prove attractive convergence rate guarantees and enhance state-of-the-art optimization schemes to accommodate such developments. Experimental results on sparse covariance estimation show the merits of our algorithm, both in terms of recovery efficiency and complexity.

Submitted 13 May, 2014; originally announced May 2014.

Comments: 7 pages, 1 figure, Accepted at AAAI-14

arXiv:1308.2867 [pdf, other]

Composite Self-Concordant Minimization

Authors: Quoc Tran-Dinh, Anastasios Kyrillidis, Volkan Cevher

Abstract: We propose a variable metric framework for minimizing the sum of a self-concordant function and a possibly non-smooth convex function, endowed with an easily computable proximal operator. We theoretically establish the convergence of our framework without relying on the usual Lipschitz gradient assumption on the smooth part. An important highlight of our work is a new set of analytic step-size selection and correction procedures based on the structure of the problem. We describe concrete algorithmic instances of our framework for several interesting applications and demonstrate them numerically on both synthetic and real data.

Submitted 14 April, 2014; v1 submitted 13 August, 2013; originally announced August 2013.

Comments: 46 pages, 9 figures

Showing 1–38 of 38 results for author: Tran-Dinh, Q