-
Heavy Ball and Nesterov Accelerations with Hessian-driven Damping for Nonconvex Optimization
Authors:
N. Hadjisavvas,
F. Lara,
R. T. Marcavillaca,
P. T. Vuong
Abstract:
In this work, we investigate a second-order dynamical system with Hessian-driven damping tailored for a class of nonconvex functions called strongly quasiconvex. Building upon this continuous-time model, we derive two discrete-time gradient-based algorithms through time discretizations. The first is a Heavy Ball method with Hessian correction, incorporating curvature-dependent terms that arise from discretizing the Hessian damping component. The second is a Nesterov-type accelerated method with adaptive momentum, featuring correction terms that account for local curvature. Both algorithms aim to enhance stability and convergence performance, particularly by mitigating oscillations commonly observed in classical momentum methods. Furthermore, in both cases we establish linear convergence to the optimal solution for the iterates and function values. Our approach highlights the rich interplay between continuous-time dynamics and discrete optimization algorithms in the setting of strongly quasiconvex objectives. Numerical experiments are presented to support the obtained results.
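A common way to discretize a Hessian-driven damping term $\beta\nabla^2 f(x(t))\dot x(t)$ is to replace it by a difference of consecutive gradients, since $\nabla^2 f(x_k)(x_k - x_{k-1}) \approx \nabla f(x_k) - \nabla f(x_{k-1})$, so no Hessian is ever formed. The Python sketch below applies that substitution inside a heavy-ball iteration. The exact update, the parameter names `step`, `momentum` and `hessian_damping`, and their values are illustrative assumptions, not the authors' scheme or parameter conditions. The test function is a strongly convex quadratic, which is in particular strongly quasiconvex.

```python
from typing import Callable, List

Vector = List[float]


def heavy_ball_hessian_damping(
    grad: Callable[[Vector], Vector],
    x0: Vector,
    step: float,
    momentum: float,
    hessian_damping: float,
    iters: int,
) -> Vector:
    """Heavy-ball iteration with a gradient-difference (Hessian-damping) correction:

    x_{k+1} = x_k + momentum * (x_k - x_{k-1})
                  - hessian_damping * (grad(x_k) - grad(x_{k-1}))
                  - step * grad(x_k)
    """
    x_prev = list(x0)
    x = list(x0)
    g_prev = grad(x_prev)  # with x_{-1} = x_0 the first step is a plain gradient step
    for _ in range(iters):
        g = grad(x)
        x_next = [
            xi + momentum * (xi - xpi) - hessian_damping * (gi - gpi) - step * gi
            for xi, xpi, gi, gpi in zip(x, x_prev, g, g_prev)
        ]
        x_prev, x, g_prev = x, x_next, g
    return x


# Hypothetical example: f(x) = 0.5 * (x1^2 + 10 * x2^2), minimized at the origin.
def quad_grad(x: Vector) -> Vector:
    return [x[0], 10.0 * x[1]]


x_final = heavy_ball_hessian_damping(
    quad_grad, [3.0, -2.0], step=0.05, momentum=0.5, hessian_damping=0.02, iters=200
)
print(x_final)
```

The correction term only uses gradients already computed at the previous iterate, so each step costs one gradient evaluation, the same as the classical heavy ball method.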
Submitted 18 June, 2025;
originally announced June 2025.
-
Efficiently learning and sampling multimodal distributions with data-based initialization
Authors:
Frederic Koehler,
Holden Lee,
Thuy-Duong Vuong
Abstract:
We consider the problem of sampling a multimodal distribution with a Markov chain given a small number of samples from the stationary measure. Although mixing can be arbitrarily slow, we show that if the Markov chain has a $k$th order spectral gap, initialization from a set of $\tilde O(k/\varepsilon^2)$ samples from the stationary distribution will, with high probability over the samples, efficiently generate a sample whose conditional law is $\varepsilon$-close in TV distance to the stationary measure. In particular, this applies to mixtures of $k$ distributions satisfying a Poincaré inequality, with faster convergence when they satisfy a log-Sobolev inequality. Our bounds are stable to perturbations to the Markov chain, and in particular work for Langevin diffusion over $\mathbb R^d$ with score estimation error, as well as Glauber dynamics combined with approximation error from pseudolikelihood estimation. This justifies the success of data-based initialization for score matching methods despite slow mixing for the data distribution, and improves and generalizes the results of Koehler and Vuong (2023) to have linear, rather than exponential, dependence on $k$ and apply to arbitrary semigroups. As a consequence of our results, we show for the first time that a natural class of low-complexity Ising measures can be efficiently learned from samples.
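As a concrete instance of data-based initialization, the sketch below runs the unadjusted Langevin algorithm (an Euler discretization of Langevin diffusion) with one chain started at each data point. The target is an equal mixture of $N(-4,1)$ and $N(4,1)$, whose score is known in closed form. The mixture, step size, horizon and function names are hypothetical choices for illustration. Because the chains start in both modes, the output keeps roughly the data's split between them. A single chain started inside one mode would typically need far longer than this horizon to cross the low-density region between the modes.

```python
import math
import random
from typing import Callable, List


def langevin_from_data(
    score: Callable[[float], float],
    data: List[float],
    step: float,
    n_steps: int,
    rng: random.Random,
) -> List[float]:
    """Unadjusted Langevin x <- x + h * score(x) + sqrt(2h) * N(0, 1),
    with one chain initialized at each data point."""
    out = []
    for x in data:
        for _ in range(n_steps):
            x = x + step * score(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        out.append(x)
    return out


# Equal mixture of N(-M, 1) and N(M, 1): log p(x) = -x^2/2 + log cosh(M x) + const.
M = 4.0


def mixture_score(x: float) -> float:
    return -x + M * math.tanh(M * x)


rng = random.Random(0)
# Hypothetical "training data": 100 samples from each mode.
data = [rng.gauss(-M, 1.0) for _ in range(100)] + [rng.gauss(M, 1.0) for _ in range(100)]
samples = langevin_from_data(mixture_score, data, step=0.01, n_steps=300, rng=rng)
frac_right = sum(s > 0 for s in samples) / len(samples)  # share of chains in the right mode
print(frac_right)
```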
Submitted 13 November, 2024;
originally announced November 2024.
-
Classification under strategic adversary manipulation using pessimistic bilevel optimisation
Authors:
David Benfield,
Stefano Coniglio,
Martin Kunc,
Phan Tu Vuong,
Alain Zemkoho
Abstract:
Adversarial machine learning concerns situations in which learners face attacks from active adversaries. Such scenarios arise in applications such as spam email filtering, malware detection and fake-image generation, where security methods must be actively updated to keep up with the ever-improving generation of malicious data. We model these interactions between the learner and the adversary as a game and formulate the problem as a pessimistic bilevel optimisation problem with the learner taking the role of the leader. The adversary, modelled as a stochastic data generator, takes the role of the follower, generating data in response to the classifier. While existing models rely on the assumption that the adversary will choose the least costly solution, leading to a convex lower-level problem with a unique solution, we present a novel model and solution method that do not make such assumptions. We compare these to the existing approach and see significant improvements in performance, suggesting that relaxing these assumptions leads to a more realistic model.
Submitted 26 October, 2024;
originally announced October 2024.
-
Characterizations, Dynamical Systems and Gradient Methods for Strongly Quasiconvex Functions
Authors:
Felipe Lara,
Raúl T. Marcavillaca,
Phan T. Vuong
Abstract:
We study differentiable strongly quasiconvex functions and establish new properties of them for algorithmic and monotonicity purposes. Furthermore, we provide insights into the decreasing behaviour of strongly quasiconvex functions, and apply them to establish exponential convergence for first- and second-order gradient systems without relying on the usual Lipschitz continuity assumption on the gradient of the function. The explicit discretization of the first-order dynamical system leads to the gradient descent method, while discretization of the second-order dynamical system with viscous damping recovers the heavy ball method. We establish the linear convergence of both methods under suitable conditions on the parameters, and compare strong quasiconvexity with other classes of nonconvex functions used in the gradient descent literature.
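To make the two discretization statements concrete, one standard choice of finite differences with step size $h>0$ and viscous damping coefficient $\gamma>0$ is shown below. It is an illustrative derivation, not necessarily the discretization used in the paper.

```latex
\begin{aligned}
&\dot x(t) = -\nabla f(x(t))
  &&\leadsto\ \frac{x_{k+1}-x_k}{h} = -\nabla f(x_k)
  &&\Longrightarrow\ x_{k+1} = x_k - h\,\nabla f(x_k),\\[4pt]
&\ddot x(t) + \gamma\,\dot x(t) + \nabla f(x(t)) = 0
  &&\leadsto\ \frac{x_{k+1}-2x_k+x_{k-1}}{h^2} + \gamma\,\frac{x_k-x_{k-1}}{h} + \nabla f(x_k) = 0
  &&\Longrightarrow\ x_{k+1} = x_k + (1-\gamma h)(x_k-x_{k-1}) - h^2\,\nabla f(x_k).
\end{aligned}
```

The second line is the heavy ball method with momentum parameter $1-\gamma h$ and step size $h^2$; the first is gradient descent with step size $h$.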
Submitted 4 October, 2024;
originally announced October 2024.
-
Trickle-Down in Localization Schemes and Applications
Authors:
Nima Anari,
Frederic Koehler,
Thuy-Duong Vuong
Abstract:
Trickle-down is a phenomenon in high-dimensional expanders with many important applications -- for example, it is a key ingredient in various constructions of high-dimensional expanders or the proof of rapid mixing for the basis exchange walk on matroids and in the analysis of log-concave polynomials. We formulate a generalized trickle-down equation in the abstract context of linear-tilt localization schemes. Building on this generalization, we improve the best-known results for several Markov chain mixing or sampling problems -- for example, we improve the threshold up to which Glauber dynamics is known to mix rapidly in the Sherrington-Kirkpatrick spin glass model. Other applications of our framework include improved mixing results for the Langevin dynamics in the $O(N)$ model, and near-linear time sampling algorithms for the antiferromagnetic and fixed-magnetization Ising models on expanders. For the latter application, we use a new dynamics inspired by polarization, a technique from the theory of stable polynomials.
Submitted 22 July, 2024;
originally announced July 2024.
-
A third order dynamical system for generalized monotone equation
Authors:
Pham Viet Hai,
Phan Tu Vuong
Abstract:
We propose a third-order dynamical system for solving a nonlinear equation in Hilbert spaces where the operator is cocoercive with respect to the solution set. Under mild conditions on the parameters, we establish the existence and uniqueness of the generated trajectories as well as their asymptotic convergence to a solution of the equation. When the operator is strongly monotone with respect to the solution set, we deliver an exponential convergence rate of $e^{-2t}$, which is significantly faster than the known results for second-order dynamical systems. In particular, for convex optimization problems, the proposed dynamical system provides a fast convergence rate of $\mathcal{O}(1/t^3)$ for the objective values. In addition, we discuss applications of the proposed dynamical system to several splitting monotone inclusion problems.
Submitted 2 June, 2024;
originally announced June 2024.
-
The Boosted Difference of Convex Functions Algorithm for Value-at-Risk Constrained Portfolio Optimization
Authors:
Marah-Lisanne Thormann,
Phan Tu Vuong,
Alain B. Zemkoho
Abstract:
A highly relevant problem of modern finance is the design of Value-at-Risk (VaR) optimal portfolios. Due to contemporary financial regulations, banks and other financial institutions are required to use this risk measure to control their credit, market and operational risks. For a portfolio with a discrete return distribution and finitely many scenarios, a Difference of Convex (DC) functions representation of the VaR can be derived. Wozabal (2012) showed that this yields a solution to a VaR constrained Markowitz style portfolio selection problem using the Difference of Convex Functions Algorithm (DCA). A recent algorithmic extension is the so-called Boosted Difference of Convex Functions Algorithm (BDCA), which accelerates convergence through an additional line search step. It has been shown that the BDCA converges linearly for solving non-smooth quadratic problems with linear inequality constraints. In this paper, we prove that the linear rate of convergence is also guaranteed for a piecewise linear objective function with linear equality and inequality constraints using the Kurdyka-Łojasiewicz property. An extended case study under consideration of best practices for comparing optimization algorithms demonstrates the superiority of the BDCA over the DCA for real-world financial market data. We are able to show that the results of the BDCA are significantly closer to the efficient frontier compared to the DCA. Due to the open availability of all data sets and code, this paper further provides a practical guide for transparent and easily reproducible comparisons of VaR constrained portfolio selection problems in Python.
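For intuition about the extra line search step, the sketch below compares plain DCA with a boosted variant on a hypothetical one-dimensional DC function $\varphi = g - h$, with $g(x)=1.5x^2$ and $h(x)=\log(1+e^x)+0.5x^2$, both strongly convex. Each DCA step linearizes $h$ and minimizes the resulting convex model. The boosted step then searches along $d = y - x$ from the DCA point $y$ with backtracking, accepting $y+\lambda d$ only if it lowers $\varphi$ by at least $\alpha\lambda^2 d^2$. The toy function and the parameters `alpha`, `beta` and `lam0` are assumptions; this is not the VaR-constrained portfolio model.

```python
import math


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


# Toy DC objective phi = g - h, g(x) = 1.5 x^2, h(x) = log(1 + e^x) + 0.5 x^2,
# so phi(x) = x^2 - log(1 + e^x).
def phi(x: float) -> float:
    return x * x - math.log1p(math.exp(x))


def dca_step(x: float) -> float:
    # Linearize h at x (slope h'(x) = sigmoid(x) + x), then minimize g(z) - h'(x) * z,
    # whose solution is z = h'(x) / 3.
    return (sigmoid(x) + x) / 3.0


def dca(x: float, iters: int) -> float:
    for _ in range(iters):
        x = dca_step(x)
    return x


def bdca(x: float, iters: int, alpha: float = 0.1, beta: float = 0.5, lam0: float = 2.0) -> float:
    for _ in range(iters):
        y = dca_step(x)
        d = y - x  # boosting direction
        if d == 0.0:
            return y
        lam = lam0
        # Backtrack until phi(y + lam * d) <= phi(y) - alpha * lam^2 * d^2.
        while lam > 1e-12 and phi(y + lam * d) > phi(y) - alpha * lam * lam * d * d:
            lam *= beta
        x = y + lam * d if lam > 1e-12 else y  # fall back to the plain DCA point
    return x


def stationarity(x: float) -> float:
    # |phi'(x)| = |2x - sigmoid(x)|
    return abs(2.0 * x - sigmoid(x))


for n in (3, 6):
    print(n, stationarity(dca(3.0, n)), stationarity(bdca(3.0, n)))
```

Because the accepted point never has a larger objective value than the DCA point, the boosted variant keeps the monotone decrease of DCA while allowing longer steps.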
Submitted 14 February, 2024;
originally announced February 2024.
-
Fast parallel sampling under isoperimetry
Authors:
Nima Anari,
Sinho Chewi,
Thuy-Duong Vuong
Abstract:
We show how to sample in parallel from a distribution $π$ over $\mathbb R^d$ that satisfies a log-Sobolev inequality and has a smooth log-density, by parallelizing the Langevin (resp. underdamped Langevin) algorithms. We show that our algorithm outputs samples from a distribution $\hatπ$ that is close to $π$ in Kullback--Leibler (KL) divergence (resp. total variation (TV) distance), while using only $\log(d)^{O(1)}$ parallel rounds and $\widetilde{O}(d)$ (resp. $\widetilde O(\sqrt d)$) gradient evaluations in total. These are the first parallel sampling algorithms with TV distance guarantees.
For our main application, we show how to combine the TV distance guarantees of our algorithms with prior works and obtain RNC sampling-to-counting reductions for families of discrete distributions on the hypercube $\{\pm 1\}^n$ that are closed under exponential tilts and have bounded covariance. Consequently, we obtain an RNC sampler for directed Eulerian tours and asymmetric determinantal point processes, resolving open questions raised in prior works.
Submitted 17 January, 2024;
originally announced January 2024.
-
Fairness in Submodular Maximization over a Matroid Constraint
Authors:
Marwa El Halabi,
Jakub Tarnawski,
Ashkan Norouzi-Fard,
Thuy-Duong Vuong
Abstract:
Submodular maximization over a matroid constraint is a fundamental problem with various applications in machine learning. Some of these applications involve decision-making over datapoints with sensitive attributes such as gender or race. In such settings, it is crucial to guarantee that the selected solution is fairly distributed with respect to this attribute. Recently, fairness has been investigated in submodular maximization under a cardinality constraint in both the streaming and offline settings; however, the more general problem with a matroid constraint has only been considered in the streaming setting and only for monotone objectives. This work fills this gap. We propose various algorithms and impossibility results offering different trade-offs between quality, fairness, and generality.
Submitted 21 December, 2023;
originally announced December 2023.
-
The Boosted DC Algorithm for Clustering with Constraints
Authors:
Tuyen Tran,
Kate Figenschou,
Phan Tu Vuong
Abstract:
This paper aims to investigate the effectiveness of the recently proposed Boosted Difference of Convex functions Algorithm (BDCA) when applied to clustering with constraints and set clustering with constraints problems. This is the first paper to apply BDCA to a problem with nonlinear constraints. We present the mathematical basis for the BDCA and Difference of Convex functions Algorithm (DCA), along with a penalty method based on distance functions. We then develop algorithms for solving these problems and computationally implement them, with publicly available implementations. We revisit previously studied examples and provide new experiments to test the algorithms. We find that the BDCA method converges in fewer iterations than the corresponding DCA-based method. In addition, BDCA yields faster CPU running times in all tested problems.
Submitted 21 October, 2023;
originally announced October 2023.
-
Sampling Multimodal Distributions with the Vanilla Score: Benefits of Data-Based Initialization
Authors:
Frederic Koehler,
Thuy-Duong Vuong
Abstract:
There is a long history, as well as a recent explosion of interest, in statistical and generative modeling approaches based on score functions -- derivatives of the log-likelihood of a distribution. In seminal works, Hyvärinen proposed vanilla score matching as a way to learn distributions from data by computing an estimate of the score function of the underlying ground truth, and established connections between this method and well-known techniques like Contrastive Divergence and Pseudolikelihood estimation. It is by now well-known that vanilla score matching has significant difficulties learning multimodal distributions. Although there are various ways to overcome this difficulty, the following question has remained unanswered -- is there a natural way to sample multimodal distributions using just the vanilla score? Inspired by a long line of related experimental works, we prove that the Langevin diffusion with early stopping, initialized at the empirical distribution, and run on a score function estimated from data successfully generates natural multimodal distributions (mixtures of log-concave distributions).
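Vanilla score matching fits a model score $s$ by minimizing Hyvärinen's objective $J(s)=\mathbb{E}\big[\tfrac12 s(x)^2 + s'(x)\big]$, which, via integration by parts, avoids any reference to the unknown true score. As a minimal one-dimensional illustration (the linear model class and the data are assumptions chosen for illustration), restricting to $s(x)=ax+b$ makes $J$ a convex quadratic in $(a,b)$. Setting both partial derivatives to zero gives $b=-a\bar x$ and $a=-1/\widehat{\mathrm{var}}$, i.e. the score of a Gaussian fitted by its empirical mean and variance.

```python
from typing import List, Tuple


def fit_linear_score(data: List[float]) -> Tuple[float, float]:
    """Minimize the empirical vanilla score-matching objective
        J(a, b) = mean(0.5 * (a * x + b) ** 2 + a)
    over linear scores s(x) = a * x + b (so s'(x) = a).
    dJ/db = 0 gives b = -a * mean; dJ/da = 0 then gives a = -1 / var."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    a = -1.0 / var
    b = mean / var
    return a, b


def score_matching_loss(a: float, b: float, data: List[float]) -> float:
    return sum(0.5 * (a * x + b) ** 2 + a for x in data) / len(data)


data = [1.0, 2.0, 2.5, 3.0, 4.5]  # hypothetical sample
a_hat, b_hat = fit_linear_score(data)
print(a_hat, b_hat, score_matching_loss(a_hat, b_hat, data))
```

A unimodal model like this cannot represent a multimodal target, which is one way to see why estimating the score alone is not the whole story for mixtures.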
Submitted 2 October, 2023;
originally announced October 2023.
-
Universality of Spectral Independence with Applications to Fast Mixing in Spin Glasses
Authors:
Nima Anari,
Vishesh Jain,
Frederic Koehler,
Huy Tuan Pham,
Thuy-Duong Vuong
Abstract:
We study Glauber dynamics for sampling from discrete distributions $μ$ on the hypercube $\{\pm 1\}^n$. Recently, techniques based on spectral independence have successfully yielded optimal $O(n)$ relaxation times for a host of different distributions $μ$. We show that spectral independence is universal: a relaxation time of $O(n)$ implies spectral independence.
We then study a notion of tractability for $μ$, defined in terms of smoothness of the multilinear extension of its Hamiltonian -- $\log μ$ -- over $[-1,+1]^n$. We show that Glauber dynamics has relaxation time $O(n)$ for such $μ$, and using the universality of spectral independence, we conclude that these distributions are also fractionally log-concave and consequently satisfy modified log-Sobolev inequalities. We sharpen our estimates and obtain approximate tensorization of entropy and the optimal $\widetilde{O}(n)$ mixing time for random Hamiltonians, i.e. the classically studied mixed $p$-spin model at sufficiently high temperature. These results have significant downstream consequences for concentration of measure, statistical testing, and learning.
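The basic move analyzed here is a single-site Glauber (heat-bath) update: pick a uniformly random coordinate and resample it from its conditional law given the others. The sketch below does this for a quadratic (2-spin, SK-type) Hamiltonian, an assumption made for concreteness; for a mixed $p$-spin Hamiltonian only the computation of the local field would change. The coupling distribution, `beta`, and the function names are illustrative.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def log_weight(J: Matrix, sigma: List[int], beta: float) -> float:
    """Unnormalized log-density beta * sum_{i<j} J_ij sigma_i sigma_j."""
    n = len(sigma)
    return beta * sum(J[i][j] * sigma[i] * sigma[j] for i in range(n) for j in range(i + 1, n))


def prob_plus(J: Matrix, sigma: List[int], i: int, beta: float) -> float:
    """P(sigma_i = +1 | all other spins). Only the local field
    h_i = sum_{j != i} J_ij sigma_j matters:
    P(+1) = e^{beta h_i} / (e^{beta h_i} + e^{-beta h_i}) = (1 + tanh(beta h_i)) / 2."""
    field = sum(J[i][j] * sigma[j] for j in range(len(sigma)) if j != i)
    return 0.5 * (1.0 + math.tanh(beta * field))


def glauber(J: Matrix, sigma: List[int], beta: float, steps: int, rng: random.Random) -> List[int]:
    sigma = list(sigma)
    for _ in range(steps):
        i = rng.randrange(len(sigma))  # pick a uniformly random site
        sigma[i] = 1 if rng.random() < prob_plus(J, sigma, i, beta) else -1  # resample it
    return sigma


# Hypothetical SK-type instance: symmetric Gaussian couplings of variance 1/n, zero diagonal.
rng = random.Random(1)
n = 8
J = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        J[i][j] = J[j][i] = rng.gauss(0.0, 1.0) / math.sqrt(n)
state = glauber(J, [1] * n, beta=0.5, steps=1000, rng=rng)
print(state)
```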
Submitted 19 July, 2023;
originally announced July 2023.
-
Optimal mixing of the down-up walk on independent sets of a given size
Authors:
Vishesh Jain,
Marcus Michelen,
Huy Tuan Pham,
Thuy-Duong Vuong
Abstract:
Let $G$ be a graph on $n$ vertices of maximum degree $Δ$. We show that, for any $δ> 0$, the down-up walk on independent sets of size $k \leq (1-δ)α_c(Δ)n$ mixes in time $O_{Δ,δ}(k\log{n})$, thereby resolving a conjecture of Davies and Perkins in an optimal form. Here, $α_{c}(Δ)n$ is the NP-hardness threshold for the problem of counting independent sets of a given size in a graph on $n$ vertices of maximum degree $Δ$. Our mixing time has optimal dependence on $k,n$ for the entire range of $k$; previously, even polynomial mixing was not known. In fact, for $k = Ω_Δ(n)$ in this range, we establish a log-Sobolev inequality with optimal constant $Ω_{Δ,δ}(1/n)$.
At the heart of our proof are three new ingredients, which may be of independent interest. The first is a method for lifting $\ell_\infty$-independence from a suitable distribution on the discrete cube -- in this case, the hard-core model -- to the slice by proving stability of an Edgeworth expansion using a multivariate zero-free region for the base distribution. The second is a generalization of the Lee-Yau induction to prove log-Sobolev inequalities for distributions on the slice with considerably less symmetry than the uniform distribution. The third is a sharp decomposition-type result which provides a lossless comparison between the Dirichlet form of the original Markov chain and that of the so-called projected chain in the presence of a contractive coupling.
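For readers unfamiliar with the chain, one step of the down-up walk targeting the uniform distribution on independent sets of size $k$ removes a uniformly random vertex from the current set and then adds a uniformly random vertex that keeps the set independent (possibly the vertex just removed). The sketch below is a plain transcription of that definition; the graph, $k$, and the function names are hypothetical.

```python
import random
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def is_independent(graph: Graph, s: Set[int]) -> bool:
    # No vertex of s has a neighbor in s.
    return all(graph[u].isdisjoint(s) for u in s)


def down_up_step(graph: Graph, current: Set[int], rng: random.Random) -> Set[int]:
    """One step of the down-up walk whose stationary law is uniform on the
    independent sets of size k = len(current)."""
    # Down: delete a uniformly random vertex.
    removed = rng.choice(sorted(current))
    smaller = current - {removed}
    # Up: add a uniformly random vertex outside `smaller` with no neighbor in `smaller`.
    # The removed vertex always qualifies, so the candidate list is never empty.
    blocked = set(smaller)
    for u in smaller:
        blocked |= graph[u]
    candidates = [v for v in sorted(graph) if v not in blocked]
    return smaller | {rng.choice(candidates)}


# Hypothetical example: the 10-cycle (maximum degree 2) and k = 3.
cycle = {v: {(v - 1) % 10, (v + 1) % 10} for v in range(10)}
rng = random.Random(2)
state = {0, 3, 6}
for _ in range(1000):
    state = down_up_step(cycle, state, rng)
print(sorted(state))
```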
Submitted 10 May, 2023;
originally announced May 2023.
-
Optimal Sublinear Sampling of Spanning Trees and Determinantal Point Processes via Average-Case Entropic Independence
Authors:
Nima Anari,
Yang P. Liu,
Thuy-Duong Vuong
Abstract:
We design fast algorithms for repeatedly sampling from strongly Rayleigh distributions, which include random spanning tree distributions and determinantal point processes. For a graph $G=(V, E)$, we show how to approximately sample uniformly random spanning trees from $G$ in $\widetilde{O}(\lvert V\rvert)$ time per sample after an initial $\widetilde{O}(\lvert E\rvert)$ time preprocessing. For a determinantal point process on subsets of size $k$ of a ground set of $n$ elements, we show how to approximately sample in $\widetilde{O}(k^ω)$ time after an initial $\widetilde{O}(nk^{ω-1})$ time preprocessing, where $ω<2.372864$ is the matrix multiplication exponent. We even improve the state of the art for obtaining a single sample from determinantal point processes, from the prior runtime of $\widetilde{O}(\min\{nk^2, n^ω\})$ to $\widetilde{O}(nk^{ω-1})$.
In our main technical result, we achieve the optimal limit on domain sparsification for strongly Rayleigh distributions. In domain sparsification, sampling from a distribution $μ$ on $\binom{[n]}{k}$ is reduced to sampling from related distributions on $\binom{[t]}{k}$ for $t\ll n$. We show that for strongly Rayleigh distributions, we can achieve the optimal $t=\widetilde{O}(k)$. Our reduction involves sampling from $\widetilde{O}(1)$ domain-sparsified distributions, all of which can be produced efficiently assuming convenient access to approximate overestimates for marginals of $μ$. Having access to marginals is analogous to having access to the mean and covariance of a continuous distribution, or knowing "isotropy" for the distribution, the key assumption behind the Kannan-Lovász-Simonovits (KLS) conjecture and optimal samplers based on it. We view our result as a moral analog of the KLS conjecture and its consequences for sampling, for discrete strongly Rayleigh measures.
Submitted 18 September, 2022; v1 submitted 6 April, 2022;
originally announced April 2022.
-
Dimension reduction for maximum matchings and the Fastest Mixing Markov Chain
Authors:
Vishesh Jain,
Huy Tuan Pham,
Thuy-Duong Vuong
Abstract:
Let $G = (V,E)$ be an undirected graph with maximum degree $Δ$ and vertex conductance $Ψ^*(G)$. We show that there exists a symmetric, stochastic matrix $P$, with off-diagonal entries supported on $E$, whose spectral gap $γ^*(P)$ satisfies \[Ψ^*(G)^{2}/\logΔ\lesssim γ^*(P) \lesssim Ψ^*(G).\] Our bound is optimal under the Small Set Expansion Hypothesis, and answers a question of Olesker-Taylor and Zanetti, who obtained such a result with $\logΔ$ replaced by $\log|V|$.
In order to obtain our result, we show how to embed a negative-type semi-metric $d$ defined on $V$ into a negative-type semi-metric $d'$ supported in $\mathbb{R}^{O(\logΔ)}$, such that the (fractional) matching number of the weighted graph $(V,E,d)$ is approximately equal to that of $(V,E,d')$.
Submitted 23 March, 2022; v1 submitted 8 March, 2022;
originally announced March 2022.
-
Entropic Independence II: Optimal Sampling and Concentration via Restricted Modified Log-Sobolev Inequalities
Authors:
Nima Anari,
Vishesh Jain,
Frederic Koehler,
Huy Tuan Pham,
Thuy-Duong Vuong
Abstract:
We introduce a framework for obtaining tight mixing times for Markov chains based on what we call restricted modified log-Sobolev inequalities. Modified log-Sobolev inequalities (MLSI) quantify the rate of relative entropy contraction for the Markov operator, and are notoriously difficult to establish. However, infinitesimally close to stationarity, entropy contraction becomes equivalent to variance contraction, a.k.a. a Poincaré inequality, which is significantly easier to establish through, e.g., spectral analysis. Motivated by this observation, we study restricted modified log-Sobolev inequalities that guarantee entropy contraction not for all starting distributions, but for those in a large neighborhood of the stationary distribution. We show how to sample from the hardcore and Ising models on $n$-node graphs that have a constant $δ$ relative gap to the tree-uniqueness threshold, in nearly-linear time $\widetilde O_δ(n)$. Notably, our bound does not depend on the maximum degree $Δ$, and is therefore optimal even for high-degree graphs. This improves on prior mixing time bounds of $\widetilde O_{δ, Δ}(n)$ and $\widetilde O_δ(n^2)$, established via (non-restricted) modified log-Sobolev and Poincaré inequalities respectively. We further show that optimal concentration inequalities can still be achieved from the restricted form of modified log-Sobolev inequalities. To establish restricted entropy contraction, we extend the entropic independence framework of Anari, Jain, Koehler, Pham, and Vuong to the setting of distributions that are spectrally independent under a restricted set of external fields. We also develop an orthogonal trick that might be of independent interest: utilizing Bernoulli factories, we show how to implement Glauber dynamics updates on high-degree graphs in $O(1)$ time, assuming the standard adjacency array representation of the graph.
Submitted 5 November, 2021;
originally announced November 2021.
-
Domain Sparsification of Discrete Distributions using Entropic Independence
Authors:
Nima Anari,
Michał Dereziński,
Thuy-Duong Vuong,
Elizabeth Yang
Abstract:
We present a framework for speeding up the time it takes to sample from discrete distributions $μ$ defined over subsets of size $k$ of a ground set of $n$ elements, in the regime $k\ll n$. We show that, given estimates of the marginals $\mathbb{P}_{S\sim μ}[i\in S]$, the task of sampling from $μ$ can be reduced to sampling from distributions $ν$ supported on size $k$ subsets of a ground set of only $n^{1-α}\cdot \operatorname{poly}(k)$ elements. Here, $1/α\in [1, k]$ is the parameter of entropic independence for $μ$. Further, the sparsified distributions $ν$ are obtained by applying a sparse (mostly $0$) external field to $μ$, an operation that often retains algorithmic tractability of sampling from $ν$. This phenomenon, which we dub domain sparsification, allows us to pay a one-time cost of estimating the marginals of $μ$, and in return reduce the amortized cost needed to produce many samples from the distribution $μ$, as is often needed in upstream tasks such as counting and inference.
For a wide range of distributions where $α=Ω(1)$, our result reduces the domain size, and as a corollary, the cost-per-sample, by a $\operatorname{poly}(n)$ factor. Examples include monomers in a monomer-dimer system, non-symmetric determinantal point processes, and partition-constrained Strongly Rayleigh measures. Our work significantly extends the reach of prior work of Anari and Dereziński who obtained domain sparsification for distributions with a log-concave generating polynomial (corresponding to $α=1$). As a corollary of our new analysis techniques, we also obtain a less stringent requirement on the accuracy of marginal estimates even for the case of log-concave polynomials; roughly speaking, we show that constant-factor approximation is enough for domain sparsification, improving over $O(1/k)$ relative error established in prior work.
Submitted 14 September, 2021; v1 submitted 14 September, 2021;
originally announced September 2021.
-
Stochastic Relaxed Inertial Forward-Backward-Forward splitting for Monotone Inclusions in Hilbert spaces
Authors:
Shisheng Cui,
Uday V. Shanbhag,
Mathias Staudigl,
Phan Tu Vuong
Abstract:
We consider monotone inclusions defined on a Hilbert space where the operator is given by the sum of a maximal monotone operator $T$ and a single-valued monotone, Lipschitz continuous, and expectation-valued operator $V$. We draw motivation from the seminal work by Attouch and Cabot on relaxed inertial methods for monotone inclusions and present a stochastic extension of the relaxed inertial forward-backward-forward (RISFBF) method. Facilitated by an online variance reduction strategy via a mini-batch approach, we show that (RISFBF) produces a sequence that weakly converges to the solution set. Moreover, it is possible to estimate the rate at which the discrete velocity of the stochastic process vanishes. Under strong monotonicity, we demonstrate strong convergence, and give a detailed assessment of the iteration and oracle complexity of the scheme. When the mini-batch is raised at a geometric (polynomial) rate, the rate statement can be strengthened to a linear (suitable polynomial) rate while the oracle complexity of computing an $ε$-solution improves to $O(1/ε)$. Importantly, the latter claim allows for possibly biased oracles, a key theoretical advancement allowing for far broader applicability. By defining a restricted gap function based on the Fitzpatrick function, we prove that the expected gap of an averaged sequence diminishes at a sublinear rate of $O(1/k)$ while the oracle complexity of computing a suitably defined $ε$-solution is $O(1/ε^{1+a})$ where $a>1$. Numerical results on two-stage games and an overlapping group Lasso problem illustrate the advantages of our method compared to stochastic forward-backward-forward (SFBF) and SA schemes.
Submitted 2 August, 2021; v1 submitted 21 July, 2021;
originally announced July 2021.
-
Entropic Independence I: Modified Log-Sobolev Inequalities for Fractionally Log-Concave Distributions and High-Temperature Ising Models
Authors:
Nima Anari,
Vishesh Jain,
Frederic Koehler,
Huy Tuan Pham,
Thuy-Duong Vuong
Abstract:
We introduce a notion called entropic independence that is an entropic analog of spectral notions of high-dimensional expansion. Informally, entropic independence of a background distribution $μ$ on $k$-sized subsets of a ground set of elements says that for any (possibly randomly chosen) set $S$, the relative entropy of a single element of $S$ drawn uniformly at random carries at most $O(1/k)$ fraction of the relative entropy of $S$. Entropic independence is the analog of the notion of spectral independence, if one replaces variance by entropy. We use entropic independence to derive tight mixing time bounds, overcoming the lossy nature of spectral analysis of Markov chains on exponential-sized state spaces. In our main technical result, we show a general way of deriving entropy contraction, a.k.a. modified log-Sobolev inequalities, for down-up random walks from spectral notions. We show that spectral independence of a distribution under arbitrary external fields automatically implies entropic independence.
To derive our results, we relate entropic independence to properties of polynomials: $μ$ is entropically independent exactly when a transformed version of the generating polynomial of $μ$ is upper bounded by its linear tangent; this property is implied by concavity of the said transformation, which was shown by prior work to be locally equivalent to spectral independence. We apply our results to obtain tight modified log-Sobolev inequalities and mixing times for multi-step down-up walks on fractionally log-concave distributions. As our flagship application, we establish the tight mixing time of $O(n\log n)$ for Glauber dynamics on Ising models whose interaction matrix has eigenspectrum lying within an interval of length smaller than $1$, improving upon the prior quadratic dependence on $n$.
Submitted 4 November, 2021; v1 submitted 8 June, 2021;
originally announced June 2021.
-
Spectral independence, coupling with the stationary distribution, and the spectral gap of the Glauber dynamics
Authors:
Vishesh Jain,
Huy Tuan Pham,
Thuy Duong Vuong
Abstract:
We present a new lower bound on the spectral gap of the Glauber dynamics for the Gibbs distribution of a spectrally independent $q$-spin system on a graph $G = (V,E)$ with maximum degree $Δ$. Notably, for several interesting examples, our bound covers the entire regime of $Δ$ excluded by arguments based on coupling with the stationary distribution. As concrete applications, by combining our new lower bound with known spectral independence computations and known coupling arguments:
(1) We show that for a triangle-free graph $G = (V,E)$ with maximum degree $Δ\geq 3$, the Glauber dynamics for the uniform distribution on proper $k$-colorings with $k \geq (1.763\dots + δ)Δ$ colors has spectral gap $\tilde{Ω}_δ(|V|^{-1})$. Previously, such a result was known either if the girth of $G$ is at least $5$ [Dyer et al., FOCS 2004], or under restrictions on $Δ$ [Chen et al., STOC 2021; Hayes-Vigoda, FOCS 2003].
(2) We show that for a regular graph $G = (V,E)$ with degree $Δ\geq 3$ and girth at least $6$, and for any $\varepsilon, δ> 0$, the partition function of the hardcore model with fugacity $λ\leq (1-δ)λ_{c}(Δ)$ may be approximated within a $(1+\varepsilon)$-multiplicative factor in time $\tilde{O}_δ(n^{2}\varepsilon^{-2})$. Previously, such a result was known if the girth is at least $7$ [Efthymiou et al., SICOMP 2019].
(3) We show for the binomial random graph $G(n,d/n)$ with $d = O(1)$, with high probability, an approximately uniformly random matching may be sampled in time $O_{d}(n^{2+o(1)})$. This improves the corresponding running time of $\tilde{O}_{d}(n^{3})$ due to [Jerrum-Sinclair, SICOMP 1989; Jerrum, 2003].
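The Glauber dynamics referred to in (1) updates one vertex at a time. A minimal heat-bath sketch, for illustration only: pick a uniformly random vertex and recolor it with a uniformly random color not used by any of its neighbors. The test graph (the 3-dimensional hypercube, which is triangle-free with $Δ = 3$), the choice $k = 6$, the seed, and the helper names (`hypercube`, `greedy_coloring`, `glauber_step`, `is_proper`) are our own hypothetical choices, not taken from the paper, and the code says nothing about the spectral-gap bound itself.

```python
import random

def hypercube(d):
    """Adjacency lists of the d-dimensional hypercube (triangle-free, d-regular)."""
    n = 1 << d
    return {v: [v ^ (1 << i) for i in range(d)] for v in range(n)}

def greedy_coloring(adj, k):
    """Some proper k-coloring to start from (requires k > max degree)."""
    coloring = {}
    for v in adj:
        used = {coloring[u] for u in adj[v] if u in coloring}
        coloring[v] = min(c for c in range(k) if c not in used)
    return coloring

def glauber_step(adj, coloring, k, rng):
    """One heat-bath update: recolor a uniformly random vertex with a uniformly
    random color not used by any of its neighbors (modifies `coloring` in place)."""
    v = rng.choice(list(adj))
    blocked = {coloring[u] for u in adj[v]}
    available = [c for c in range(k) if c not in blocked]
    coloring[v] = rng.choice(available)  # non-empty whenever k > max degree

def is_proper(adj, coloring):
    return all(coloring[u] != coloring[v] for u in adj for v in adj[u])

rng = random.Random(0)
adj = hypercube(3)  # 3-regular and triangle-free, so Delta = 3
k = 6               # k >= 1.763 * Delta, the regime of result (1)
coloring = greedy_coloring(adj, k)
for _ in range(10_000):
    glauber_step(adj, coloring, k, rng)
print(is_proper(adj, coloring))  # True: every update keeps the coloring proper
```

Each update inspects only the neighbors of the chosen vertex; a spectral-gap bound such as the one in (1) is what controls how many of these updates are needed before the chain is close to uniform.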
Submitted 3 May, 2021;
originally announced May 2021.
-
On the sampling Lovász Local Lemma for atomic constraint satisfaction problems
Authors:
Vishesh Jain,
Huy Tuan Pham,
Thuy-Duong Vuong
Abstract:
We study the problem of sampling an approximately uniformly random satisfying assignment for atomic constraint satisfaction problems, i.e., those in which each constraint is violated by only one assignment to its variables. Let $p$ denote the maximum probability of violation of any constraint and let $Δ$ denote the maximum degree of the line graph of the constraints.
Our main result is a nearly-linear (in the number of variables) time algorithm for this problem, which is valid in a Lovász local lemma type regime that is considerably less restrictive compared to previous works. In particular, we provide sampling algorithms for the uniform distribution on:
(1) $q$-colorings of $k$-uniform hypergraphs with $Δ\lesssim q^{(k-4)/3 + o_{q}(1)}.$
The exponent $1/3$ improves the previously best-known $1/7$ in the case $q, Δ= O(1)$ [Jain, Pham, Vuong; arXiv, 2020] and $1/9$ in the general case [Feng, He, Yin; STOC 2021].
(2) Satisfying assignments of Boolean $k$-CNF formulas with $Δ\lesssim 2^{k/5.741}.$
The constant $5.741$ in the exponent improves the previously best-known $7$ in the case $k = O(1)$ [Jain, Pham, Vuong; arXiv, 2020] and $13$ in the general case [Feng, He, Yin; STOC 2021].
(3) Satisfying assignments of general atomic constraint satisfaction problems with $p\cdot Δ^{7.043} \lesssim 1.$
The constant $7.043$ improves upon the previously best-known constant of $350$ [Feng, He, Yin; STOC 2021].
At the heart of our analysis is a novel information-percolation type argument for showing the rapid mixing of the Glauber dynamics for a carefully constructed projection of the uniform distribution on satisfying assignments. Notably, there is no natural partial order on the space, and we believe that the techniques developed for the analysis may be of independent interest.
Submitted 16 February, 2021;
originally announced February 2021.
-
Fractionally Log-Concave and Sector-Stable Polynomials: Counting Planar Matchings and More
Authors:
Yeganeh Alimohammadi,
Nima Anari,
Kirankumar Shiragur,
Thuy-Duong Vuong
Abstract:
We show fully polynomial time randomized approximation schemes (FPRAS) for counting matchings of a given size, or more generally sampling/counting monomer-dimer systems in planar, not-necessarily-bipartite, graphs. While perfect matchings on planar graphs can be counted exactly in polynomial time, counting non-perfect matchings was shown to be #P-hard by [Jer87], who also raised the question of whether efficient approximate counting is possible. We answer this affirmatively by showing that the multi-site Glauber dynamics on the set of monomers in a monomer-dimer system always mixes rapidly, and that this dynamics can be implemented efficiently on downward-closed families of graphs where counting perfect matchings is tractable. As further applications of our results, we show how to sample efficiently using multi-site Glauber dynamics from partition-constrained strongly Rayleigh distributions, and nonsymmetric determinantal point processes.
In order to analyze mixing properties of the multi-site Glauber dynamics, we establish two notions for generating polynomials of discrete set-valued distributions: sector-stability and fractional log-concavity. These notions generalize well-studied properties like real-stability and log-concavity, but, unlike them, degrade robustly under useful transformations applied to the distribution. We relate these notions to pairwise correlations in the underlying distribution and the notion of spectral independence introduced by [ALO20], providing a new tool for establishing spectral independence based on geometry of polynomials. As a byproduct of our techniques, we show that polynomials avoiding roots in a sector of the complex plane must satisfy what we call fractional log-concavity; this extends a classic result established by [Gar59], who showed that homogeneous polynomials that have no roots in a half-plane must be log-concave over the positive orthant.
Submitted 3 April, 2023; v1 submitted 4 February, 2021;
originally announced February 2021.
-
Towards the sampling Lovász Local Lemma
Authors:
Vishesh Jain,
Huy Tuan Pham,
Thuy Duong Vuong
Abstract:
Let $Φ= (V, \mathcal{C})$ be a constraint satisfaction problem on variables $v_1,\dots, v_n$ such that each constraint depends on at most $k$ variables and such that each variable assumes values in an alphabet of size at most $q$. Suppose that each constraint shares variables with at most $Δ$ constraints and that each constraint is violated with probability at most $p$ (under the product measure on its variables). We show that for $k, q = O(1)$, there is a deterministic, polynomial time algorithm to approximately count the number of satisfying assignments and a randomized, polynomial time algorithm to sample from approximately the uniform distribution on satisfying assignments, provided that \[C\cdot q^{3}\cdot k \cdot p \cdot Δ^{7} < 1, \quad \text{where }C \text{ is an absolute constant.}\] Previously, a result of this form was known essentially only in the special case when each constraint is violated by exactly one assignment to its variables.
For the special case of $k$-CNF formulas, the term $Δ^{7}$ improves the previously best known $Δ^{60}$ for deterministic algorithms [Moitra, J.ACM, 2019] and $Δ^{13}$ for randomized algorithms [Feng et al., arXiv, 2020]. For the special case of properly $q$-coloring $k$-uniform hypergraphs, the term $Δ^{7}$ improves the previously best known $Δ^{14}$ for deterministic algorithms [Guo et al., SICOMP, 2019] and $Δ^{9}$ for randomized algorithms [Feng et al., arXiv, 2020].
Submitted 24 November, 2020;
originally announced November 2020.
-
Computing Dynamic User Equilibrium on Large-Scale Networks Without Knowing Global Parameters
Authors:
Duong Viet Thong,
Aviv Gibali,
Mathias Staudigl,
Phan Tu Vuong
Abstract:
Dynamic user equilibrium (DUE) is a Nash-like solution concept describing an equilibrium in dynamic traffic systems over a fixed planning period. DUE is a challenging class of equilibrium problems, connecting network loading models and notions of system equilibrium in one concise mathematical framework. Recently, Friesz and Han introduced an integrated framework for DUE computation on large-scale networks, featuring a basic fixed-point algorithm for the effective computation of DUE. In the same work, they present an open-source MATLAB toolbox which allows researchers to test and validate new numerical solvers. This paper builds on this seminal contribution, and extends it in several important ways. At a conceptual level, we provide new strongly convergent algorithms designed to compute a DUE directly in the infinite-dimensional space of path flows. An important feature of our algorithms is that they give provable convergence guarantees without knowledge of global parameters. In fact, the algorithms we propose are adaptive, in the sense that they do not need a priori knowledge of global parameters of the delay operator, and they are provably convergent even for non-monotone delay operators. We implement our numerical schemes on standard test instances, and compare them with the numerical solution strategy employed by Friesz and Han.
Submitted 15 June, 2021; v1 submitted 8 October, 2020;
originally announced October 2020.
-
Log-Concave Polynomials IV: Approximate Exchange, Tight Mixing Times, and Near-Optimal Sampling of Forests
Authors:
Nima Anari,
Kuikui Liu,
Shayan Oveis Gharan,
Cynthia Vinzant,
Thuy Duong Vuong
Abstract:
We prove tight mixing time bounds for natural random walks on bases of matroids, determinantal distributions, and more generally distributions associated with log-concave polynomials. For a matroid of rank $k$ on a ground set of $n$ elements, or more generally distributions associated with log-concave polynomials of homogeneous degree $k$ on $n$ variables, we show that the down-up random walk, started from an arbitrary point in the support, mixes in time $O(k\log k)$. Our bound has no dependence on $n$ or the starting point, unlike the previous analyses [ALOV19,CGM19], and is tight up to constant factors. The main new ingredient is a property we call approximate exchange, a generalization of well-studied exchange properties for matroids and valuated matroids, which may be of independent interest. In particular, given a function $μ: {[n] \choose k} \to \mathbb{R}_{\geq 0},$ our approximate exchange property implies that a simple local search algorithm gives a $k^{O(k)}$-approximation of $\max_{S} μ(S)$ when $μ$ is generated by a log-concave polynomial, and that greedy gives the same approximation ratio when $μ$ is strongly Rayleigh.
As an application, we show how to leverage down-up random walks to approximately sample random forests or random spanning trees in a graph with $n$ edges in time $O(n\log^2 n).$ The best known result for sampling random forests was an FPAUS with high polynomial runtime recently found by [ALOV19, CGM19]. For spanning trees, we improve on the almost-linear time algorithm of [Sch18]. Our analysis works on weighted graphs too, and is the first to achieve nearly-linear running time for these problems.
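For intuition, one step of the down-up walk for the uniform distribution on spanning trees (the bases of the graphic matroid) can be sketched as follows: remove a uniformly random edge of the current tree, then add a uniformly random graph edge that reconnects the two resulting components (the removed edge included). This Python sketch covers only the basic, unweighted step on a toy graph; it is not the paper's $O(n\log^2 n)$ implementation, and the function names, graph, and seed are hypothetical illustrations.

```python
import random

def cut_side(n, tree, removed):
    """Vertices on the same side as removed[0] once `removed` is deleted from the tree."""
    adj = {v: [] for v in range(n)}
    for a, b in tree:
        if (a, b) != removed:
            adj[a].append(b)
            adj[b].append(a)
    side, stack = {removed[0]}, [removed[0]]
    while stack:
        v = stack.pop()
        for u in adj[v]:
            if u not in side:
                side.add(u)
                stack.append(u)
    return side

def down_up_step(n, edges, tree, rng):
    """One down-up step for the uniform distribution on spanning trees."""
    removed = rng.choice(sorted(tree))  # down: drop a uniformly random tree edge
    side = cut_side(n, tree, removed)
    # up: every graph edge crossing the cut (including `removed`) completes a tree
    crossing = [(a, b) for a, b in edges if (a in side) != (b in side)]
    return (tree - {removed}) | {rng.choice(crossing)}

def is_spanning_tree(n, tree):
    """n - 1 edges and no cycle, checked with a simple union-find."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    for a, b in tree:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        parent[ra] = rb
    return len(tree) == n - 1

rng = random.Random(1)
n = 4
edges = [(a, b) for a in range(n) for b in range(a + 1, n)]  # complete graph K4
tree = {(0, 1), (0, 2), (0, 3)}                                # start from a star
seen = set()
for _ in range(2000):
    tree = down_up_step(n, edges, tree, rng)
    assert is_spanning_tree(n, tree)
    seen.add(frozenset(tree))
print(len(seen))  # number of distinct trees visited (K4 has 16 in total)
```

Edges are stored as pairs `(a, b)` with `a < b` throughout, so removed and added edges compare consistently.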
Submitted 11 April, 2021; v1 submitted 15 April, 2020;
originally announced April 2020.
-
A Relaxed Inertial Forward-Backward-Forward Algorithm for Solving Monotone Inclusions with Application to GANs
Authors:
Radu Ioan Bot,
Michael Sedlmayer,
Phan Tu Vuong
Abstract:
We introduce a relaxed inertial forward-backward-forward (RIFBF) splitting algorithm for approaching the set of zeros of the sum of a maximally monotone operator and a single-valued monotone and Lipschitz continuous operator. This work aims to extend Tseng's forward-backward-forward method by using both inertial effects and relaxation parameters. We first formulate a second-order dynamical system which approaches the solution set of the monotone inclusion problem to be solved and provide an asymptotic analysis for its trajectories. For RIFBF, which follows by explicit time discretization, we provide a convergence analysis in the general monotone case as well as when applied to solving pseudo-monotone variational inequalities. We illustrate the proposed method by applications to a bilinear saddle point problem, in the context of which we also emphasize the interplay between the inertial and the relaxation parameters, and to the training of Generative Adversarial Networks (GANs).
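The update pattern described above (inertial extrapolation, a forward-backward step, Tseng's second forward step, then relaxation) can be sketched as below. This is an illustrative sketch under our own assumptions: the maximally monotone operator is taken to be the normal cone of a box, so its resolvent is a projection; the single-valued operator is the bilinear saddle map $F(x, y) = (y, -x)$; and the constant parameters `gamma`, `alpha`, `rho` are hypothetical choices rather than the parameter rules analysed in the paper. Setting `alpha = 0` and `rho = 1` recovers Tseng's forward-backward-forward step.

```python
import numpy as np

def rifbf(F, resolvent, z0, gamma, alpha, rho, iters):
    """Relaxed inertial forward-backward-forward iteration (illustrative sketch).

    F:         single-valued monotone, Lipschitz operator.
    resolvent: resolvent of the maximally monotone part (here a projection).
    alpha:     inertial weight; rho: relaxation weight; gamma: step size.
    """
    z_prev = z = np.asarray(z0, dtype=float)
    for _ in range(iters):
        w = z + alpha * (z - z_prev)              # inertial extrapolation
        Fw = F(w)
        y = resolvent(w - gamma * Fw)             # forward-backward step
        fbf = y - gamma * (F(y) - Fw)             # second forward step (Tseng correction)
        z_prev, z = z, (1 - rho) * w + rho * fbf  # relaxation
    return z

# Bilinear saddle point min_x max_y x*y over [-1, 1]^2: F(x, y) = (y, -x), Lipschitz constant 1.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
F = lambda z: A @ z
box = lambda z: np.clip(z, -1.0, 1.0)

z_tseng = rifbf(F, box, [1.0, 1.0], gamma=0.5, alpha=0.0, rho=1.0, iters=300)
z_rifbf = rifbf(F, box, [1.0, 1.0], gamma=0.5, alpha=0.1, rho=0.9, iters=300)
print(np.linalg.norm(z_tseng), np.linalg.norm(z_rifbf))  # both near the solution (0, 0)
```

Only one resolvent (projection) is evaluated per iteration; the correction step itself is not projected, which is the defining feature of Tseng's scheme compared with the extragradient method.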
Submitted 22 March, 2020; v1 submitted 17 March, 2020;
originally announced March 2020.
-
Strong Convergence of Forward-Backward-Forward Methods for Pseudo-monotone Variational Inequalities with Applications to Dynamic User Equilibrium in Traffic Networks
Authors:
Benoit Duvocelle,
Dennis Meier,
Mathias Staudigl,
Phan Tu Vuong
Abstract:
In infinite-dimensional Hilbert spaces we devise a class of strongly convergent primal-dual schemes for solving variational inequalities defined by a Lipschitz continuous and pseudo-monotone map. Our novel numerical scheme is based on Tseng's forward-backward-forward scheme, which is known to display only weak convergence unless very strong global monotonicity assumptions are made on the involved operators. We provide a simple augmentation of this algorithm which is computationally cheap and still guarantees strong convergence to a minimal norm solution of the underlying problem. We provide an adaptive extension of the algorithm, freeing us from requiring knowledge of the global Lipschitz constant. We test the performance of the algorithm in the computationally challenging task of finding dynamic user equilibria in traffic networks and verify that our scheme is at least competitive with state-of-the-art solvers, and in some cases even improves upon them.
Submitted 25 August, 2019; v1 submitted 20 August, 2019;
originally announced August 2019.
-
The Boosted DC Algorithm for linearly constrained DC programming
Authors:
Francisco J. Aragón Artacho,
Rubén Campoy,
Phan T. Vuong
Abstract:
The Boosted Difference of Convex functions Algorithm (BDCA) has been recently introduced to accelerate the performance of the classical Difference of Convex functions Algorithm (DCA). This acceleration is achieved thanks to an extrapolation step from the point computed by DCA via a line search procedure. In this work, we propose an extension of BDCA that can be applied to difference of convex functions programs with linear constraints, and prove that every cluster point of the sequence generated by this algorithm is a Karush-Kuhn-Tucker point of the problem if the feasible set has a Slater point. When the objective function is quadratic, we prove that any sequence generated by the algorithm is bounded and R-linearly (geometrically) convergent. Finally, we present some numerical experiments where we compare the performance of DCA and BDCA on some challenging problems: to test the copositivity of a given matrix, to solve one-norm and infinity-norm trust-region subproblems, and to solve piecewise quadratic problems with box constraints. Our numerical results demonstrate that this new extension of BDCA outperforms DCA.
Submitted 2 August, 2022; v1 submitted 3 August, 2019;
originally announced August 2019.
-
Using positive spanning sets to achieve d-stationarity with the Boosted DC Algorithm
Authors:
Francisco J. Aragón Artacho,
Rubén Campoy,
Phan T. Vuong
Abstract:
The Difference of Convex functions Algorithm (DCA) is widely used for minimizing the difference of two convex functions. A recently proposed accelerated version, termed BDCA for Boosted DC Algorithm, incorporates a line search step to achieve a larger decrease of the objective value at each iteration. Thanks to this step, BDCA usually converges much faster than DCA in practice. The solutions found by DCA are guaranteed to be critical points of the problem, but these may not be local minima. Although BDCA tends to improve the objective value of the solutions it finds, these are frequently just critical points as well. In this paper we combine BDCA with a simple Derivative-Free Optimization (DFO) algorithm to force the d-stationarity (lack of descent direction) at the point obtained. The potential of this approach is illustrated through some computational experiments on a Minimum-Sum-of-Squares clustering problem. Our numerical results demonstrate that the new method provides better solutions while still remaining faster than DCA in the majority of test cases.
Submitted 12 February, 2020; v1 submitted 26 July, 2019;
originally announced July 2019.
-
Uncertainty-aware Model-based Policy Optimization
Authors:
Tung-Long Vuong,
Kenneth Tran
Abstract:
Model-based reinforcement learning has the potential to be more sample efficient than model-free approaches. However, existing model-based methods are vulnerable to model bias, which leads to poor generalization and asymptotic performance compared to model-free counterparts. In addition, they are typically based on the model predictive control (MPC) framework, which not only is computationally inefficient at decision time but also does not enable policy transfer due to the lack of an explicit policy representation. In this paper, we propose a novel uncertainty-aware model-based policy optimization framework which solves those issues. In this framework, the agent simultaneously learns an uncertainty-aware dynamics model and optimizes the policy according to these learned models. In the optimization step, the policy gradient is computed by automatic differentiation through the models. With respect to sample efficiency alone, our approach shows promising results on challenging continuous control benchmarks with competitive asymptotic performance and significantly lower sample complexity than state-of-the-art baselines.
Submitted 25 June, 2019;
originally announced June 2019.
-
Forward-backward-forward methods with variance reduction for stochastic variational inequalities
Authors:
Radu Ioan Bot,
Panayotis Mertikopoulos,
Mathias Staudigl,
Phan Tu Vuong
Abstract:
We develop a new stochastic algorithm with variance reduction for solving pseudo-monotone stochastic variational inequalities. Our method builds on Tseng's forward-backward-forward (FBF) algorithm, which is known in the deterministic literature to be a valuable alternative to Korpelevich's extragradient method when solving variational inequalities over a convex and closed set governed by pseudo-monotone, Lipschitz continuous operators. The main computational advantage of Tseng's algorithm is that it relies only on a single projection step and two independent queries of a stochastic oracle. Our algorithm incorporates a variance reduction mechanism and leads to almost sure (a.s.) convergence to an optimal solution. To the best of our knowledge, this is the first stochastic look-ahead algorithm achieving this by using only a single projection at each iteration.
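A minimal sketch of the variance-reduction idea, assuming it is realized through mini-batches: each forward step replaces the exact operator by the average of $N_k$ noisy oracle calls, with $N_k$ increasing over the iterations, and the two forward steps use independent samples while only one projection is performed. The oracle (a strongly monotone linear map plus Gaussian noise), the quadratic batch schedule $N_k = (k+1)^2$, the constant step size, the box constraint, and all names are our own illustrative assumptions, not the paper's setup or its step-size rules.

```python
import numpy as np

M = np.array([[1.0, 1.0], [-1.0, 1.0]])  # strongly monotone, Lipschitz constant sqrt(2)
z_star = np.array([0.5, -0.25])           # unique solution, inside the box [-1, 1]^2

def noisy_oracle(z, batch, rng, sigma=1.0):
    """Mini-batch average of `batch` noisy evaluations of F(z) = M (z - z_star)."""
    noise = sigma * rng.standard_normal((batch, 2)).mean(axis=0)
    return M @ (z - z_star) + noise

def stochastic_fbf(z0, gamma, iters, rng):
    z = np.asarray(z0, dtype=float)
    for k in range(iters):
        batch = (k + 1) ** 2                    # increasing mini-batch (variance reduction)
        Fz = noisy_oracle(z, batch, rng)
        y = np.clip(z - gamma * Fz, -1.0, 1.0)  # the single projection of this iteration
        Fy = noisy_oracle(y, batch, rng)        # independent second oracle query
        z = y - gamma * (Fy - Fz)               # forward correction, no projection
    return z

rng = np.random.default_rng(0)
z = stochastic_fbf([-1.0, 1.0], gamma=0.4, iters=100, rng=rng)
print(np.linalg.norm(z - z_star))  # small; shrinks further as iters (and batch sizes) grow
```

The step size `gamma = 0.4` is below $1/L \approx 0.707$ for this operator; with a fixed batch size the iterates would instead keep fluctuating around the solution at a noise-dependent level.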
Submitted 8 February, 2019;
originally announced February 2019.
-
The Boosted DC Algorithm for nonsmooth functions
Authors:
Francisco J. Aragón Artacho,
Phan T. Vuong
Abstract:
The Boosted Difference of Convex functions Algorithm (BDCA) was recently proposed for minimizing smooth difference of convex (DC) functions. BDCA accelerates the convergence of the classical Difference of Convex functions Algorithm (DCA) thanks to an additional line search step. The purpose of this paper is twofold. Firstly, to show that this scheme can be generalized and successfully applied to certain types of nonsmooth DC functions, namely, those that can be expressed as the difference of a smooth function and a possibly nonsmooth one. Secondly, to show that there is complete freedom in the choice of the trial step size for the line search, which is something that can further improve its performance. We prove that any limit point of the BDCA iterative sequence is a critical point of the problem under consideration, and that the corresponding objective value is monotonically decreasing and convergent. The global convergence and convergence rate of the iterations are obtained under the Kurdyka-Lojasiewicz property. Applications and numerical experiments for two problems in data science are presented, demonstrating that BDCA outperforms DCA. Specifically, for the Minimum Sum-of-Squares Clustering problem, BDCA was on average sixteen times faster than DCA, and for the Multidimensional Scaling problem, BDCA was three times faster than DCA.
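To make the DCA step plus line search concrete, here is a Python sketch of BDCA in the simplest smooth setting, under our own assumptions rather than the paper's nonsmooth one. We take $\phi = g - h$ with $g(x) = \frac{\rho}{2}\|x\|^2$ and $h = g - f$ for the illustrative objective $f(x) = \sum_i \left(\tfrac12 x_i^2 + 2\cos x_i\right)$, whose curvature lies in $[-1, 3]$, so with $\rho = 4$ both $g$ and $h$ are strongly convex. The DCA step then has the closed form $y_k = \nabla h(x_k)/\rho$, and BDCA backtracks along $d_k = y_k - x_k$ from the trial step `lam_bar` until $\phi(y_k + \lambda d_k) \le \phi(y_k) - \alpha\lambda^2\|d_k\|^2$. All constants and names are hypothetical.

```python
import numpy as np

RHO = 4.0  # g(x) = RHO/2 ||x||^2; RHO exceeds the max curvature of f, so h = g - f is convex

def f(x):
    # Illustrative nonconvex objective phi = g - h = f; second derivative in [-1, 3].
    return np.sum(0.5 * x**2 + 2.0 * np.cos(x))

def grad_f(x):
    return x - 2.0 * np.sin(x)

def bdca(x0, alpha=0.1, beta=0.5, lam_bar=2.0, tol=1e-10, max_iter=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        # DCA step: y = argmin_z g(z) - <grad h(x), z> = grad h(x) / RHO,
        # where grad h(x) = RHO * x - grad f(x).
        y = x - grad_f(x) / RHO
        d = y - x
        if np.linalg.norm(d) < tol:
            return y
        # Boosting step: backtracking line search along d, starting at y.
        lam = lam_bar
        while lam > 1e-8 and f(y + lam * d) > f(y) - alpha * lam**2 * np.dot(d, d):
            lam *= beta
        x = y + lam * d if lam > 1e-8 else y
    return x

x0 = np.array([0.5, -3.0])
x = bdca(x0)
print(x)  # each |x_i| is close to 1.8955, a nonzero root of x = 2 sin(x)
```

Setting `lam_bar = 0.0` skips the line search and reduces the loop to plain DCA, which for this particular splitting is a gradient step of length $1/\rho$; the extrapolation along $d_k$ is what the boosting adds.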
Submitted 23 July, 2019; v1 submitted 14 December, 2018;
originally announced December 2018.
-
Finding Zeros of Hölder Metrically Subregular Mappings via Globally Convergent Levenberg-Marquardt Methods
Authors:
Masoud Ahookhosh,
Ronan M. T. Fleming,
Phan T. Vuong
Abstract:
We present two globally convergent Levenberg-Marquardt methods for finding zeros of Hölder metrically subregular mappings that may have non-isolated zeros. The first method unifies the Levenberg-Marquardt direction and an Armijo-type line search, while the second incorporates this direction with a nonmonotone trust-region technique. For both methods, we prove the global convergence to a first-order stationary point of the associated merit function. Furthermore, the worst-case global complexity of these methods is provided, indicating that an approximate stationary point can be computed in at most $\mathcal{O}(\varepsilon^{-2})$ function and gradient evaluations, for an accuracy parameter $\varepsilon>0$. We also study the conditions for the proposed methods to converge to a zero of the associated mappings. Computing a moiety conserved steady state for biochemical reaction networks can be cast as the problem of finding a zero of a Hölder metrically subregular mapping. We report encouraging numerical results for finding a zero of such mappings derived from real-world biological data, which supports our theoretical foundations.
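A generic Levenberg-Marquardt iteration with an Armijo-type line search on the merit function $\psi(x) = \frac12\|F(x)\|^2$ can be sketched as follows. The regularization rule $\mu_k = \theta\|F(x_k)\|^{\delta}$, the constants, and the test mapping $F(x) = x_1^2 + x_2^2 - 1$ (whose zeros form the unit circle, so none of them is isolated) are our own illustrative choices, not the parameter rules analysed in the paper.

```python
import numpy as np

def F(x):
    # Zero set is the unit circle: every zero is non-isolated.
    return np.array([x[0]**2 + x[1]**2 - 1.0])

def J(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]]])

def lm_armijo(x0, theta=1.0, delta=1.0, sigma=1e-4, beta=0.5, tol=1e-12, max_iter=200):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx, Jx = F(x), J(x)
        normF = np.linalg.norm(Fx)
        if normF < tol:
            break
        mu = theta * normF**delta  # regularization that vanishes near a zero
        g = Jx.T @ Fx              # gradient of psi = 0.5 ||F||^2
        d = np.linalg.solve(Jx.T @ Jx + mu * np.eye(x.size), -g)  # LM direction
        psi = 0.5 * normF**2
        t = 1.0
        # Armijo backtracking on the merit function psi.
        while t > 1e-12 and 0.5 * np.linalg.norm(F(x + t * d))**2 > psi + sigma * t * (g @ d):
            t *= beta
        x = x + t * d
    return x

x = lm_armijo([2.0, 1.0])
print(np.linalg.norm(x))  # ~1.0: the iterate lies on the unit circle
```

Because $J^\top J$ is singular here, the term $\mu_k I$ is what keeps the linear system solvable; letting $\mu_k$ shrink with $\|F(x_k)\|$ preserves fast local convergence.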
Submitted 4 December, 2018; v1 submitted 30 November, 2018;
originally announced December 2018.
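The first method in the abstract above pairs the Levenberg-Marquardt direction with an Armijo-type line search on the merit function $\psi(x)=\tfrac{1}{2}\|F(x)\|^2$. The Python sketch below shows that pairing under stated assumptions: the toy mapping $F(x)=x_1^2+x_2^2-1$ (whose zeros, the unit circle, are non-isolated), the parameter rule $\mu_k=\xi\|F(x_k)\|^\eta$, and the names `lm_armijo`, `xi`, `eta`, `sigma` and `beta` are all hypothetical and not taken from the paper.

```python
import numpy as np

def F(x):
    # Hypothetical mapping: its zeros (the unit circle) are not isolated.
    return np.array([x[0] ** 2 + x[1] ** 2 - 1.0])

def J(x):
    # Jacobian of F (1 x 2).
    return np.array([[2.0 * x[0], 2.0 * x[1]]])

def psi(x):
    # Merit function 0.5 * ||F(x)||^2.
    r = F(x)
    return 0.5 * (r @ r)

def lm_armijo(x0, xi=1.0, eta=1.0, sigma=1e-4, beta=0.5, tol=1e-12, max_iter=200):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx, Jx = F(x), J(x)
        grad = Jx.T @ Fx                        # gradient of psi at x
        if np.linalg.norm(Fx) < tol or np.linalg.norm(grad) < tol:
            break                               # zero of F or stationary point of psi
        mu = xi * np.linalg.norm(Fx) ** eta     # hypothetical LM parameter rule
        # LM direction: (J^T J + mu I) d = -J^T F
        d = np.linalg.solve(Jx.T @ Jx + mu * np.eye(x.size), -grad)
        # Armijo backtracking on psi along the descent direction d
        t = 1.0
        while psi(x + t * d) > psi(x) + sigma * t * (grad @ d) and t > 1e-16:
            t *= beta
        x = x + t * d
    return x
```

Starting from `[2.0, 1.0]`, every LM direction for this $F$ is a multiple of $x_k$, so the iterates stay on the ray through the starting point and should approach $(2,1)/\sqrt{5}$ on the unit circle.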
-
An inertial extrapolation method for convex simple bilevel optimization
Authors:
Yekini Shehu,
Phan Tu Vuong,
Alain Zemkoho
Abstract:
We consider the minimization of a scalar objective over the solution set of another optimization problem. This problem is known as a simple bilevel optimization problem and has drawn significant attention in the last few years. Our inner problem consists of minimizing the sum of a smooth and a nonsmooth function, while the outer one is the minimization of a smooth convex function. We propose a fixed-point iterative method with inertial extrapolation to solve the problem and establish its convergence. Our numerical experiments show that the method proposed in this paper outperforms the currently best known algorithm for the class of problems considered.
Submitted 17 September, 2018;
originally announced September 2018.
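The Python sketch below illustrates the general shape of an inertial fixed-point scheme for a simple bilevel problem: an inertial extrapolation, a forward-backward step on the inner problem, a gradient step on the outer objective, and a convex combination of the two, in the style of the BiG-SAM method of Sabach and Shtern. It is not claimed to be the method of this paper. The toy instance (inner $f(x)=\tfrac12\|Ax-b\|^2$ with $A=[1\ 1]$, $b=2$ and $g$ the indicator of $x\ge 0$; outer $\omega(x)=\tfrac12\|x\|^2$), the parameter rules $\alpha_k=2/(k+3)$ and $\varepsilon_k=1/(k+1)^2$, the inertial bound $\theta_k\le\min\{\theta,\varepsilon_k/\|x_k-x_{k-1}\|\}$, and the function name are assumptions.

```python
import numpy as np

# Hypothetical toy instance: inner f(x) = 0.5||Ax - b||^2 plus indicator of x >= 0,
# outer omega(x) = 0.5||x||^2 (selects the minimum-norm inner solution).
A = np.array([[1.0, 1.0]])
b = np.array([2.0])
L_F = np.linalg.norm(A, 2) ** 2   # Lipschitz constant of grad f

def grad_f(x):
    return A.T @ (A @ x - b)

def prox_g(x):
    return np.maximum(x, 0.0)     # projection onto the nonnegative orthant

def grad_omega(x):
    return x

def inertial_bilevel(x0, iters=5000, s=0.5, theta=0.3):
    t = 1.0 / L_F
    x_prev = x = np.asarray(x0, dtype=float)
    for k in range(iters):
        alpha = 2.0 / (k + 3)            # alpha_k -> 0 and sum alpha_k = infinity
        eps = 1.0 / (k + 1) ** 2         # eps_k = o(alpha_k) bounds the inertial term
        diff = np.linalg.norm(x - x_prev)
        theta_k = theta if diff == 0 else min(theta, eps / diff)
        w = x + theta_k * (x - x_prev)   # inertial extrapolation
        y = prox_g(w - t * grad_f(w))    # forward-backward step on the inner problem
        z = w - s * grad_omega(w)        # gradient step on the outer objective
        x_prev, x = x, alpha * z + (1 - alpha) * y
    return x
```

For this instance the inner solution set is the segment $\{x\ge 0: x_1+x_2=2\}$ and the outer objective selects its minimum-norm point $(1,1)$, which `inertial_bilevel([3.0, 0.0])` should approach, slowly, as $\alpha_k\to 0$.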
-
The Forward-Backward-Forward Method from continuous and discrete perspective for pseudo-monotone variational inequalities in Hilbert spaces
Authors:
Radu Ioan Bot,
Ernö Robert Csetnek,
Phan Tu Vuong
Abstract:
Tseng's forward-backward-forward algorithm is a valuable alternative to Korpelevich's extragradient method when solving variational inequalities over a closed convex set governed by monotone and Lipschitz continuous operators, as it requires only one projection per iteration. However, it is well known that Korpelevich's method converges, and can therefore also be used, for variational inequalities governed by pseudo-monotone and Lipschitz continuous operators. In this paper, we first associate a forward-backward-forward dynamical system with a pseudo-monotone variational inequality and carry out an asymptotic analysis of the generated trajectories. The explicit time discretization of this system results in Tseng's forward-backward-forward algorithm with relaxation parameters, which we prove also converges when applied to pseudo-monotone variational inequalities. In addition, we show that linear convergence is guaranteed under strong pseudo-monotonicity. Numerical experiments are carried out for pseudo-monotone variational inequalities over polyhedral sets and for fractional programming problems.
Submitted 31 July, 2020; v1 submitted 24 August, 2018;
originally announced August 2018.
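Tseng's iteration from the abstract above needs one projection per step: $y_k=P_C(x_k-\lambda F(x_k))$, followed by the correction $x_{k+1}=y_k-\lambda(F(y_k)-F(x_k))$, here combined with a relaxation parameter. The Python sketch below applies it to a hypothetical test problem: $F(x)=x/(1+\|x\|^2)$, which is pseudo-monotone and 1-Lipschitz on $\mathbb{R}^n$ but not monotone (its Jacobian has the eigenvalue $(1-\|x\|^2)/(1+\|x\|^2)^2<0$ when $\|x\|>1$), over the box $C=[1,2]^2$. The step size $\lambda=0.5<1/L$, the relaxation rule written as a plain convex combination with weight `rho`, and all names are assumptions rather than the paper's exact choices.

```python
import numpy as np

def F(x):
    # Pseudo-monotone, 1-Lipschitz, but not monotone on R^n.
    return x / (1.0 + x @ x)

def proj_C(x, lo=1.0, hi=2.0):
    # Projection onto the box C = [lo, hi]^n.
    return np.clip(x, lo, hi)

def fbf(x0, lam=0.5, rho=1.0, tol=1e-10, max_iter=10000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        y = proj_C(x - lam * Fx)          # forward-backward step: the only projection
        if np.linalg.norm(x - y) < tol:   # x = y means y solves the VI
            return y
        z = y - lam * (F(y) - Fx)         # second forward (correction) step
        x = (1.0 - rho) * x + rho * z     # relaxation, 0 < rho <= 1
    return x
```

Because $F$ has positive components on $C$, the unique solution is $x^*=(1,1)$, so `fbf([2.0, 2.0])` should return a point close to it.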
-
Local convergence of the Levenberg-Marquardt method under Hölder metric subregularity
Authors:
Masoud Ahookhosh,
Francisco J. Aragón Artacho,
Ronan M. T. Fleming,
Phan T. Vuong
Abstract:
We describe and analyse Levenberg-Marquardt methods for solving systems of nonlinear equations. More specifically, we propose an adaptive formula for the Levenberg-Marquardt parameter and analyse the local convergence of the method under Hölder metric subregularity of the function defining the equation and Hölder continuity of its gradient mapping. Further, we analyse the local convergence of the method under the additional assumption that the Łojasiewicz gradient inequality holds. We finally report encouraging numerical results confirming the theoretical findings for the problem of computing moiety conserved steady states in biochemical reaction networks. This problem can be cast as finding a solution of a system of nonlinear equations, where the associated mapping satisfies the Łojasiewicz gradient inequality assumption.
Submitted 21 February, 2019; v1 submitted 21 March, 2017;
originally announced March 2017.
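For reference, the two assumptions named in the abstract above and the Levenberg-Marquardt step can be written as follows, for a mapping $F:\mathbb{R}^n\to\mathbb{R}^m$ with Jacobian $J$, a zero $\bar x$, and a parameter $\mu_k>0$ (the notation is chosen here and may differ from the paper's; the paper's adaptive formula for $\mu_k$ is not reproduced).

```latex
% Hölder metric subregularity of F at \bar{x}, where F(\bar{x}) = 0:
\exists\,\kappa>0,\ \delta\in(0,1],\ r>0:\quad
\operatorname{dist}\bigl(x, F^{-1}(0)\bigr) \le \kappa\,\|F(x)\|^{\delta}
\quad \forall x \in \mathbb{B}(\bar{x}, r).

% Hölder continuity of the Jacobian:
\|J(x)-J(y)\| \le L\,\|x-y\|^{\gamma}, \qquad \gamma\in(0,1].

% Levenberg-Marquardt step with parameter \mu_k > 0:
d_k = \arg\min_{d}\ \|F(x_k)+J(x_k)\,d\|^2+\mu_k\|d\|^2
\iff \bigl(J(x_k)^{\top}J(x_k)+\mu_k I\bigr)\,d_k = -J(x_k)^{\top}F(x_k).
```

With $\delta=1$ the first condition is ordinary metric subregularity; taking $\delta<1$ weakens it, allowing $\|F\|$ to grow more slowly away from the solution set.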
-
Toric Mutations in the dP$_2$ Quiver and Subgraphs of the dP$_2$ Brane Tiling
Authors:
Yibo Gao,
Zhaoqi Li,
Thuy-Duong Vuong,
Lisa Yang
Abstract:
Brane tilings are infinite, bipartite, periodic, planar graphs that are dual to quivers. In this paper, we examine the del Pezzo 2 (dP$_2$) quiver and its brane tiling, which arise in the physics literature, in terms of toric mutations on its corresponding cluster. Specifically, we give explicit formulas for all cluster variables generated by toric mutation sequences. Moreover, to each such variable we associate a subgraph of the dP$_2$ brane tiling whose weight matches the variable.
Submitted 16 November, 2016;
originally announced November 2016.
-
Accelerating the DC algorithm for smooth functions
Authors:
Francisco J. Aragón Artacho,
Ronan M. T. Fleming,
Phan T. Vuong
Abstract:
We introduce two new algorithms to minimise smooth difference of convex (DC) functions that accelerate the convergence of the classical DC algorithm (DCA). We prove that the point computed by DCA can be used to define a descent direction for the objective function evaluated at this point. Our algorithms combine DCA with a line search step that uses this descent direction. Convergence of the algorithms is proved and the rate of convergence is analysed under the Łojasiewicz property of the objective function. We apply our algorithms to a class of smooth DC programs arising in the study of biochemical reaction networks, where the objective function is real analytic and thus satisfies the Łojasiewicz property. Numerical tests on various biochemical models clearly show that our algorithms outperform DCA, being on average more than four times faster in both computational time and the number of iterations. Numerical experiments show that the algorithms are globally convergent to a non-equilibrium steady state of various biochemical networks, with only chemically consistent restrictions on the network topology.
Submitted 27 September, 2016; v1 submitted 27 July, 2015;
originally announced July 2015.
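The descent-direction claim in the abstract above can be checked in a few lines. Assume, for this sketch, that $\phi=g-h$ with $g,h$ convex and differentiable and $h$ strongly convex with modulus $\rho>0$; let $y_k$ be the DCA point computed from $x_k$ and set $d_k=y_k-x_k$.

```latex
y_k \in \arg\min_{x}\ \bigl\{ g(x) - \langle \nabla h(x_k), x\rangle \bigr\}
\;\Longrightarrow\; \nabla g(y_k) = \nabla h(x_k),

\langle \nabla\phi(y_k), d_k\rangle
 = \langle \nabla g(y_k) - \nabla h(y_k),\, y_k - x_k\rangle
 = -\langle \nabla h(y_k) - \nabla h(x_k),\, y_k - x_k\rangle
 \le -\rho\,\|d_k\|^2 .
```

Hence $d_k$ is a descent direction for $\phi$ at $y_k$ whenever $d_k\ne 0$, which is what makes a line search from $y_k$ along $d_k$ worthwhile.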
-
A stability conjecture for the colored Jones polynomial
Authors:
Stavros Garoufalidis,
Thao Vuong
Abstract:
We formulate a stability conjecture for the coefficients of the colored Jones polynomial of a knot, colored by irreducible representations in a fixed ray of a simple Lie algebra, and verify it for all torus knots and all simple Lie algebras of rank $2$. Our conjecture is motivated by a structure theorem for the degree and the coefficients of a $q$-holonomic sequence of polynomials given in [Ga2] and by a stability theorem of the colored Jones polynomial of an alternating knot given in [GL2]. We illustrate our results with sample computations.
Submitted 26 October, 2013;
originally announced October 2013.
-
Flag algebras and the stable coefficients of the Jones polynomial
Authors:
Stavros Garoufalidis,
Sergey Norin,
Thao Vuong
Abstract:
We study the structure of the stable coefficients of the Jones polynomial of an alternating link. We start by identifying the first four stable coefficients with polynomial invariants of a (reduced) Tait graph of the link projection. This leads us to introduce a free polynomial algebra of invariants of graphs whose elements give invariants of alternating links which strictly refine the first four stable coefficients. We conjecture that all stable coefficients are elements of this algebra, and give experimental evidence for the fifth and sixth stable coefficients. We illustrate our results in tables of all alternating links with at most 10 crossings and all irreducible planar graphs with at most 6 vertices.
Submitted 22 April, 2015; v1 submitted 23 September, 2013;
originally announced September 2013.
-
Alternating knots, planar graphs and q-series
Authors:
Stavros Garoufalidis,
Thao Vuong
Abstract:
Recent advances in Quantum Topology assign $q$-series to knots in at least three different ways. The $q$-series are given by generalized Nahm sums (i.e., special $q$-hypergeometric sums) and have unknown modular and asymptotic properties. We give an efficient method to compute those $q$-series that come from planar graphs (i.e., reduced Tait graphs of alternating links) and compute several terms of those series for all graphs with at most 8 edges, from which we draw several conclusions. In addition, we give a graph-theoretic proof of a theorem of Dasbach-Lin which identifies the coefficient of $q^k$ in those series for $k=0,1,2$ in terms of polynomials in the number of vertices, edges and triangles of the graph. Updated tables of data.
Submitted 13 December, 2013; v1 submitted 3 April, 2013;
originally announced April 2013.
-
The SL_3 colored Jones polynomial of the trefoil
Authors:
Stavros Garoufalidis,
Hugh Morton,
Thao Vuong
Abstract:
Rosso and Jones gave a formula for the colored Jones polynomial of a torus knot, colored by an irreducible representation of a simple Lie algebra. The Rosso-Jones formula involves a plethysm function, unknown in general. We provide an explicit formula for the second plethysm of an arbitrary representation of $\mathfrak{sl}_3$, which allows us to give an explicit formula for the colored Jones polynomial of the trefoil, and more generally, for T(2,n) torus knots. We give two independent proofs of our plethysm formula, one of which uses the work of Carini-Remmel. Our formula for the $\mathfrak{sl}_3$ colored Jones polynomial of T(2,n) torus knots allows us to verify the Degree Conjecture for those knots, to efficiently compute the $\mathfrak{sl}_3$ Witten-Reshetikhin-Turaev invariants of the Poincaré sphere, and to guess a Groebner basis for the recursion ideal of the $\mathfrak{sl}_3$ colored Jones polynomial of the trefoil.
Submitted 3 October, 2011; v1 submitted 15 October, 2010;
originally announced October 2010.