Search | arXiv e-print repository

Armada: Memory-Efficient Distributed Training of Large-Scale Graph Neural Networks

Authors: Roger Waleffe, Devesh Sarda, Jason Mohoney, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Theodoros Rekatsinas, Shivaram Venkataraman

Abstract: We study distributed training of Graph Neural Networks (GNNs) on billion-scale graphs that are partitioned across machines. Efficient training in this setting relies on min-edge-cut partitioning algorithms, which minimize cross-machine communication due to GNN neighborhood sampling. Yet, min-edge-cut partitioning over large graphs remains a challenge: State-of-the-art (SoTA) offline methods (e.g.,… ▽ More We study distributed training of Graph Neural Networks (GNNs) on billion-scale graphs that are partitioned across machines. Efficient training in this setting relies on min-edge-cut partitioning algorithms, which minimize cross-machine communication due to GNN neighborhood sampling. Yet, min-edge-cut partitioning over large graphs remains a challenge: State-of-the-art (SoTA) offline methods (e.g., METIS) are effective, but they require orders of magnitude more memory and runtime than GNN training itself, while computationally efficient algorithms (e.g., streaming greedy approaches) suffer from increased edge cuts. Thus, in this work we introduce Armada, a new end-to-end system for distributed GNN training whose key contribution is GREM, a novel min-edge-cut partitioning algorithm that can efficiently scale to large graphs. GREM builds on streaming greedy approaches with one key addition: prior vertex assignments are continuously refined during streaming, rather than frozen after an initial greedy selection. Our theoretical analysis and experimental results show that this refinement is critical to minimizing edge cuts and enables GREM to reach partition quality comparable to METIS but with 8-65x less memory and 8-46x faster. Given a partitioned graph, Armada leverages a new disaggregated architecture for distributed GNN training to further improve efficiency; we find that on common cloud machines, even with zero communication, GNN neighborhood sampling and feature loading bottleneck training. Disaggregation allows Armada to independently allocate resources for these operations and ensure that expensive GPUs remain saturated with computation. We evaluate Armada against SoTA systems for distributed GNN training and find that the disaggregated architecture leads to runtime improvements up to 4.5x and cost reductions up to 3.1x. △ Less

Submitted 24 February, 2025; originally announced February 2025.

arXiv:2401.16198 [pdf, other]

Contracting with a Learning Agent

Authors: Guru Guruganesh, Yoav Kolumbus, Jon Schneider, Inbal Talgam-Cohen, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Joshua R. Wang, S. Matthew Weinberg

Abstract: Many real-life contractual relations differ completely from the clean, static model at the heart of principal-agent theory. Typically, they involve repeated strategic interactions of the principal and agent, taking place under uncertainty and over time. While appealing in theory, players seldom use complex dynamic strategies in practice, often preferring to circumvent complexity and approach uncer… ▽ More Many real-life contractual relations differ completely from the clean, static model at the heart of principal-agent theory. Typically, they involve repeated strategic interactions of the principal and agent, taking place under uncertainty and over time. While appealing in theory, players seldom use complex dynamic strategies in practice, often preferring to circumvent complexity and approach uncertainty through learning. We initiate the study of repeated contracts with a learning agent, focusing on agents who achieve no-regret outcomes. Optimizing against a no-regret agent is a known open problem in general games; we achieve an optimal solution to this problem for a canonical contract setting, in which the agent's choice among multiple actions leads to success/failure. The solution has a surprisingly simple structure: for some $α> 0$, initially offer the agent a linear contract with scalar $α$, then switch to offering a linear contract with scalar $0$. This switch causes the agent to ``free-fall'' through their action space and during this time provides the principal with non-zero reward at zero cost. Despite apparent exploitation of the agent, this dynamic contract can leave \emph{both} players better off compared to the best static contract. Our results generalize beyond success/failure, to arbitrary non-linear contracts which the principal rescales dynamically. Finally, we quantify the dependence of our results on knowledge of the time horizon, and are the first to address this consideration in the study of strategizing against learning agents. △ Less

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2312.16609 [pdf, other]

Exploiting hidden structures in non-convex games for convergence to Nash equilibrium

Authors: Iosif Sakos, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Panayotis Mertikopoulos, Georgios Piliouras

Abstract: A wide array of modern machine learning applications - from adversarial models to multi-agent reinforcement learning - can be formulated as non-cooperative games whose Nash equilibria represent the system's desired operational states. Despite having a highly non-convex loss landscape, many cases of interest possess a latent convex structure that could potentially be leveraged to yield convergence… ▽ More A wide array of modern machine learning applications - from adversarial models to multi-agent reinforcement learning - can be formulated as non-cooperative games whose Nash equilibria represent the system's desired operational states. Despite having a highly non-convex loss landscape, many cases of interest possess a latent convex structure that could potentially be leveraged to yield convergence to equilibrium. Driven by this observation, our paper proposes a flexible first-order method that successfully exploits such "hidden structures" and achieves convergence under minimal assumptions for the transformation connecting the players' control variables to the game's latent, convex-structured layer. The proposed method - which we call preconditioned hidden gradient descent (PHGD) - hinges on a judiciously chosen gradient preconditioning scheme related to natural gradient methods. Importantly, we make no separability assumptions for the game's hidden structure, and we provide explicit convergence rate guarantees for both deterministic and stochastic environments. △ Less

Submitted 27 December, 2023; originally announced December 2023.

Comments: 32 pages, 18 figures

MSC Class: Primary 91A10; 91A26; secondary 68Q32

arXiv:2311.10859 [pdf, other]

doi 10.22331/q-2025-05-06-1737

A Quadratic Speedup in Finding Nash Equilibria of Quantum Zero-Sum Games

Authors: Francisca Vasconcelos, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Panayotis Mertikopoulos, Georgios Piliouras, Michael I. Jordan

Abstract: Recent developments in domains such as non-local games, quantum interactive proofs, and quantum generative adversarial networks have renewed interest in quantum game theory and, specifically, quantum zero-sum games. Central to classical game theory is the efficient algorithmic computation of Nash equilibria, which represent optimal strategies for both players. In 2008, Jain and Watrous proposed th… ▽ More Recent developments in domains such as non-local games, quantum interactive proofs, and quantum generative adversarial networks have renewed interest in quantum game theory and, specifically, quantum zero-sum games. Central to classical game theory is the efficient algorithmic computation of Nash equilibria, which represent optimal strategies for both players. In 2008, Jain and Watrous proposed the first classical algorithm for computing equilibria in quantum zero-sum games using the Matrix Multiplicative Weight Updates (MMWU) method to achieve a convergence rate of $\mathcal{O}(d/ε^2)$ iterations to $ε$-Nash equilibria in the $4^d$-dimensional spectraplex. In this work, we propose a hierarchy of quantum optimization algorithms that generalize MMWU via an extra-gradient mechanism. Notably, within this proposed hierarchy, we introduce the Optimistic Matrix Multiplicative Weights Update (OMMWU) algorithm and establish its average-iterate convergence complexity as $\mathcal{O}(d/ε)$ iterations to $ε$-Nash equilibria. This quadratic speed-up relative to Jain and Watrous' original algorithm sets a new benchmark for computing $ε$-Nash equilibria in quantum zero-sum games. △ Less

Submitted 2 May, 2025; v1 submitted 17 November, 2023; originally announced November 2023.

Comments: 53 pages, 7 figures, QTML 2023 (Long Talk), Quantum Journal 2025

MSC Class: primary 91A05; 81Q93; secondary 68Q32; 91A26; 37N40;

Journal ref: Quantum 9, 1737 (2025)

arXiv:2306.16617 [pdf, ps, other]

Curvature-Independent Last-Iterate Convergence for Games on Riemannian Manifolds

Authors: Yang Cai, Michael I. Jordan, Tianyi Lin, Argyris Oikonomou, Emmanouil-Vasileios Vlatakis-Gkaragkounis

Abstract: Numerous applications in machine learning and data analytics can be formulated as equilibrium computation over Riemannian manifolds. Despite the extensive investigation of their Euclidean counterparts, the performance of Riemannian gradient-based algorithms remain opaque and poorly understood. We revisit the original scheme of Riemannian gradient descent (RGD) and analyze it under a geodesic monot… ▽ More Numerous applications in machine learning and data analytics can be formulated as equilibrium computation over Riemannian manifolds. Despite the extensive investigation of their Euclidean counterparts, the performance of Riemannian gradient-based algorithms remain opaque and poorly understood. We revisit the original scheme of Riemannian gradient descent (RGD) and analyze it under a geodesic monotonicity assumption, which includes the well-studied geodesically convex-concave min-max optimization problem as a special case. Our main contribution is to show that, despite the phenomenon of distance distortion, the RGD scheme, with a step size that is agnostic to the manifold's curvature, achieves a curvature-independent and linear last-iterate convergence rate in the geodesically strongly monotone setting. To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence in the Riemannian setting has not been considered before. △ Less

Submitted 28 June, 2023; originally announced June 2023.

arXiv:2306.16502 [pdf, other]

Stochastic Methods in Variational Inequalities: Ergodicity, Bias and Refinements

Authors: Emmanouil-Vasileios Vlatakis-Gkaragkounis, Angeliki Giannou, Yudong Chen, Qiaomin Xie

Abstract: For min-max optimization and variational inequalities problems (VIP) encountered in diverse machine learning tasks, Stochastic Extragradient (SEG) and Stochastic Gradient Descent Ascent (SGDA) have emerged as preeminent algorithms. Constant step-size variants of SEG/SGDA have gained popularity, with appealing benefits such as easy tuning and rapid forgiveness of initial conditions, but their conve… ▽ More For min-max optimization and variational inequalities problems (VIP) encountered in diverse machine learning tasks, Stochastic Extragradient (SEG) and Stochastic Gradient Descent Ascent (SGDA) have emerged as preeminent algorithms. Constant step-size variants of SEG/SGDA have gained popularity, with appealing benefits such as easy tuning and rapid forgiveness of initial conditions, but their convergence behaviors are more complicated even in rudimentary bilinear models. Our work endeavors to elucidate and quantify the probabilistic structures intrinsic to these algorithms. By recasting the constant step-size SEG/SGDA as time-homogeneous Markov Chains, we establish a first-of-its-kind Law of Large Numbers and a Central Limit Theorem, demonstrating that the average iterate is asymptotically normal with a unique invariant distribution for an extensive range of monotone and non-monotone VIPs. Specializing to convex-concave min-max optimization, we characterize the relationship between the step-size and the induced bias with respect to the Von-Neumann's value. Finally, we establish that Richardson-Romberg extrapolation can improve proximity of the average iterate to the global solution for VIPs. Our probabilistic analysis, underpinned by experiments corroborating our theoretical discoveries, harnesses techniques from optimization, Markov chains, and operator theory. △ Less

Submitted 28 June, 2023; originally announced June 2023.

Comments: 37 pages, 6 main figures

arXiv:2306.01032 [pdf, other]

Chaos persists in large-scale multi-agent learning despite adaptive learning rates

Authors: Emmanouil-Vasileios Vlatakis-Gkaragkounis, Lampros Flokas, Georgios Piliouras

Abstract: Multi-agent learning is intrinsically harder, more unstable and unpredictable than single agent optimization. For this reason, numerous specialized heuristics and techniques have been designed towards the goal of achieving convergence to equilibria in self-play. One such celebrated approach is the use of dynamically adaptive learning rates. Although such techniques are known to allow for improved… ▽ More Multi-agent learning is intrinsically harder, more unstable and unpredictable than single agent optimization. For this reason, numerous specialized heuristics and techniques have been designed towards the goal of achieving convergence to equilibria in self-play. One such celebrated approach is the use of dynamically adaptive learning rates. Although such techniques are known to allow for improved convergence guarantees in small games, it has been much harder to analyze them in more relevant settings with large populations of agents. These settings are particularly hard as recent work has established that learning with fixed rates will become chaotic given large enough populations.In this work, we show that chaos persists in large population congestion games despite using adaptive learning rates even for the ubiquitous Multiplicative Weight Updates algorithm, even in the presence of only two strategies. At a technical level, due to the non-autonomous nature of the system, our approach goes beyond conventional period-three techniques Li-Yorke by studying fundamental properties of the dynamics including invariant sets, volume expansion and turbulent sets. We complement our theoretical insights with experiments showcasing that slight variations to system parameters lead to a wide variety of unpredictable behaviors. △ Less

Submitted 1 June, 2023; originally announced June 2023.

Comments: 30 pages, 6 figures

arXiv:2305.15804 [pdf, ps, other]

Smoothed Complexity of SWAP in Local Graph Partitioning

Authors: Xi Chen, Chenghao Guo, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Mihalis Yannakakis

Abstract: We give the first quasipolynomial upper bound $φn^{\text{polylog}(n)}$ for the smoothed complexity of the SWAP algorithm for local Graph Partitioning (also known as Bisection Width), where $n$ is the number of nodes in the graph and $φ$ is a parameter that measures the magnitude of perturbations applied on its edge weights. More generally, we show that the same quasipolynomial upper bound holds fo… ▽ More We give the first quasipolynomial upper bound $φn^{\text{polylog}(n)}$ for the smoothed complexity of the SWAP algorithm for local Graph Partitioning (also known as Bisection Width), where $n$ is the number of nodes in the graph and $φ$ is a parameter that measures the magnitude of perturbations applied on its edge weights. More generally, we show that the same quasipolynomial upper bound holds for the smoothed complexity of the 2-FLIP algorithm for any binary Maximum Constraint Satisfaction Problem, including local Max-Cut, for which similar bounds were only known for $1$-FLIP. Our results are based on an analysis of cycles formed in long sequences of double flips, showing that it is unlikely for every move in a long sequence to incur a positive but small improvement in the cut weight. △ Less

Submitted 25 May, 2023; originally announced May 2023.

Comments: 46 pages, 7 figures

arXiv:2301.02129 [pdf, ps, other]

Algorithms and Complexity for Computing Nash Equilibria in Adversarial Team Games

Authors: Ioannis Anagnostides, Fivos Kalogiannis, Ioannis Panageas, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Stephen McAleer

Abstract: Adversarial team games model multiplayer strategic interactions in which a team of identically-interested players is competing against an adversarial player in a zero-sum game. Such games capture many well-studied settings in game theory, such as congestion games, but go well-beyond to environments wherein the cooperation of one team -- in the absence of explicit communication -- is obstructed by… ▽ More Adversarial team games model multiplayer strategic interactions in which a team of identically-interested players is competing against an adversarial player in a zero-sum game. Such games capture many well-studied settings in game theory, such as congestion games, but go well-beyond to environments wherein the cooperation of one team -- in the absence of explicit communication -- is obstructed by competing entities; the latter setting remains poorly understood despite its numerous applications. Since the seminal work of Von Stengel and Koller (GEB `97), different solution concepts have received attention from an algorithmic standpoint. Yet, the complexity of the standard Nash equilibrium has remained open. In this paper, we settle this question by showing that computing a Nash equilibrium in adversarial team games belongs to the class continuous local search (CLS), thereby establishing CLS-completeness by virtue of the recent CLS-hardness result of Rubinstein and Babichenko (STOC `21) in potential games. To do so, we leverage linear programming duality to prove that any $ε$-approximate stationary strategy for the team can be extended in polynomial time to an $O(ε)$-approximate Nash equilibrium, where the $O(\cdot)$ notation suppresses polynomial factors in the description of the game. As a consequence, we show that the Moreau envelop of a suitable best response function acts as a potential under certain natural gradient-based dynamics. △ Less

Submitted 30 May, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

Comments: To appear at the conference on Economics and Computation (EC) 2023

arXiv:2210.08857 [pdf, ps, other]

On the convergence of policy gradient methods to Nash equilibria in general stochastic games

Authors: Angeliki Giannou, Kyriakos Lotidis, Panayotis Mertikopoulos, Emmanouil-Vasileios Vlatakis-Gkaragkounis

Abstract: Learning in stochastic games is a notoriously difficult problem because, in addition to each other's strategic decisions, the players must also contend with the fact that the game itself evolves over time, possibly in a very complicated manner. Because of this, the convergence properties of popular learning algorithms - like policy gradient and its variants - are poorly understood, except in speci… ▽ More Learning in stochastic games is a notoriously difficult problem because, in addition to each other's strategic decisions, the players must also contend with the fact that the game itself evolves over time, possibly in a very complicated manner. Because of this, the convergence properties of popular learning algorithms - like policy gradient and its variants - are poorly understood, except in specific classes of games (such as potential or two-player, zero-sum games). In view of this, we examine the long-run behavior of policy gradient methods with respect to Nash equilibrium policies that are second-order stationary (SOS) in a sense similar to the type of sufficiency conditions used in optimization. Our first result is that SOS policies are locally attracting with high probability, and we show that policy gradient trajectories with gradient estimates provided by the REINFORCE algorithm achieve an $\mathcal{O}(1/\sqrt{n})$ distance-squared convergence rate if the method's step-size is chosen appropriately. Subsequently, specializing to the class of deterministic Nash policies, we show that this rate can be improved dramatically and, in fact, policy gradient methods converge within a finite number of iterations in that case. △ Less

Submitted 17 October, 2022; originally announced October 2022.

Comments: 43 pages, 2 tables; to appear in the proceedings of NeurIPS 2022

MSC Class: Primary 91A15; 91A26; secondary 68Q32; 68T05; 90C40

arXiv:2208.02204 [pdf, ps, other]

Efficiently Computing Nash Equilibria in Adversarial Team Markov Games

Authors: Fivos Kalogiannis, Ioannis Anagnostides, Ioannis Panageas, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Vaggos Chatziafratis, Stelios Stavroulakis

Abstract: Computing Nash equilibrium policies is a central problem in multi-agent reinforcement learning that has received extensive attention both in theory and in practice. However, provable guarantees have been thus far either limited to fully competitive or cooperative scenarios or impose strong assumptions that are difficult to meet in most practical applications. In this work, we depart from those pri… ▽ More Computing Nash equilibrium policies is a central problem in multi-agent reinforcement learning that has received extensive attention both in theory and in practice. However, provable guarantees have been thus far either limited to fully competitive or cooperative scenarios or impose strong assumptions that are difficult to meet in most practical applications. In this work, we depart from those prior results by investigating infinite-horizon \emph{adversarial team Markov games}, a natural and well-motivated class of games in which a team of identically-interested players -- in the absence of any explicit coordination or communication -- is competing against an adversarial player. This setting allows for a unifying treatment of zero-sum Markov games and Markov potential games, and serves as a step to model more realistic strategic interactions that feature both competing and cooperative interests. Our main contribution is the first algorithm for computing stationary $ε$-approximate Nash equilibria in adversarial team Markov games with computational complexity that is polynomial in all the natural parameters of the game, as well as $1/ε$. The proposed algorithm is particularly natural and practical, and it is based on performing independent policy gradient steps for each player in the team, in tandem with best responses from the side of the adversary; in turn, the policy for the adversary is then obtained by solving a carefully constructed linear program. Our analysis leverages non-standard techniques to establish the KKT optimality conditions for a nonlinear program with nonconvex constraints, thereby leading to a natural interpretation of the induced Lagrange multipliers. Along the way, we significantly extend an important characterization of optimal policies in adversarial (normal-form) team games due to Von Stengel and Koller (GEB `97). △ Less

Submitted 3 August, 2022; originally announced August 2022.

arXiv:2207.07557 [pdf, other]

The Computational Complexity of Multi-player Concave Games and Kakutani Fixed Points

Authors: Christos H. Papadimitriou, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Manolis Zampetakis

Abstract: Kakutani's Fixed Point theorem is a fundamental theorem in topology with numerous applications in game theory and economics. Computational formulations of Kakutani exist only in special cases and are too restrictive to be useful in reductions. In this paper, we provide a general computational formulation of Kakutani's Fixed Point Theorem and we prove that it is PPAD-complete. As an application of… ▽ More Kakutani's Fixed Point theorem is a fundamental theorem in topology with numerous applications in game theory and economics. Computational formulations of Kakutani exist only in special cases and are too restrictive to be useful in reductions. In this paper, we provide a general computational formulation of Kakutani's Fixed Point Theorem and we prove that it is PPAD-complete. As an application of our theorem we are able to characterize the computational complexity of the following fundamental problems: (1) Concave Games. Introduced by the celebrated works of Debreu and Rosen in the 1950s and 60s, concave $n$-person games have found many important applications in Economics and Game Theory. We characterize the computational complexity of finding an equilibrium in such games. We show that a general formulation of this problem belongs to PPAD, and that finding an equilibrium is PPAD-hard even for a rather restricted games of this kind: strongly-concave utilities that can be expressed as multivariate polynomials of a constant degree with axis aligned box constraints. (2) Walrasian Equilibrium. Using Kakutani's fixed point Arrow and Debreu we resolve an open problem related to Walras's theorem on the existence of price equilibria in general economies. There are many results about the PPAD-hardness of Walrasian equilibria, but the inclusion in PPAD is only known for piecewise linear utilities. We show that the problem with general convex utilities is in PPAD. Along the way we provide a Lipschitz continuous version of Berge's maximum theorem that may be of independent interest. △ Less

Submitted 25 May, 2023; v1 submitted 15 July, 2022; originally announced July 2022.

arXiv:2206.02041 [pdf, other]

First-Order Algorithms for Min-Max Optimization in Geodesic Metric Spaces

Authors: Michael I. Jordan, Tianyi Lin, Emmanouil-Vasileios Vlatakis-Gkaragkounis

Abstract: From optimal transport to robust dimensionality reduction, a plethora of machine learning applications can be cast into the min-max optimization problems over Riemannian manifolds. Though many min-max algorithms have been analyzed in the Euclidean setting, it has proved elusive to translate these results to the Riemannian case. Zhang et al. [2022] have recently shown that geodesic convex concave R… ▽ More From optimal transport to robust dimensionality reduction, a plethora of machine learning applications can be cast into the min-max optimization problems over Riemannian manifolds. Though many min-max algorithms have been analyzed in the Euclidean setting, it has proved elusive to translate these results to the Riemannian case. Zhang et al. [2022] have recently shown that geodesic convex concave Riemannian problems always admit saddle-point solutions. Inspired by this result, we study whether a performance gap between Riemannian and optimal Euclidean space convex-concave algorithms is necessary. We answer this question in the negative-we prove that the Riemannian corrected extragradient (RCEG) method achieves last-iterate convergence at a linear rate in the geodesically strongly-convex-concave case, matching the Euclidean result. Our results also extend to the stochastic or non-smooth case where RCEG and Riemanian gradient ascent descent (RGDA) achieve near-optimal convergence rates up to factors depending on curvature of the manifold. △ Less

Submitted 28 September, 2022; v1 submitted 4 June, 2022; originally announced June 2022.

Comments: 39 pages, 12 figures, under submission

arXiv:2202.05096 [pdf, ps, other]

Near-Optimal Statistical Query Lower Bounds for Agnostically Learning Intersections of Halfspaces with Gaussian Marginals

Authors: Daniel Hsu, Clayton Sanford, Rocco Servedio, Emmanouil-Vasileios Vlatakis-Gkaragkounis

Abstract: We consider the well-studied problem of learning intersections of halfspaces under the Gaussian distribution in the challenging \emph{agnostic learning} model. Recent work of Diakonikolas et al. (2021) shows that any Statistical Query (SQ) algorithm for agnostically learning the class of intersections of $k$ halfspaces over $\mathbb{R}^n$ to constant excess error either must make queries of tolera… ▽ More We consider the well-studied problem of learning intersections of halfspaces under the Gaussian distribution in the challenging \emph{agnostic learning} model. Recent work of Diakonikolas et al. (2021) shows that any Statistical Query (SQ) algorithm for agnostically learning the class of intersections of $k$ halfspaces over $\mathbb{R}^n$ to constant excess error either must make queries of tolerance at most $n^{-\tildeΩ(\sqrt{\log k})}$ or must make $2^{n^{Ω(1)}}$ queries. We strengthen this result by improving the tolerance requirement to $n^{-\tildeΩ(\log k)}$. This lower bound is essentially best possible since an SQ algorithm of Klivans et al. (2008) agnostically learns this class to any constant excess error using $n^{O(\log k)}$ queries of tolerance $n^{-O(\log k)}$. We prove two variants of our lower bound, each of which combines ingredients from Diakonikolas et al. (2021) with (an extension of) a different earlier approach for agnostic SQ lower bounds for the Boolean setting due to Dachman-Soled et al. (2014). Our approach also yields lower bounds for agnostically SQ learning the class of "convex subspace juntas" (studied by Vempala, 2010) and the class of sets with bounded Gaussian surface area; all of these lower bounds are nearly optimal since they essentially match known upper bounds from Klivans et al. (2008). △ Less

Submitted 10 February, 2022; originally announced February 2022.

Comments: 19 pages

arXiv:2111.04178 [pdf, other]

Towards convergence to Nash equilibria in two-team zero-sum games

Authors: Fivos Kalogiannis, Ioannis Panageas, Emmanouil-Vasileios Vlatakis-Gkaragkounis

Abstract: Contemporary applications of machine learning in two-team e-sports and the superior expressivity of multi-agent generative adversarial networks raise important and overlooked theoretical questions regarding optimization in two-team games. Formally, two-team zero-sum games are defined as multi-player games where players are split into two competing sets of agents, each experiencing a utility identi… ▽ More Contemporary applications of machine learning in two-team e-sports and the superior expressivity of multi-agent generative adversarial networks raise important and overlooked theoretical questions regarding optimization in two-team games. Formally, two-team zero-sum games are defined as multi-player games where players are split into two competing sets of agents, each experiencing a utility identical to that of their teammates and opposite to that of the opposing team. We focus on the solution concept of Nash equilibria (NE). We first show that computing NE for this class of games is $\textit{hard}$ for the complexity class ${\mathrm{CLS}}$. To further examine the capabilities of online learning algorithms in games with full-information feedback, we propose a benchmark of a simple -- yet nontrivial -- family of such games. These games do not enjoy the properties used to prove convergence for relevant algorithms. In particular, we use a dynamical systems perspective to demonstrate that gradient descent-ascent, its optimistic variant, optimistic multiplicative weights update, and extra gradient fail to converge (even locally) to a Nash equilibrium. On a brighter note, we propose a first-order method that leverages control theory techniques and under some conditions enjoys last-iterate local convergence to a Nash equilibrium. We also believe our proposed method is of independent interest for general min-max optimization. △ Less

Submitted 16 April, 2023; v1 submitted 7 November, 2021; originally announced November 2021.

Comments: Paper accepted in ICLR 2023

arXiv:2102.02336 [pdf, other]

On the Approximation Power of Two-Layer Networks of Random ReLUs

Authors: Daniel Hsu, Clayton Sanford, Rocco A. Servedio, Emmanouil-Vasileios Vlatakis-Gkaragkounis

Abstract: This paper considers the following question: how well can depth-two ReLU networks with randomly initialized bottom-level weights represent smooth functions? We give near-matching upper- and lower-bounds for $L_2$-approximation in terms of the Lipschitz constant, the desired accuracy, and the dimension of the problem, as well as similar results in terms of Sobolev norms. Our positive results employ… ▽ More This paper considers the following question: how well can depth-two ReLU networks with randomly initialized bottom-level weights represent smooth functions? We give near-matching upper- and lower-bounds for $L_2$-approximation in terms of the Lipschitz constant, the desired accuracy, and the dimension of the problem, as well as similar results in terms of Sobolev norms. Our positive results employ tools from harmonic analysis and ridgelet representation theory, while our lower-bounds are based on (robust versions of) dimensionality arguments. △ Less

Submitted 7 September, 2021; v1 submitted 3 February, 2021; originally announced February 2021.

Comments: 39 pages, COLT version

Journal ref: Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134 (2021) 2423-2461

arXiv:2101.05248 [pdf, other]

Solving Min-Max Optimization with Hidden Structure via Gradient Descent Ascent

Authors: Lampros Flokas, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Georgios Piliouras

Abstract: Many recent AI architectures are inspired by zero-sum games, however, the behavior of their dynamics is still not well understood. Inspired by this, we study standard gradient descent ascent (GDA) dynamics in a specific class of non-convex non-concave zero-sum games, that we call hidden zero-sum games. In this class, players control the inputs of smooth but possibly non-linear functions whose outp… ▽ More Many recent AI architectures are inspired by zero-sum games, however, the behavior of their dynamics is still not well understood. Inspired by this, we study standard gradient descent ascent (GDA) dynamics in a specific class of non-convex non-concave zero-sum games, that we call hidden zero-sum games. In this class, players control the inputs of smooth but possibly non-linear functions whose outputs are being applied as inputs to a convex-concave game. Unlike general zero-sum games, these games have a well-defined notion of solution; outcomes that implement the von-Neumann equilibrium of the "hidden" convex-concave game. We prove that if the hidden game is strictly convex-concave then vanilla GDA converges not merely to local Nash, but typically to the von-Neumann solution. If the game lacks strict convexity properties, GDA may fail to converge to any equilibrium, however, by applying standard regularization techniques we can prove convergence to a von-Neumann solution of a slightly perturbed zero-sum game. Our convergence guarantees are non-local, which as far as we know is a first-of-its-kind type of result in non-convex non-concave games. Finally, we discuss connections of our framework with generative adversarial networks. △ Less

Submitted 13 January, 2021; originally announced January 2021.

arXiv:2101.04667 [pdf, ps, other]

Survival of the strictest: Stable and unstable equilibria under regularized learning with partial information

Authors: Angeliki Giannou, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Panayotis Mertikopoulos

Abstract: In this paper, we examine the Nash equilibrium convergence properties of no-regret learning in general N-player games. For concreteness, we focus on the archetypal follow the regularized leader (FTRL) family of algorithms, and we consider the full spectrum of uncertainty that the players may encounter - from noisy, oracle-based feedback, to bandit, payoff-based information. In this general context… ▽ More In this paper, we examine the Nash equilibrium convergence properties of no-regret learning in general N-player games. For concreteness, we focus on the archetypal follow the regularized leader (FTRL) family of algorithms, and we consider the full spectrum of uncertainty that the players may encounter - from noisy, oracle-based feedback, to bandit, payoff-based information. In this general context, we establish a comprehensive equivalence between the stability of a Nash equilibrium and its support: a Nash equilibrium is stable and attracting with arbitrarily high probability if and only if it is strict (i.e., each equilibrium strategy has a unique best response). This equivalence extends existing continuous-time versions of the folk theorem of evolutionary game theory to a bona fide algorithmic learning setting, and it provides a clear refinement criterion for the prediction of the day-to-day behavior of no-regret learning in games △ Less

Submitted 4 February, 2021; v1 submitted 12 January, 2021; originally announced January 2021.

arXiv:2011.06202 [pdf, ps, other]

Optimal Private Median Estimation under Minimal Distributional Assumptions

Authors: Christos Tzamos, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Ilias Zadik

Abstract: We study the fundamental task of estimating the median of an underlying distribution from a finite number of samples, under pure differential privacy constraints. We focus on distributions satisfying the minimal assumption that they have a positive density at a small neighborhood around the median. In particular, the distribution is allowed to output unbounded values and is not required to have fi… ▽ More We study the fundamental task of estimating the median of an underlying distribution from a finite number of samples, under pure differential privacy constraints. We focus on distributions satisfying the minimal assumption that they have a positive density at a small neighborhood around the median. In particular, the distribution is allowed to output unbounded values and is not required to have finite moments. We compute the exact, up-to-constant terms, statistical rate of estimation for the median by providing nearly-tight upper and lower bounds. Furthermore, we design a polynomial-time differentially private algorithm which provably achieves the optimal performance. At a technical level, our results leverage a Lipschitz Extension Lemma which allows us to design and analyze differentially private algorithms solely on appropriately defined "typical" instances of the samples. △ Less

Submitted 11 November, 2020; originally announced November 2020.

Comments: 49 pages, NeurIPS 2020, Spotlight talk

MSC Class: Primary 68P27; secondary 68Q32

arXiv:2010.09514 [pdf, ps, other]

No-regret learning and mixed Nash equilibria: They do not mix

Authors: Lampros Flokas, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Thanasis Lianeas, Panayotis Mertikopoulos, Georgios Piliouras

Abstract: Understanding the behavior of no-regret dynamics in general $N$-player games is a fundamental question in online learning and game theory. A folk result in the field states that, in finite games, the empirical frequency of play under no-regret learning converges to the game's set of coarse correlated equilibria. By contrast, our understanding of how the day-to-day behavior of the dynamics correlat… ▽ More Understanding the behavior of no-regret dynamics in general $N$-player games is a fundamental question in online learning and game theory. A folk result in the field states that, in finite games, the empirical frequency of play under no-regret learning converges to the game's set of coarse correlated equilibria. By contrast, our understanding of how the day-to-day behavior of the dynamics correlates to the game's Nash equilibria is much more limited, and only partial results are known for certain classes of games (such as zero-sum or congestion games). In this paper, we study the dynamics of "follow-the-regularized-leader" (FTRL), arguably the most well-studied class of no-regret dynamics, and we establish a sweeping negative result showing that the notion of mixed Nash equilibrium is antithetical to no-regret learning. Specifically, we show that any Nash equilibrium which is not strict (in that every player has a unique best response) cannot be stable and attracting under the dynamics of FTRL. This result has significant implications for predicting the outcome of a learning process as it shows unequivocally that only strict (and hence, pure) Nash equilibria can emerge as stable limit points thereof. △ Less

Submitted 20 October, 2020; v1 submitted 19 October, 2020; originally announced October 2020.

Comments: 24 pages, 7 figures, 1 table

MSC Class: Primary 91A26; 37N40; secondary 91A68; 68Q32; 68T05

arXiv:2007.09599 [pdf, ps, other]

Reconstructing weighted voting schemes from partial information about their power indices

Authors: Huck Bennett, Anindya De, Rocco A. Servedio, Emmanouil-Vasileios Vlatakis-Gkaragkounis

Abstract: A number of recent works [Goldberg 2006; O'Donnell and Servedio 2011; De, Diakonikolas, and Servedio 2017; De, Diakonikolas, Feldman, and Servedio 2014] have considered the problem of approximately reconstructing an unknown weighted voting scheme given information about various sorts of ``power indices'' that characterize the level of control that individual voters have over the final outcome. In… ▽ More A number of recent works [Goldberg 2006; O'Donnell and Servedio 2011; De, Diakonikolas, and Servedio 2017; De, Diakonikolas, Feldman, and Servedio 2014] have considered the problem of approximately reconstructing an unknown weighted voting scheme given information about various sorts of ``power indices'' that characterize the level of control that individual voters have over the final outcome. In the language of theoretical computer science, this is the problem of approximating an unknown linear threshold function (LTF) over $\{-1, 1\}^n$ given some numerical measure (such as the function's $n$ ``Chow parameters,'' a.k.a. its degree-1 Fourier coefficients, or the vector of its $n$ Shapley indices) of how much each of the $n$ individual input variables affects the outcome of the function. In this paper we consider the problem of reconstructing an LTF given only partial information about its Chow parameters or Shapley indices; i.e. we are given only the Chow parameters or the Shapley indices corresponding to a subset $S \subseteq [n]$ of the $n$ input variables. A natural goal in this partial information setting is to find an LTF whose Chow parameters or Shapley indices corresponding to indices in $S$ accurately match the given Chow parameters or Shapley indices of the unknown LTF. We refer to this as the Partial Inverse Power Index Problem. Our main results are a polynomial time algorithm for the ($\varepsilon$-approximate) Chow Parameters Partial Inverse Power Index Problem and a quasi-polynomial time algorithm for the ($\varepsilon$-approximate) Shapley Indices Partial Inverse Power Index Problem. △ Less

Submitted 26 July, 2020; v1 submitted 19 July, 2020; originally announced July 2020.

arXiv:1911.10381 [pdf, ps, other]

Smoothed complexity of local Max-Cut and binary Max-CSP

Authors: Xi Chen, Chenghao Guo, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Mihalis Yannakakis, Xinzhi Zhang

Abstract: We show that the smoothed complexity of the FLIP algorithm for local Max-Cut is at most $\smash{φn^{O(\sqrt{\log n})}}$, where $n$ is the number of nodes in the graph and $φ$ is a parameter that measures the magnitude of perturbations applied on its edge weights. This improves the previously best upper bound of $φn^{O(\log n)}$ by Etscheid and Röglin. Our result is based on an analysis of long seq… ▽ More We show that the smoothed complexity of the FLIP algorithm for local Max-Cut is at most $\smash{φn^{O(\sqrt{\log n})}}$, where $n$ is the number of nodes in the graph and $φ$ is a parameter that measures the magnitude of perturbations applied on its edge weights. This improves the previously best upper bound of $φn^{O(\log n)}$ by Etscheid and Röglin. Our result is based on an analysis of long sequences of flips, which shows~that~it is very unlikely for every flip in a long sequence to incur a positive but small improvement in the cut weight. We also extend the same upper bound on the smoothed complexity of FLIP to all binary Maximum Constraint Satisfaction Problems. △ Less

Submitted 23 November, 2019; originally announced November 2019.

arXiv:1910.13021 [pdf, other]

Efficiently avoiding saddle points with zero order methods: No gradients required

Authors: Lampros Flokas, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Georgios Piliouras

Abstract: We consider the case of derivative-free algorithms for non-convex optimization, also known as zero order algorithms, that use only function evaluations rather than gradients. For a wide variety of gradient approximators based on finite differences, we establish asymptotic convergence to second order stationary points using a carefully tailored application of the Stable Manifold Theorem. Regarding… ▽ More We consider the case of derivative-free algorithms for non-convex optimization, also known as zero order algorithms, that use only function evaluations rather than gradients. For a wide variety of gradient approximators based on finite differences, we establish asymptotic convergence to second order stationary points using a carefully tailored application of the Stable Manifold Theorem. Regarding efficiency, we introduce a noisy zero-order method that converges to second order stationary points, i.e avoids saddle points. Our algorithm uses only $\tilde{\mathcal{O}}(1 / ε^2)$ approximate gradient calculations and, thus, it matches the converge rate guarantees of their exact gradient counterparts up to constants. In contrast to previous work, our convergence rate analysis avoids imposing additional dimension dependent slowdowns in the number of iterations required for non-convex zero order optimization. △ Less

Submitted 28 October, 2019; originally announced October 2019.

Comments: To appear in NeurIPS 2019

arXiv:1910.13010 [pdf, other]

Poincaré Recurrence, Cycles and Spurious Equilibria in Gradient-Descent-Ascent for Non-Convex Non-Concave Zero-Sum Games

Authors: Lampros Flokas, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Georgios Piliouras

Abstract: We study a wide class of non-convex non-concave min-max games that generalizes over standard bilinear zero-sum games. In this class, players control the inputs of a smooth function whose output is being applied to a bilinear zero-sum game. This class of games is motivated by the indirect nature of the competition in Generative Adversarial Networks, where players control the parameters of a neural… ▽ More We study a wide class of non-convex non-concave min-max games that generalizes over standard bilinear zero-sum games. In this class, players control the inputs of a smooth function whose output is being applied to a bilinear zero-sum game. This class of games is motivated by the indirect nature of the competition in Generative Adversarial Networks, where players control the parameters of a neural network while the actual competition happens between the distributions that the generator and discriminator capture. We establish theoretically, that depending on the specific instance of the problem gradient-descent-ascent dynamics can exhibit a variety of behaviors antithetical to convergence to the game theoretically meaningful min-max solution. Specifically, different forms of recurrent behavior (including periodicity and Poincaré recurrence) are possible as well as convergence to spurious (non-min-max) equilibria for a positive measure of initial conditions. At the technical level, our analysis combines tools from optimization theory, game theory and dynamical systems. △ Less

Submitted 28 October, 2019; originally announced October 2019.

Comments: To appear in NeurIPS 2019 (Spotlight talk)

arXiv:1806.00416 [pdf, other]

Pattern Search Multidimensional Scaling

Authors: Georgios Paraskevopoulos, Efthymios Tzinis, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Alexandros Potamianos

Abstract: We present a novel view of nonlinear manifold learning using derivative-free optimization techniques. Specifically, we propose an extension of the classical multi-dimensional scaling (MDS) method, where instead of performing gradient descent, we sample and evaluate possible "moves" in a sphere of fixed radius for each point in the embedded space. A fixed-point convergence guarantee can be shown by… ▽ More We present a novel view of nonlinear manifold learning using derivative-free optimization techniques. Specifically, we propose an extension of the classical multi-dimensional scaling (MDS) method, where instead of performing gradient descent, we sample and evaluate possible "moves" in a sphere of fixed radius for each point in the embedded space. A fixed-point convergence guarantee can be shown by formulating the proposed algorithm as an instance of General Pattern Search (GPS) framework. Evaluation on both clean and noisy synthetic datasets shows that pattern search MDS can accurately infer the intrinsic geometry of manifolds embedded in high-dimensional spaces. Additionally, experiments on real data, even under noisy conditions, demonstrate that the proposed pattern search MDS yields state-of-the-art results. △ Less

Submitted 30 October, 2019; v1 submitted 1 June, 2018; originally announced June 2018.

Comments: 36 pages, Under review for JMLR

Showing 1–25 of 25 results for author: Vlatakis-Gkaragkounis, E