Search | arXiv e-print repository

Layer-wise Quantization for Quantized Optimistic Dual Averaging

Authors: Anh Duc Nguyen, Ilia Markov, Frank Zhengqing Wu, Ali Ramezani-Kebrya, Kimon Antonakopoulos, Dan Alistarh, Volkan Cevher

Abstract: Modern deep neural networks exhibit heterogeneity across their many layers of various types, such as residual and multi-head attention layers, due to varying structures (dimensions, activation functions, etc.) and distinct representation characteristics, which impact predictions. We develop a general layer-wise quantization framework with tight variance and code-length bounds that adapts to these heterogeneities over the course of training. We then apply a new layer-wise quantization technique within distributed variational inequalities (VIs), proposing a novel Quantized Optimistic Dual Averaging (QODA) algorithm with adaptive learning rates, which achieves competitive convergence rates for monotone VIs. We empirically show that QODA achieves up to a 150% speedup over the baselines in end-to-end training time when training a Wasserstein GAN on 12+ GPUs.

Submitted 20 May, 2025; originally announced May 2025.

Comments: Accepted at the International Conference on Machine Learning (ICML 2025)
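The layer-wise idea can be illustrated with a minimal sketch, assuming a QSGD-style unbiased stochastic quantizer in which each layer is normalized by its own L2 norm and rounded onto its own sorted level set. The helper names `quantize_vector` and `layerwise_quantize` and the example level sets are hypothetical; the paper's adaptive level selection, coding scheme, and QODA updates are not reproduced here.

```python
# Hypothetical sketch: QSGD-style stochastic quantization applied per layer.
# Not the paper's exact scheme; level sets here are chosen by hand.
import numpy as np


def quantize_vector(v, levels, rng):
    """Unbiased stochastic quantization of v onto norm * sign(v) * levels.

    Assumes `levels` is a sorted NumPy array with levels[0] == 0 and levels[-1] == 1.
    """
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    r = np.abs(v) / norm  # normalized magnitudes in [0, 1]
    # Index of the largest level <= r, clipped so that idx + 1 stays valid.
    idx = np.clip(np.searchsorted(levels, r, side="right") - 1, 0, len(levels) - 2)
    lo, hi = levels[idx], levels[idx + 1]
    # Round up with probability (r - lo) / (hi - lo), so that E[q] == r.
    p_up = (r - lo) / (hi - lo)
    q = np.where(rng.random(r.shape) < p_up, hi, lo)
    return norm * np.sign(v) * q


def layerwise_quantize(layers, level_sets, rng):
    """Quantize each layer with its own (hypothetical) level set."""
    return [quantize_vector(v, lv, rng) for v, lv in zip(layers, level_sets)]


# Usage: a coarse level set for one layer, a finer one for another.
rng = np.random.default_rng(0)
layers = [np.array([0.3, -0.4]), np.array([1.0, 2.0, -2.0])]
level_sets = [np.array([0.0, 0.5, 1.0]), np.linspace(0.0, 1.0, 5)]
quantized = layerwise_quantize(layers, level_sets, rng)
```

In this sketch, each layer keeps its own levels, so a layer whose normalized coordinates cluster in one range can be given finer levels there without changing the levels used by other layers.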

arXiv:2104.13818

NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization

Authors: Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, Daniel M. Roy

Abstract: As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training. One popular communication-compression method for data-parallel SGD is QSGD (Alistarh et al., 2017), which quantizes and encodes gradients to reduce communication costs. The baseline variant of QSGD provides strong theoretical guarantees; however, for practical purposes, the authors proposed a heuristic variant, which we call QSGDinf, that demonstrated impressive empirical gains for distributed training of large neural networks. In this paper, we build on this work to propose a new gradient quantization scheme and show that it both has stronger theoretical guarantees than QSGD and matches and exceeds the empirical performance of the QSGDinf heuristic and of other compression methods.

Submitted 1 May, 2021; v1 submitted 28 April, 2021; originally announced April 2021.

Comments: This entry is redundant and was created in error. See arXiv:1908.06077 for the latest version.
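To make the uniform-versus-nonuniform distinction concrete, here is a minimal sketch, assuming unbiased stochastic rounding between adjacent levels as in QSGD and, for the nonuniform case, exponentially spaced levels {0, 2^-s, ..., 1/2, 1}. The function names are hypothetical and the exact level set should be checked against the paper; the code only computes the exact quantization variance for one given vector, not the paper's bounds.

```python
# Hypothetical sketch: exact variance of stochastic rounding for two level sets.
# The exponential level set is an assumption, not taken from the abstract.
import numpy as np


def uniform_levels(k):
    """k + 1 evenly spaced levels {0, 1/k, ..., 1}, as in baseline QSGD."""
    return np.linspace(0.0, 1.0, k + 1)


def exponential_levels(s):
    """Assumed nonuniform levels {0, 2^-s, ..., 1/2, 1} (s + 2 levels in total)."""
    return np.concatenate(([0.0], 2.0 ** np.arange(-s, 1)))


def quantization_variance(v, levels):
    """Exact E||Q(v) - v||^2 under unbiased stochastic rounding onto norm * levels."""
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return 0.0
    r = np.abs(v) / norm
    idx = np.clip(np.searchsorted(levels, r, side="right") - 1, 0, len(levels) - 2)
    lo, hi = levels[idx], levels[idx + 1]
    # Rounding r to hi w.p. (r - lo)/(hi - lo), else lo, has variance (hi - r)(r - lo).
    return float(norm ** 2 * np.sum((hi - r) * (r - lo)))


# Usage: a dense vector whose normalized coordinates are all small (0.1).
v = np.ones(100)
var_uniform = quantization_variance(v, uniform_levels(4))         # about 150.0
var_nonuniform = quantization_variance(v, exponential_levels(3))  # about 25.0
```

Both level sets have five levels. For this particular vector, the finer spacing near zero gives a much smaller variance; a vector with many large normalized coordinates (for example around 0.7) would instead favor the uniform grid.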

arXiv:1208.6271 [pdf, other]

Graph Symmetry Detection and Canonical Labeling: Differences and Synergies

Authors: Hadi Katebi, Karem A. Sakallah, Igor L. Markov

Abstract: Symmetries of combinatorial objects are known to complicate search algorithms, but such obstacles can often be removed by detecting symmetries early and discarding symmetric subproblems. Canonical labeling of combinatorial objects facilitates easy equivalence checking through quick matching. All existing canonical labeling software also finds symmetries, but the fastest symmetry-finding software does not perform canonical labeling. In this work, we contrast the two problems and dissect typical algorithms to identify their similarities and differences. We then develop a novel approach to canonical labeling where symmetries are found first and then used to speed up the canonical labeling algorithms. Empirical results show that this approach outperforms state-of-the-art canonical labelers.

Submitted 30 August, 2012; originally announced August 2012.

Comments: 15 pages, 10 figures, 1 table, Turing-100

MSC Class: 68R10

Journal ref: H. Katebi, K. A. Sakallah and I. L. Markov, "Graph Symmetry Detection and Canonical Labeling: Differences and Synergies" in Proc. Turing-100, EPIC vol. 10, pp. 181-195, Manchester, UK, 2012
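The relationship between the two problems can be seen in a deliberately naive sketch: a brute-force canonical form (the lexicographically smallest relabeled edge list over all vertex permutations) and a brute-force enumeration of automorphisms. Both are exponential and are not the algorithms from the paper; the function names are hypothetical.

```python
# Hypothetical brute-force sketch; exponential in n, for tiny graphs only.
from itertools import permutations


def canonical_form(n, edges):
    """Lexicographically smallest sorted edge list over all n! relabelings."""
    best = None
    for perm in permutations(range(n)):
        relabeled = sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges)
        if best is None or relabeled < best:
            best = relabeled
    return tuple(best)


def automorphisms(n, edges):
    """All vertex permutations that map the edge set onto itself."""
    edge_set = {frozenset(e) for e in edges}
    return [p for p in permutations(range(n))
            if {frozenset((p[u], p[v])) for u, v in edges} == edge_set]


def isomorphic(n, edges_a, edges_b):
    """Equivalence check by comparing canonical forms."""
    return canonical_form(n, edges_a) == canonical_form(n, edges_b)


# Usage: two labelings of the path on three vertices, and a triangle.
path_a = [(0, 1), (1, 2)]
path_b = [(1, 0), (0, 2)]
triangle = [(0, 1), (1, 2), (0, 2)]
```

Any two relabelings that attain the minimum differ by an automorphism, so the number of minimizers equals the size of the automorphism group (2 for the path above, 6 for the triangle). That redundancy is one way to see why knowing the symmetries in advance can help prune a canonical-labeling search.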

arXiv:1208.6269 [pdf, other]

Conflict Anticipation in the Search for Graph Automorphisms

Authors: Hadi Katebi, Karem A. Sakallah, Igor L. Markov

Abstract: Effective search for graph automorphisms allows identifying symmetries in many discrete structures, ranging from chemical molecules to microprocessor circuits. Using this type of structure can enhance visualization as well as speed up computational optimization and verification. Competitive algorithms for the graph automorphism problem are based on efficient partition refinement augmented with group-theoretic pruning techniques. In this paper, we improve prior algorithms for the graph automorphism problem by introducing simultaneous refinement of multiple partitions, which enables the anticipation of future conflicts in search and leads to significant pruning, reducing overall runtimes. Empirically, we observe an exponential speedup for the family of Miyazaki graphs, which have been shown to impede leading graph-automorphism algorithms.

Submitted 30 August, 2012; originally announced August 2012.

Comments: 15 pages, 9 figures, 1 table, Int'l Conf. on Logic for Programming, Artificial Intelligence and Reasoning (LPAR)

MSC Class: 68R10

Journal ref: H. Katebi, K. A. Sakallah and I. L. Markov, "Conflict Anticipation in the Search for Graph Automorphisms" in Proc. Int'l Conf. on Logic for Programming, Artificial Intelligence and Reasoning (LPAR), pp. 243-257, Merida, Venezuela, 2012
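Partition refinement, the building block mentioned above, can be sketched as classic color refinement: repeatedly split each cell by the multiset of neighbor colors until the partition is equitable. This is a minimal single-partition sketch with a hypothetical `refine` helper; it does not implement the paper's simultaneous refinement of multiple partitions or any group-theoretic pruning.

```python
# Hypothetical sketch of single-partition color refinement (equitable partition).


def refine(adj, colors):
    """Refine a vertex coloring until it is equitable (stable).

    adj: adjacency lists; colors: initial integer color per vertex.
    """
    colors = list(colors)
    while True:
        # Signature = own color plus sorted multiset of neighbor colors.
        signatures = [(colors[v], tuple(sorted(colors[u] for u in adj[v])))
                      for v in range(len(adj))]
        # Renumber signatures in sorted order so colors do not depend on vertex order.
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures)))}
        new_colors = [palette[sig] for sig in signatures]
        # Signatures include the old color, so cells only split; no split means stable.
        if len(set(new_colors)) == len(set(colors)):
            return new_colors
        colors = new_colors


# Usage: a 4-cycle stays uniform under refinement until one vertex is individualized.
cycle = [[1, 3], [0, 2], [1, 3], [0, 2]]
uniform = refine(cycle, [0, 0, 0, 0])
individualized = refine(cycle, [1, 0, 0, 0])
```

In individualization-refinement search, a vertex is given a unique color and the partition is refined again, as in the second call; here the resulting cells {0}, {1, 3}, {2} show how a single choice breaks the cycle's symmetry.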

arXiv:0707.3622 [pdf, ps, other]

doi 10.1007/s00453-009-9312-5

Constant-degree graph expansions that preserve the treewidth

Authors: Igor Markov, Yaoyun Shi

Abstract: Many hard algorithmic problems dealing with graphs, circuits, formulas and constraints admit polynomial-time upper bounds if the underlying graph has small treewidth. The same problems often encourage reducing the maximal degree of vertices to simplify theoretical arguments or address practical concerns. Such degree reduction can be performed through a sequence of splittings of vertices, resulting in an _expansion_ of the original graph. We observe that the treewidth of a graph may increase dramatically if the splittings are not performed carefully. In this context we address the following natural question: is it possible to reduce the maximum degree to a constant without substantially increasing the treewidth? Our work answers the above question affirmatively. We prove that any simple undirected graph G=(V, E) admits an expansion G'=(V', E') with the maximum degree <= 3 and treewidth(G') <= treewidth(G)+1. Furthermore, such an expansion will have no more than 2|E|+|V| vertices and 3|E| edges; it can be computed efficiently from a tree-decomposition of G. We also construct a family of examples for which the increase by 1 in treewidth cannot be avoided.

Submitted 24 July, 2007; originally announced July 2007.

Comments: 12 pages, 6 figures, the main result used by quant-ph/0511070

ACM Class: F.2.2; G.2.2

Journal ref: Algorithmica, Volume 59, Number 4, 461-470, 2011
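The notion of an expansion can be made concrete with the most naive splitting: each vertex v of a simple graph is replaced by a path of deg(v) copies, and each copy keeps one of v's original edges. This hypothetical `path_expansion` helper yields maximum degree at most 3 and stays within the 2|E|+|V| vertex and 3|E| edge counts quoted above. However, it ignores any tree decomposition and orders the copies arbitrarily, so it carries no treewidth guarantee and may be an instance of the careless splitting the abstract warns about; it is not the paper's construction.

```python
# Hypothetical naive degree reduction by vertex splitting (simple graphs only).
# No treewidth guarantee: copies are chained in arbitrary edge order.


def path_expansion(n, edges):
    """Split every vertex of a simple graph into a path of deg(v) copies.

    Returns (number of new vertices, new edge list, owner) where owner[i] is the
    original vertex that new vertex i came from.
    """
    incident = [[] for _ in range(n)]
    for i, (u, v) in enumerate(edges):
        incident[u].append(i)
        incident[v].append(i)

    copy_of = {}  # (original vertex, edge index or None) -> new vertex id
    owner = []
    new_edges = []
    for v in range(n):
        prev = None
        for e in incident[v] or [None]:  # an isolated vertex keeps a single copy
            copy_of[(v, e)] = len(owner)
            owner.append(v)
            if prev is not None:
                new_edges.append((prev, copy_of[(v, e)]))  # path edge inside v's gadget
            prev = copy_of[(v, e)]
    for i, (u, v) in enumerate(edges):
        new_edges.append((copy_of[(u, i)], copy_of[(v, i)]))  # original edge between copies
    return len(owner), new_edges, owner


# Usage: a star with 5 leaves (center of degree 5).
star = [(0, leaf) for leaf in range(1, 6)]
n_new, new_edges, owner = path_expansion(6, star)
```

Each copy has at most two path edges and one original edge, which gives the degree bound of 3; contracting every path back to its owner recovers the original graph.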

Showing 1–5 of 5 results for author: Markov, I