-
Bounded Memory in Distributed Networks
Authors:
Ran Ben Basat,
Keren Censor-Hillel,
Yi-Jun Chang,
Wenchen Han,
Dean Leitersdorf,
Gregory Schwartzman
Abstract:
The recent advent of programmable switches makes distributed algorithms readily deployable in real-world datacenter networks. However, there are still gaps between theory and practice that prevent the smooth adaptation of CONGEST algorithms to these environments. In this paper, we focus on the memory restrictions that arise in real-world deployments. We introduce the $μ$-CONGEST model where on top of the bandwidth restriction, the memory of nodes is also limited to $μ$ words, in line with real-world systems. We provide fast algorithms of two main flavors.
First, we observe that many algorithms in the CONGEST model are memory-intensive and do not work in $μ$-CONGEST. A prime example of a family of algorithms that use large memory is clique-listing algorithms. We show that the memory issue that arises here cannot be resolved without incurring a cost in the round complexity, by establishing a lower bound on the round complexity of listing cliques in $μ$-CONGEST. We introduce novel techniques to overcome these issues and generalize the algorithms to work within a given memory bound. Combined with our lower bound, these provide tight tradeoffs between the running time and memory of nodes.
Second, we show that it is possible to efficiently simulate various families of streaming algorithms in $μ$-CONGEST. These include fast simulations of $p$-pass algorithms, random order streams, and various types of mergeable streaming algorithms.
Combining our contributions, we show that we can use streaming algorithms to efficiently generate statistics regarding combinatorial structures in the network. An example of an end result of this type is that we can efficiently identify and provide the per-color frequencies of the frequent monochromatic triangles in $μ$-CONGEST.
Submitted 13 June, 2025;
originally announced June 2025.
-
Coreset Spectral Clustering
Authors:
Ben Jourdan,
Gregory Schwartzman,
Peter Macgregor,
He Sun
Abstract:
Coresets have become an invaluable tool for solving $k$-means and kernel $k$-means clustering problems on large datasets with small numbers of clusters. On the other hand, spectral clustering works well on sparse graphs and has recently been extended to scale efficiently to large numbers of clusters. We exploit the connection between kernel $k$-means and the normalised cut problem to combine the benefits of both. Our main result is a coreset spectral clustering algorithm for graphs that clusters a coreset graph to infer a good labelling of the original graph. We prove that an $α$-approximation for the normalised cut problem on the coreset graph is an $O(α)$-approximation on the original. We also improve the running time of the state-of-the-art coreset algorithm for kernel $k$-means on sparse kernels, from $\tilde{O}(nk)$ to $\tilde{O}(n\cdot \min \{k, d_{avg}\})$, where $d_{avg}$ is the average number of non-zero entries in each row of the $n\times n$ kernel matrix. Our experiments confirm our coreset algorithm is asymptotically faster on large real-world graphs with many clusters, and show that our clustering algorithm overcomes the main challenge faced by coreset kernel $k$-means on sparse kernels which is getting stuck in local optima.
Submitted 10 March, 2025;
originally announced March 2025.
-
Mini-Batch Kernel $k$-means
Authors:
Ben Jourdan,
Gregory Schwartzman
Abstract:
We present the first mini-batch kernel $k$-means algorithm, offering an order of magnitude improvement in running time compared to the full batch algorithm. A single iteration of our algorithm takes $\widetilde{O}(kb^2)$ time, significantly faster than the $O(n^2)$ time required by the full batch kernel $k$-means, where $n$ is the dataset size and $b$ is the batch size. Extensive experiments demonstrate that our algorithm consistently achieves a 10-100x speedup with minimal loss in quality, addressing the slow runtime that has limited kernel $k$-means adoption in practice. We further complement these results with a theoretical analysis under an early stopping condition, proving that with a batch size of $\widetilde{Ω}(\max \{γ^{4}, γ^{2}\} \cdot ε^{-2})$, the algorithm terminates in $O(γ^2/ε)$ iterations with high probability, where $γ$ bounds the norm of points in feature space and $ε$ is a termination threshold. Our analysis holds for any reasonable center initialization, and when using $k$-means++ initialization, the algorithm achieves an approximation ratio of $O(\log k)$ in expectation. For normalized kernels, such as Gaussian or Laplacian, it holds that $γ=1$. Taking $ε= O(1)$ and $b=Θ(\log n)$, the algorithm terminates in $O(1)$ iterations, with each iteration running in $\widetilde{O}(k)$ time.
Submitted 8 October, 2024;
originally announced October 2024.
-
Stochastic Distance in Property Testing
Authors:
Uri Meir,
Gregory Schwartzman,
Yuichi Yoshida
Abstract:
We introduce a novel concept termed "stochastic distance" for property testing. Diverging from the traditional definition of distance, where a distance $t$ implies that there exist $t$ edges that can be added to ensure a graph possesses a certain property (such as $k$-edge-connectivity), our new notion implies that there is a high probability that adding $t$ random edges will endow the graph with the desired property. While formulating testers based on this new distance proves challenging in a sequential environment, it is much easier in a distributed setting. Taking $k$-edge-connectivity as a case study, we design ultra-fast testing algorithms in the CONGEST model. Our introduction of stochastic distance offers a more natural fit for the distributed setting, providing a promising avenue for future research in emerging models of computation.
Submitted 19 July, 2024;
originally announced July 2024.
-
Exfiltration of personal information from ChatGPT via prompt injection
Authors:
Gregory Schwartzman
Abstract:
We report that ChatGPT 4 and 4o are susceptible to a prompt injection attack that allows an attacker to exfiltrate users' personal data. It is applicable without the use of any 3rd party tools and all users are currently affected. This vulnerability is exacerbated by the recent introduction of ChatGPT's memory feature, which allows an attacker to command ChatGPT to monitor the user for the desired personal data.
Submitted 6 June, 2024; v1 submitted 31 May, 2024;
originally announced June 2024.
-
Mini-batch Submodular Maximization
Authors:
Gregory Schwartzman
Abstract:
We present the first mini-batch algorithm for maximizing a non-negative monotone decomposable submodular function, $F=\sum_{i=1}^N f^i$, under a set of constraints. We consider two sampling approaches: uniform and weighted. We first show that mini-batch with weighted sampling improves over the state-of-the-art sparsifier-based approach both in theory and in practice.
Surprisingly, our experimental results show that uniform sampling is superior to weighted sampling. However, it is impossible to explain this using worst-case analysis. Our main contribution is using smoothed analysis to provide a theoretical foundation for our experimental results. We show that, under very mild assumptions, uniform sampling is superior for both the mini-batch and the sparsifier approaches. We empirically verify that these assumptions hold for our datasets. Uniform sampling is simple to implement and has complexity independent of $N$, making it the perfect candidate to tackle massive real-world datasets.
Submitted 2 October, 2024; v1 submitted 22 January, 2024;
originally announced January 2024.
-
Local Max-Cut on Sparse Graphs
Authors:
Gregory Schwartzman
Abstract:
We bound the smoothed running time of the FLIP algorithm for local Max-Cut as a function of $α$, the arboricity of the input graph. We show that, with high probability and in expectation, the following holds (where $n$ is the number of nodes and $φ$ is the smoothing parameter):
1) When $α= O(\log^{1-δ} n)$, FLIP terminates in $φ\cdot \mathrm{poly}(n)$ iterations, where $δ\in (0,1]$ is an arbitrarily small constant. Prior to our results, the only graph families for which FLIP was known to achieve a smoothed polynomial running time were complete graphs and graphs with logarithmic maximum degree.
2) For arbitrary values of $α$ we get a running time of $φn^{O(\frac{α}{\log n} + \log α)}$. This improves over the best known running time for general graphs of $φn^{O(\sqrt{ \log n })}$ for $α= o(\log^{1.5} n)$. Specifically, when $α= O(\log n)$ we get a significantly faster running time of $φn^{O(\log \log n)}$.
Submitted 21 April, 2024; v1 submitted 31 October, 2023;
originally announced November 2023.
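The FLIP local search analyzed in the abstract above can be illustrated with a minimal sketch. It assumes the standard formulation: repeatedly move a single node to the other side of the cut whenever that strictly increases the cut weight, and stop at a local optimum. The function names (`cut_gain`, `flip`, `cut_weight`), the "first improving node" pivot rule, and the adjacency-dict representation are illustrative assumptions; the sketch uses fixed integer weights and does not model the smoothing parameter $φ$.

```python
def cut_gain(adj, side, v):
    """Change in cut weight if node v switches sides.

    Edges to same-side neighbours would enter the cut (+w);
    edges to opposite-side neighbours would leave it (-w).
    """
    return sum(w if side[u] == side[v] else -w for u, w in adj[v].items())


def cut_weight(adj, side):
    """Total weight of edges whose endpoints lie on different sides."""
    return sum(
        w
        for u in adj
        for v, w in adj[u].items()
        if u < v and side[u] != side[v]
    )


def flip(adj, side):
    """Run FLIP until no single-node move improves the cut.

    adj:  dict node -> dict neighbour -> weight (symmetric).
    side: dict node -> 0 or 1 (initial partition).
    Returns the locally optimal partition and the number of flips.
    Pivot rule (an assumption): flip the first improving node found.
    """
    side = dict(side)
    steps = 0
    while True:
        improving = [v for v in adj if cut_gain(adj, side, v) > 0]
        if not improving:
            return side, steps
        v = improving[0]
        side[v] = 1 - side[v]
        steps += 1


# Usage: a weighted triangle, starting with every node on side 0.
triangle = {0: {1: 3, 2: 5}, 1: {0: 3, 2: 9}, 2: {0: 5, 1: 9}}
final_side, flips = flip(triangle, {0: 0, 1: 0, 2: 0})
```

Each flip strictly increases the cut weight, so the loop always terminates; the question studied in the abstract is how many flips are needed under smoothed inputs.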
-
Mini-batch $k$-means terminates within $O(d/ε)$ iterations
Authors:
Gregory Schwartzman
Abstract:
We answer the question: "Does local progress (on batches) imply global progress (on the entire dataset) for mini-batch $k$-means?". Specifically, we consider mini-batch $k$-means which terminates only when the improvement in the quality of the clustering on the sampled batch is below some threshold.
Although at first glance it appears that this algorithm might execute forever, we answer the above question in the affirmative and show that if the batch is of size $\tilde{Ω}((d/ε)^2)$, it must terminate within $O(d/ε)$ iterations with high probability, where $d$ is the dimension of the input, and $ε$ is a threshold parameter for termination. This is true regardless of how the centers are initialized. When the algorithm is initialized with the $k$-means++ initialization scheme, it achieves an approximation ratio of $O(\log k)$ (the same as the full-batch version).
Finally, we show the applicability of our results to the mini-batch $k$-means algorithm implemented in the scikit-learn (sklearn) Python library.
Submitted 1 April, 2023;
originally announced April 2023.
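The termination rule described in the abstract above (stop once the improvement on the sampled batch falls below a threshold $ε$) can be sketched as follows. This is a minimal pure-Python illustration, not the paper's algorithm or the scikit-learn implementation: the fixed learning rate `lr`, the uniform random initialization, sampling with replacement, and all function names are assumptions made for the example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def batch_cost(batch, centers):
    """Average squared distance from each batch point to its nearest center."""
    return sum(min(sq_dist(x, c) for c in centers) for x in batch) / len(batch)


def minibatch_kmeans(points, k, batch_size, eps, max_iters=1000, lr=0.5, seed=0):
    """Mini-batch k-means that stops when the batch improvement is below eps.

    Returns (centers, iterations_used). `lr` is an assumed fixed learning
    rate that moves each center toward the mean of its batch points.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for it in range(1, max_iters + 1):
        batch = [rng.choice(points) for _ in range(batch_size)]
        before = batch_cost(batch, centers)

        # Assign every batch point to its nearest current center.
        groups = [[] for _ in range(k)]
        for x in batch:
            j = min(range(k), key=lambda i: sq_dist(x, centers[i]))
            groups[j].append(x)

        # Move each non-empty center part of the way toward its batch mean.
        new_centers = []
        for c, g in zip(centers, groups):
            if g:
                mean = [sum(col) / len(g) for col in zip(*g)]
                new_centers.append([(1 - lr) * a + lr * m for a, m in zip(c, mean)])
            else:
                new_centers.append(c)

        # Early stopping: local progress on this batch is too small.
        after = batch_cost(batch, new_centers)
        if before - after < eps:
            return centers, it
        centers = new_centers
    return centers, max_iters
```

A call such as `minibatch_kmeans(points, k=2, batch_size=4, eps=1e-3)` returns the centers together with the number of iterations used, which is the quantity the abstract bounds.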
-
Fully Polynomial-Time Distributed Computation in Low-Treewidth Graphs
Authors:
Taisuke Izumi,
Naoki Kitamura,
Takamasa Naruse,
Gregory Schwartzman
Abstract:
We consider global problems, i.e., problems that take at least diameter time, even when the bandwidth is not restricted. We show that all problems considered admit efficient solutions in low-treewidth graphs. By ``efficient'' we mean that the running time has polynomial dependence on the treewidth, a linear dependence on the diameter (which is unavoidable), and only a polylogarithmic dependence on $n$, the number of nodes in the graph. We present algorithms solving the following problems in the CONGEST model, all of which attain $\tilde{O}(τ^{O(1)}D)$-round complexity (where $τ$ and $D$ denote the treewidth and diameter of the graph, respectively): (1) Exact single-source shortest paths (actually, the more general problem of computing a distance labeling scheme) for weighted and directed graphs, (2) exact bipartite unweighted maximum matching, and (3) the weighted girth for both directed and undirected graphs. We derive all of our results using a single unified framework, which consists of two novel technical ingredients. The first is a fully polynomial-time distributed tree decomposition algorithm, which outputs a decomposition of width $O(τ^2\log n)$ in $\tilde{O}(τ^{O(1)}D)$ rounds (where $n$ is the number of nodes in the graph). The second ingredient, and the technical highlight of this paper, is the novel concept of a \emph{stateful walk constraint}, which naturally defines a set of feasible walks in the input graph based on their local properties (e.g., augmenting paths). Given a stateful walk constraint, the constrained version of the shortest paths problem (or distance labeling) requires the algorithm to output the shortest \emph{constrained} walk (or its distance) for given source and sink vertices. We show that this problem can be efficiently solved in the CONGEST model by reducing it to an \emph{unconstrained} version of the problem.
Submitted 30 May, 2022;
originally announced May 2022.
-
FRANCIS: Fast Reaction Algorithms for Network Coordination In Switches
Authors:
Wenchen Han,
Vic Feng,
Gregory Schwartzman,
Yuliang Li,
Michael Mitzenmacher,
Minlan Yu,
Ran Ben-Basat
Abstract:
Optimizing the reaction to network events, which is critical in tasks such as clock synchronization, multicast, and routing, becomes increasingly challenging as networks grow larger. To improve the reaction time compared to centralized solutions, the theory community has made significant progress in the design of message-passing algorithms that leverage all nodes for distributed computation, and the advent of programmable switches makes it now possible to materialize them.
We propose FRANCIS, a framework and associated libraries for running message-passing algorithms on programmable switches. It features primitives that allow easy integration of such algorithms for quickly reacting to network events while optimizing resource consumption. We use FRANCIS to implement event reaction solutions that improve clock synchronization, source-routed multicast, and routing and demonstrate up to 18x reduction in reaction time.
Submitted 1 November, 2024; v1 submitted 29 April, 2022;
originally announced April 2022.
-
SGD Through the Lens of Kolmogorov Complexity
Authors:
Gregory Schwartzman
Abstract:
We prove that stochastic gradient descent (SGD) finds a solution that achieves $(1-ε)$ classification accuracy on the entire dataset. We do so under two main assumptions: (1. Local progress) The model accuracy improves on average over batches. (2. Models compute simple functions) The function computed by the model is simple (has low Kolmogorov complexity). It is sufficient that these assumptions hold only for a tiny fraction of the epochs.
Intuitively, the above means that intermittent local progress of SGD implies global progress. Assumption 2 trivially holds for underparameterized models, hence, our work gives the first convergence guarantee for general, underparameterized models. Furthermore, this is the first result which is completely model agnostic - we do not require the model to have any specific architecture or activation function, it may not even be a neural network. Our analysis makes use of the entropy compression method, which was first introduced by Moser and Tardos in the context of the Lovász local lemma.
Submitted 14 May, 2022; v1 submitted 9 November, 2021;
originally announced November 2021.
-
On the Complexity of Load Balancing in Dynamic Networks
Authors:
Seth Gilbert,
Uri Meir,
Ami Paz,
Gregory Schwartzman
Abstract:
In the load balancing problem, each node in a network is assigned a load, and the goal is to equally distribute the loads among the nodes, by performing local load exchanges. While load balancing was extensively studied in static networks, only recently a load balancing algorithm for dynamic networks with a bounded convergence time was presented. In this paper, we further study the time complexity of load balancing in the context of dynamic networks.
First, we show that randomness is not necessary, and present a deterministic algorithm which slightly improves the running time of the previous algorithm, at the price of not being matching-based. Then, we consider integral loads, i.e., loads that cannot be split indefinitely, and prove that no matching-based algorithm can have a bounded convergence time for this case.
To circumvent both this impossibility result, and a known one for the non-integral case, we apply the method of smoothed analysis, where random perturbations are made over the worst-case choices of network topologies. We show both impossibility results do not hold under this kind of analysis, suggesting that load-balancing in real world systems might be faster than the lower bounds suggest.
Submitted 27 May, 2021;
originally announced May 2021.
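As a rough illustration of what a matching-based local load exchange looks like, the sketch below performs one round in which every matched pair of nodes averages its loads. The averaging rule and the function name `balance_round` are assumptions for the example; the abstract above does not specify the exchange rule, and this sketch covers only divisible (non-integral) loads.

```python
def balance_round(loads, matching):
    """One round of matching-based load balancing.

    loads:    list of node loads (divisible, so fractions are allowed).
    matching: list of disjoint node pairs (u, v) active in this round.
    Each matched pair replaces both loads by their average, so the
    total load in the network is preserved.
    """
    new_loads = list(loads)
    for u, v in matching:
        avg = (loads[u] + loads[v]) / 2
        new_loads[u] = avg
        new_loads[v] = avg
    return new_loads


# Usage: nodes 0 and 1 are matched; nodes 2 and 3 are idle this round.
after = balance_round([8, 0, 4, 4], [(0, 1)])
```

In a dynamic network the matching may be drawn from a different topology in every round, which is what makes bounding the convergence time difficult.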
-
Smoothed Analysis of Population Protocols
Authors:
Gregory Schwartzman,
Yuichi Sudo
Abstract:
In this work, we initiate the study of \emph{smoothed analysis} of population protocols. We consider a population protocol model where an adaptive adversary dictates the interactions between agents, but with probability $p$ every such interaction may change into an interaction between two agents chosen uniformly at random. That is, $p$-fraction of the interactions are random, while $(1-p)$-fraction are adversarial. The aim of our model is to bridge the gap between a uniformly random scheduler (which is too idealistic) and an adversarial scheduler (which is too strict).
We focus on the fundamental problem of leader election in population protocols. We show that, for a population of size $n$, the leader election problem can be solved in $O(p^{-2}n \log^3 n)$ steps with high probability, using $O((\log^2 n) \cdot (\log (n/p)))$ states per agent, for \emph{all} values of $p\leq 1$. Although our result does not match the best known running time of $O(n \log n)$ for the uniformly random scheduler ($p=1$), we are able to present a \emph{smooth transition} between a running time of $O(n \cdot \mathrm{polylog} n)$ for $p=1$ and an infinite running time for the adversarial scheduler ($p=0$), where the problem cannot be solved. The key technical contribution of our work is a novel \emph{phase clock} algorithm for our model. This is a key primitive for much-studied fundamental population protocol algorithms (leader election, majority), and we believe it is of independent interest.
Submitted 25 May, 2021;
originally announced May 2021.
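The smoothed scheduler described in the abstract above can be sketched as a generator: each interaction chosen by the adversary is, with probability $p$, replaced by an interaction between two distinct agents chosen uniformly at random. For simplicity the sketch assumes a non-adaptive adversary that supplies its pairs up front (the abstract considers an adaptive one); the name `smoothed_schedule` and its parameters are illustrative.

```python
import random


def smoothed_schedule(adversary_pairs, n, p, rng):
    """Yield the interactions that actually take place.

    adversary_pairs: iterable of (a, b) pairs chosen by the adversary.
    n:   number of agents, labelled 0 .. n-1.
    p:   probability that an interaction is replaced by a random one.
    rng: a random.Random instance, passed in for reproducibility.
    """
    for pair in adversary_pairs:
        if rng.random() < p:
            # Smoothing step: two distinct agents chosen uniformly at random.
            yield tuple(rng.sample(range(n), 2))
        else:
            yield pair


# Usage: p = 0 reproduces the adversary, p = 1 is the uniformly random scheduler.
rng = random.Random(1)
adversarial = [(0, 1)] * 20
unchanged = list(smoothed_schedule(adversarial, 5, 0.0, rng))
fully_random = list(smoothed_schedule(adversarial, 5, 1.0, rng))
```

The two extreme settings correspond to the adversarial scheduler ($p=0$) and the uniformly random scheduler ($p=1$) between which the abstract interpolates.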
-
Models of Smoothing in Dynamic Networks
Authors:
Uri Meir,
Ami Paz,
Gregory Schwartzman
Abstract:
Smoothed analysis is a framework suggested for mediating gaps between worst-case and average-case complexities. In a recent work, Dinitz et al.~[Distributed Computing, 2018] suggested to use smoothed analysis in order to study dynamic networks. Their aim was to explain the gaps between real-world networks that function well despite being dynamic, and the strong theoretical lower bounds for arbitrary networks. To this end, they introduced a basic model of smoothing in dynamic networks, where an adversary picks a sequence of graphs, representing the topology of the network over time, and then each of these graphs is slightly perturbed in a random manner. The model suggested above is based on a per-round noise, and our aim in this work is to extend it to models of noise more suited for multiple rounds. This is motivated by long-lived networks, where the amount and location of noise may vary over time. To this end, we present several different models of noise. First, we extend the previous model to cases where the amount of noise is very small. Then, we move to more refined models, where the amount of noise can change between different rounds, e.g., as a function of the number of changes the network undergoes. We also study a model where the noise is not arbitrarily spread among the network, but focuses in each round in the areas where changes have occurred. Finally, we study the power of an adaptive adversary, who can choose its actions in accordance with the changes that have occurred so far. We use the flooding problem as a running case-study, presenting very different behaviors under the different models of noise, and analyze the flooding time in different models.
△ Less
Submitted 27 September, 2020;
originally announced September 2020.
-
Finding Subgraphs in Highly Dynamic Networks
Authors:
Keren Censor-Hillel,
Victor I. Kolobov,
Gregory Schwartzman
Abstract:
In this paper we consider the fundamental problem of finding subgraphs in highly dynamic distributed networks - networks which allow an arbitrary number of links to be inserted / deleted per round. We show that the problems of $k$-clique membership listing (for any $k\geq 3$), 4-cycle listing and 5-cycle listing can be deterministically solved in $O(1)$-amortized round complexity, even with limited logarithmic-sized messages.
To achieve $k$-clique membership listing we introduce a very useful combinatorial structure which we name the robust $2$-hop neighborhood. This is a subset of the 2-hop neighborhood of a node, and we prove that it can be maintained in highly dynamic networks in $O(1)$-amortized rounds. We also show that maintaining the actual 2-hop neighborhood of a node requires near linear amortized time, showing the necessity of our definition. For $4$-cycle and $5$-cycle listing, we need edges within hop distance 3, for which we similarly define the robust $3$-hop neighborhood and prove it can be maintained in highly dynamic networks in $O(1)$-amortized rounds.
We complement the above with several impossibility results. We show that membership listing of any other graph on $k\geq 3$ nodes except $k$-clique requires an almost linear number of amortized communication rounds. We also show that $k$-cycle listing for $k\geq 6$ requires $Ω(\sqrt{n} / \log n)$ amortized rounds. This, combined with our upper bounds, paints a detailed picture of the complexity landscape for ultra fast graph finding algorithms in this highly dynamic environment.
Submitted 17 September, 2020;
originally announced September 2020.
-
Improved Distributed Approximations for Maximum Independent Set
Authors:
Ken-ichi Kawarabayashi,
Seri Khoury,
Aaron Schild,
Gregory Schwartzman
Abstract:
We present improved results for approximating maximum-weight independent set ($\mathrm{MaxIS}$) in the CONGEST and LOCAL models of distributed computing. Given an input graph, let $n$ and $Δ$ be the number of nodes and maximum degree, respectively, and let $\mathrm{MIS}(n,Δ)$ be the running time of finding a \emph{maximal} independent set ($\mathrm{MIS}$) in the CONGEST model. Bar-Yehuda et al. [PODC 2017] showed that there is an algorithm in the CONGEST model that finds a $Δ$-approximation for $\mathrm{MaxIS}$ in $O(\mathrm{MIS}(n,Δ)\log W)$ rounds, where $W$ is the maximum weight of a node in the graph, which can be as high as $\mathrm{poly}(n)$. Whether their algorithm is deterministic or randomized depends on the $\mathrm{MIS}$ algorithm that is used as a black-box.
Our main result in this work is a randomized $(\mathrm{poly}(\log\log n)/ε)$-round algorithm that finds, with high probability, a $(1+ε)Δ$-approximation for $\mathrm{MaxIS}$ in the CONGEST model. That is, by sacrificing only a tiny fraction of the approximation guarantee, we achieve an \emph{exponential} speed-up in the running time over the previous best known result. Due to a lower bound of $Ω(\sqrt{\log n/\log \log n})$ that was given by Kuhn, Moscibroda and Wattenhofer [JACM, 2016] on the number of rounds for any (possibly randomized) algorithm that finds a maximal independent set (even in the LOCAL model), this result implies that finding a $(1+ε)Δ$-approximation for $\mathrm{MaxIS}$ is exponentially easier than $\mathrm{MIS}$.
Submitted 19 February, 2020; v1 submitted 27 June, 2019;
originally announced June 2019.
-
Optimal Distributed Covering Algorithms
Authors:
Ran Ben-Basat,
Guy Even,
Ken-ichi Kawarabayashi,
Gregory Schwartzman
Abstract:
We present a time-optimal deterministic distributed algorithm for approximating a minimum weight vertex cover in hypergraphs of rank $f$. This problem is equivalent to the Minimum Weight Set Cover problem in which the frequency of every element is bounded by $f$. The approximation factor of our algorithm is $(f+ε)$. Our algorithm runs in the CONGEST model and requires $O(\logΔ/ \log\logΔ)$ rounds, for constants $ε\in(0,1]$ and $f\in \mathbb{N}^+$. This is the first distributed algorithm for this problem whose running time depends on neither the vertex weights nor the number of vertices. For constant values of $f$ and $ε$, our algorithm improves over the $(f+ε)$-approximation algorithm of KMW06 whose running time is $O(\log Δ+ \log W)$, where $W$ is the ratio between the largest and smallest vertex weights in the graph. Our algorithm also achieves an $f$-approximation for the problem in $O(f\log n)$ rounds, improving over the classical result of KVY94 that achieves a running time of $O(f\log^2 n)$. Finally, for weighted vertex cover ($f=2$) our algorithm achieves a \emph{deterministic} running time of $O(\log n)$, matching the \emph{randomized} previously best result of KY11. We also show that integer covering-programs can be reduced to the Minimum Weight Set Cover problem in the distributed setting. This allows us to achieve an $(f+ε)$-approximate integral solution in $O(\frac{\logΔ}{\log\logΔ}+(f\cdot\log M)^{1.01}\logε^{-1}(\logΔ)^{0.01})$ rounds, where $f$ bounds the number of variables in a constraint, $Δ$ bounds the number of constraints a variable appears in, and $M=\max \{1, 1/a_{\min}\}$, where $a_{\min}$ is the smallest normalized constraint coefficient. This improves over the results of KMW06 for the integral case, which runs in $O(ε^{-4}\cdot f^4\cdot \log f\cdot\log(M\cdotΔ))$ rounds.
Submitted 30 May, 2019; v1 submitted 25 February, 2019;
originally announced February 2019.
-
Fast Deterministic Algorithms for Highly-Dynamic Networks
Authors:
Keren Censor-Hillel,
Neta Dafni,
Victor I. Kolobov,
Ami Paz,
Gregory Schwartzman
Abstract:
This paper provides an algorithmic framework for obtaining fast distributed algorithms for a highly-dynamic setting, in which *arbitrarily many* edge changes may occur in each round. Our algorithm significantly improves upon prior work in its combination of (1) having an $O(1)$ amortized time complexity, (2) using only $O(\log{n})$-bit messages, (3) not posing any restrictions on the dynamic behavior of the environment, (4) being deterministic, (5) having strong guarantees for intermediate solutions, and (6) being applicable for a wide family of tasks.
The tasks for which we deduce such an algorithm are maximal matching, $(degree+1)$-coloring, 2-approximation for minimum weight vertex cover, and maximal independent set (which is the most subtle case). For some of these tasks, node insertions can also be among the allowed topology changes, and for some of them also abrupt node deletions.
Submitted 11 October, 2020; v1 submitted 13 January, 2019;
originally announced January 2019.
-
Optimal Distributed Weighted Set Cover Approximation
Authors:
Ran Ben-Basat,
Guy Even,
Ken-ichi Kawarabayashi,
Gregory Schwartzman
Abstract:
We present a time-optimal deterministic distributed algorithm for approximating a minimum weight vertex cover in hypergraphs of rank $f$. This problem is equivalent to the Minimum Weight Set Cover Problem in which the frequency of every element is bounded by $f$. The approximation factor of our algorithm is $(f+ε)$. Let $Δ$ denote the maximum degree in the hypergraph. Our algorithm runs in the CONGEST model and requires $O(\logΔ / \log \log Δ)$ rounds, for constants $ε\in (0,1]$ and $f\in\mathbb N^+$. This is the first distributed algorithm for this problem whose running time does not depend on the vertex weights or the number of vertices. It thus adds another member to the exclusive family of \emph{provably optimal} distributed algorithms.
For constant values of $f$ and $ε$, our algorithm improves over the $(f+ε)$-approximation algorithm of \cite{KuhnMW06} whose running time is $O(\log Δ+ \log W)$, where $W$ is the ratio between the largest and smallest vertex weights in the graph.
Submitted 17 August, 2018;
originally announced August 2018.
-
Parameterized Distributed Algorithms
Authors:
Ran Ben-Basat,
Ken-ichi Kawarabayashi,
Gregory Schwartzman
Abstract:
In this work, we initiate a thorough study of parameterized graph optimization problems in the distributed setting. In a parameterized problem, an algorithm decides whether a solution of size bounded by a \emph{parameter} $k$ exists and if so, it finds one. We study fundamental problems, including Minimum Vertex Cover (MVC), Maximum Independent Set (MaxIS), Maximum Matching (MaxM), and many others, in both the LOCAL and CONGEST distributed computation models. We present lower bounds for the round complexity of solving parameterized problems in both models, together with optimal and near-optimal upper bounds.
Our results extend beyond the scope of parameterized problems. We show that any LOCAL $(1+ε)$-approximation algorithm for the above problems must take $Ω(ε^{-1})$ rounds. Joined with the algorithm of [GKM17] and the $Ω(\sqrt{\frac{\log n}{\log\log n}})$ lower bound of [KMW16], this settles the complexity of $(1+ε)$-approximating MVC, MaxM and MaxIS at $(ε^{-1}\log n)^{Θ(1)}$. We also show that our parameterized approach reduces the runtime of exact and approximate CONGEST algorithms for MVC and MaxM if the optimal solution is small, without knowing its size beforehand. Finally, we propose the first deterministic $o(n^2)$ rounds CONGEST algorithms that approximate MVC and MaxM within a factor strictly smaller than $2$.
Submitted 5 August, 2018; v1 submitted 12 July, 2018;
originally announced July 2018.
-
A Deterministic Distributed $2$-Approximation for Weighted Vertex Cover in $O(\log n\logΔ/ \log^2\logΔ)$ Rounds
Authors:
Ran Ben-Basat,
Guy Even,
Ken-ichi Kawarabayashi,
Gregory Schwartzman
Abstract:
We present a deterministic distributed $2$-approximation algorithm for the Minimum Weight Vertex Cover problem in the CONGEST model whose round complexity is $O(\log n \log Δ/ \log^2 \log Δ)$. This improves over the currently best known deterministic 2-approximation implied by [KVY94]. Our solution generalizes the $(2+ε)$-approximation algorithm of [BCS17], improving the dependency on $ε^{-1}$ from linear to logarithmic. In addition, for every $ε=(\log Δ)^{-c}$, where $c\geq 1$ is a constant, our algorithm computes a $(2+ε)$-approximation in $O(\log Δ/ \log \log Δ)$~rounds (which is asymptotically optimal).
Submitted 23 May, 2018; v1 submitted 4 April, 2018;
originally announced April 2018.
-
Adapting Local Sequential Algorithms to the Distributed Setting
Authors:
Ken-ichi Kawarabayashi,
Gregory Schwartzman
Abstract:
It is a well known fact that sequential algorithms which exhibit a strong "local" nature can be adapted to the distributed setting given a legal graph coloring. The running time of the distributed algorithm will then be at least the number of colors. Surprisingly, this well known idea was never formally stated as a unified framework. In this paper we aim to define a robust family of local sequential algorithms which can be easily adapted to the distributed setting. We then develop new tools to further enhance these algorithms, achieving state of the art results for fundamental problems.
We define a simple class of greedy-like algorithms which we call \emph{orderless-local} algorithms. We show that given a legal $c$-coloring of the graph, every algorithm in this family can be converted into a distributed algorithm running in $O(c)$ communication rounds in the CONGEST model. We show that this family is indeed robust, as both the method of conditional expectations and the unconstrained submodular maximization algorithm of Buchbinder et al. \cite{BuchbinderFNS15} can be expressed as orderless-local algorithms for \emph{local utility functions}, that is, utility functions which have a strong local nature to them.
We use the above algorithms as a base for new distributed approximation algorithms for the weighted variants of some fundamental problems: Max $k$-Cut, Max-DiCut, Max 2-SAT and correlation clustering. We develop algorithms which have the same approximation guarantees as their sequential counterparts, up to a constant additive $ε$ factor, while achieving an $O(\log^* n)$ running time for deterministic algorithms and $O(ε^{-1})$ running time for randomized ones. This improves exponentially upon the currently best known algorithms.
Submitted 12 May, 2018; v1 submitted 28 November, 2017;
originally announced November 2017.
-
Distributed Approximation of Maximum Independent Set and Maximum Matching
Authors:
Reuven Bar-Yehuda,
Keren Censor-Hillel,
Mohsen Ghaffari,
Gregory Schwartzman
Abstract:
We present a simple distributed $Δ$-approximation algorithm for maximum weight independent set (MaxIS) in the $\mathsf{CONGEST}$ model which completes in $O(\texttt{MIS}(G)\cdot \log W)$ rounds, where $Δ$ is the maximum degree, $\texttt{MIS}(G)$ is the number of rounds needed to compute a maximal independent set (MIS) on $G$, and $W$ is the maximum weight of a node.
Plugging in the best known algorithm for MIS gives a randomized solution in $O(\log n \log W)$ rounds, where $n$ is the number of nodes.
We also present a deterministic $O(Δ+\log^* n)$-round algorithm based on coloring.
We then show how to use our MaxIS approximation algorithms to compute a $2$-approximation for maximum weight matching without incurring any additional round penalty in the $\mathsf{CONGEST}$ model. We use a known reduction for simulating algorithms on the line graph while incurring congestion, but we show our algorithm is part of a broad family of \emph{local aggregation algorithms} for which we describe a mechanism that allows the simulation to run in the $\mathsf{CONGEST}$ model without an additional overhead.
Next, we show that for maximum weight matching, relaxing the approximation factor to ($2+\varepsilon$) allows us to devise a distributed algorithm requiring $O(\frac{\log Δ}{\log\logΔ})$ rounds for any constant $\varepsilon>0$. For the unweighted case, we can even obtain a $(1+\varepsilon)$-approximation in this number of rounds. These algorithms are the first to achieve the provably optimal round complexity with respect to dependency on $Δ$.
Submitted 1 August, 2017;
originally announced August 2017.
-
A $(2+ε)$-Approximation for Maximum Weight Matching in the Semi-Streaming Model
Authors:
Ami Paz,
Gregory Schwartzman
Abstract:
We present a simple deterministic single-pass $(2+ε)$-approximation algorithm for the maximum weight matching problem in the semi-streaming model. This improves upon the currently best known approximation ratio of $(4+ε)$.
Our algorithm uses $O(n\log^2 n)$ bits of space for constant values of $ε$. It relies on a variation of the local-ratio theorem, which may be of use for other algorithms in the semi-streaming model as well.
Submitted 5 November, 2018; v1 submitted 15 February, 2017;
originally announced February 2017.
-
Derandomizing Local Distributed Algorithms under Bandwidth Restrictions
Authors:
Keren Censor-Hillel,
Merav Parter,
Gregory Schwartzman
Abstract:
This paper addresses the cornerstone family of \emph{local problems} in distributed computing, and investigates the curious gap between randomized and deterministic solutions under bandwidth restrictions.
Our main contribution is in providing tools for derandomizing solutions to local problems, when the $n$ nodes can only send $O(\log n)$-bit messages in each round of communication. We combine bounded independence, which we show to be sufficient for some algorithms, with the method of conditional expectations and with additional machinery, to obtain the following results.
Our techniques give a deterministic maximal independent set (MIS) algorithm in the CONGEST model, where the communication graph is identical to the input graph, in $O(D\log^2 n)$ rounds, where $D$ is the diameter of the graph. The best known running time in terms of $n$ alone is $2^{O(\sqrt{\log n})}$, which is super-polylogarithmic, and requires large messages. For the CONGEST model, the only known previous solution is a coloring-based $O(Δ+ \log^* n)$-round algorithm, where $Δ$ is the maximal degree in the graph.
On the way to obtaining the above, we show that in the \emph{Congested Clique} model, which allows all-to-all communication, there is a deterministic MIS algorithm that runs in $O(\log Δ\log n)$ rounds, where $Δ$ is the maximum degree. When $Δ=O(n^{1/3})$, the bound improves to $O(\log Δ)$ and holds also for $(Δ+1)$-coloring.
In addition, we deterministically construct a $(2k-1)$-spanner with $O(kn^{1+1/k}\log n)$ edges in $O(k \log n)$ rounds. For comparison, in the more stringent CONGEST model, the best deterministic algorithm for constructing a $(2k-1)$-spanner with $O(kn^{1+1/k})$ edges runs in $O(n^{1-1/k})$ rounds.
Submitted 4 August, 2016;
originally announced August 2016.
-
Fast Distributed Algorithms for Testing Graph Properties
Authors:
Keren Censor-Hillel,
Eldar Fischer,
Gregory Schwartzman,
Yadu Vasudev
Abstract:
We initiate a thorough study of \emph{distributed property testing} -- producing algorithms for the approximation problems of property testing in the CONGEST model. In particular, for the so-called \emph{dense} testing model we emulate sequential tests for nearly all graph properties having $1$-sided tests, while in the \emph{sparse} and \emph{general} models we obtain faster tests for triangle-freeness and bipartiteness respectively.
In most cases, aided by parallelism, the distributed algorithms have a much shorter running time as compared to their counterparts from the sequential querying model of traditional property testing. The simplest property testing algorithms allow a relatively smooth transitioning to the distributed model. For the more complex tasks we develop new machinery that is of independent interest. This includes a method for distributed maintenance of multiple random walks.
Submitted 2 May, 2016; v1 submitted 11 February, 2016;
originally announced February 2016.
-
A Distributed $(2+ε)$-Approximation for Vertex Cover in $O(\logΔ/ε\log\logΔ)$ Rounds
Authors:
Reuven Bar-Yehuda,
Keren Censor-Hillel,
Gregory Schwartzman
Abstract:
We present a simple deterministic distributed $(2+ε)$-approximation algorithm for minimum weight vertex cover, which completes in $O(\logΔ/ε\log\logΔ)$ rounds, where $Δ$ is the maximum degree in the graph, for any $ε>0$ which is at most $O(1)$. For a constant $ε$, this implies a constant approximation in $O(\logΔ/\log\logΔ)$ rounds, which contradicts the lower bound of [KMW10].
Submitted 12 February, 2016; v1 submitted 11 February, 2016;
originally announced February 2016.