-
Deferred Objects to Enhance Smart Contract Programming with Optimistic Parallel Execution
Authors:
George Mitenkov,
Igor Kabiljo,
Zekun Li,
Alexander Spiegelman,
Satyanarayana Vusirikala,
Zhuolun Xiang,
Aleksandar Zlateski,
Nuno P. Lopes,
Rati Gelashvili
Abstract:
One of the main bottlenecks of blockchains is smart contract execution. To increase throughput, modern blockchains try to execute transactions in parallel. Unfortunately, however, common blockchain use cases introduce read-write conflicts between transactions, forcing sequentiality.
We propose RapidLane, an extension for parallel execution engines that allows the engine to capture computations in conflicting parts of transactions and defer their execution until a later time, sometimes optimistically predicting execution results. This technique, coupled with support for a new construct for smart contract languages, allows one to turn certain sequential workloads into parallelizable ones.
We integrated RapidLane into Block-STM, a state-of-the-art parallel execution engine used by several blockchains in production, and deployed it on the Aptos blockchain. Our evaluation shows that on commonly contended workloads, such as peer-to-peer transfers with a single fee payer and NFT minting, RapidLane yields up to $12\times$ more throughput.
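A minimal sketch of deferring a contended computation, assuming a shared counter (for example, a single fee payer's balance) that transactions only update by a delta. Each transaction computes its delta without reading the counter's current value, so that part can run in parallel. Only the cheap, bounded delta application is resolved in block order. `DeferredCounter` and `execute_block` are hypothetical names. The sketch omits RapidLane's optimistic prediction of results and does not use any Block-STM or Aptos API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


class DeferredCounter:
    """Hypothetical deferred counter: transactions submit deltas instead of reading the value."""

    def __init__(self, value: int, limit: int) -> None:
        self.value = value
        self.limit = limit

    def try_apply(self, delta: int) -> bool:
        # Resolved at commit time; an out-of-bounds update fails instead of conflicting.
        new = self.value + delta
        if 0 <= new <= self.limit:
            self.value = new
            return True
        return False


def execute_block(txs: List[Callable[[], int]], counter: DeferredCounter) -> List[bool]:
    # Phase 1 (parallelizable): each transaction computes its delta without touching counter.value.
    with ThreadPoolExecutor() as pool:
        deltas = list(pool.map(lambda tx: tx(), txs))
    # Phase 2: apply the deferred deltas in the block's preset order.
    return [counter.try_apply(d) for d in deltas]
```

For example, twenty transactions that each charge a fee of 7 from a balance of 100 yield fourteen successes followed by six failures, and the per-transaction work never serializes on the balance.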
Submitted 9 May, 2024;
originally announced May 2024.
-
Shoal: Improving DAG-BFT Latency And Robustness
Authors:
Alexander Spiegelman,
Balaji Arun,
Rati Gelashvili,
Zekun Li
Abstract:
The Narwhal system is a state-of-the-art Byzantine fault-tolerant scalable architecture that involves constructing a directed acyclic graph (DAG) of messages among a set of validators in a Blockchain network. Bullshark is a zero-overhead consensus protocol on top of Narwhal's DAG that can order over 100k transactions per second. Unfortunately, the high throughput of Bullshark comes with a latency price due to the DAG construction, increasing the latency compared to the state-of-the-art leader-based BFT consensus protocols.
We introduce Shoal, a protocol-agnostic framework for enhancing Narwhal-based consensus. By incorporating leader reputation and pipelining support for the first time, Shoal significantly reduces latency. Moreover, the combination of properties of the DAG construction and the leader reputation mechanism enables the elimination of timeouts in all but extremely uncommon scenarios in practice, a property we name "Prevalent Responsiveness" (it strictly subsumes the established and often desired Optimistic Responsiveness property for BFT protocols).
We integrated Shoal instantiated with Bullshark, the fastest existing Narwhal-based consensus protocol, in an open-source Blockchain project and provide experimental evaluations demonstrating up to 40% latency reduction in failure-free executions, and up to 80% reduction in executions with failures, compared to the vanilla Bullshark implementation.
Submitted 7 July, 2023; v1 submitted 5 June, 2023;
originally announced June 2023.
-
Block-STM: Scaling Blockchain Execution by Turning Ordering Curse to a Performance Blessing
Authors:
Rati Gelashvili,
Alexander Spiegelman,
Zhuolun Xiang,
George Danezis,
Zekun Li,
Dahlia Malkhi,
Yu Xia,
Runtian Zhou
Abstract:
Block-STM is a parallel execution engine for smart contracts, built around the principles of Software Transactional Memory. Transactions are grouped in blocks, and every execution of the block must yield the same deterministic outcome. Block-STM further enforces that the outcome is consistent with executing transactions according to a preset order, leveraging this order to dynamically detect dependencies and avoid conflicts during speculative transaction execution. At the core of Block-STM is a novel, low-overhead collaborative scheduler of execution and validation tasks.
Block-STM is implemented on the main branch of the Diem Blockchain code-base and runs in production at Aptos. Our evaluation demonstrates that Block-STM is adaptive to workloads with different conflict rates and utilizes the inherent parallelism therein. Block-STM achieves up to $110k$ tps in the Diem benchmarks and up to $170k$ tps in the Aptos benchmarks, which is a $20$x and $17$x improvement over the sequential baseline with $32$ threads, respectively. The throughput on a contended workload is up to $50k$ tps and $80k$ tps in the Diem and Aptos benchmarks, respectively.
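To illustrate how a preset order lets speculative results be checked and repaired, here is a deliberately simplified sketch. Every transaction first executes against the same pre-block snapshot (independent work that could run in parallel). The results are then validated in the preset order, and a transaction is re-executed only if a value it read has since changed. This is not Block-STM's multi-version data structure or collaborative scheduler; `run`, `execute_optimistic`, `execute_sequential`, and `transfer` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

Read = Callable[[str], int]
Tx = Callable[[Read], Dict[str, int]]  # a transaction reads keys and returns its writes


def run(tx: Tx, state: Dict[str, int]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Execute tx against state, recording every value it observed (its read set)."""
    reads: Dict[str, int] = {}

    def read(key: str) -> int:
        reads[key] = state.get(key, 0)
        return reads[key]

    return reads, tx(read)


def execute_optimistic(txs: List[Tx], initial: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    snapshot = dict(initial)
    speculative = [run(tx, snapshot) for tx in txs]  # independent; could run in parallel
    state, reexecuted = dict(initial), 0
    for tx, (reads, writes) in zip(txs, speculative):
        # Validate in preset order: if any observed value changed, redo this transaction.
        if any(state.get(k, 0) != v for k, v in reads.items()):
            reads, writes = run(tx, state)
            reexecuted += 1
        state.update(writes)
    return state, reexecuted


def execute_sequential(txs: List[Tx], initial: Dict[str, int]) -> Dict[str, int]:
    state = dict(initial)
    for tx in txs:
        state.update(run(tx, state)[1])
    return state


def transfer(src: str, dst: str, amount: int) -> Tx:
    def tx(read: Read) -> Dict[str, int]:
        balance = read(src)
        if balance < amount:
            return {}  # insufficient funds: no writes
        return {src: balance - amount, dst: read(dst) + amount}
    return tx
```

Because a deterministic transaction that observes the same values behaves the same way, the validated result always matches sequential execution in the preset order; only transactions whose reads were invalidated pay for a second execution.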
Submitted 25 August, 2022; v1 submitted 14 March, 2022;
originally announced March 2022.
-
Be Aware of Your Leaders
Authors:
Shir Cohen,
Rati Gelashvili,
Lefteris Kokoris Kogias,
Zekun Li,
Dahlia Malkhi,
Alberto Sonnino,
Alexander Spiegelman
Abstract:
Advances in blockchains have influenced the State-Machine-Replication (SMR) world and many state-of-the-art blockchain-SMR solutions are based on two pillars: Chaining and Leader-rotation. A predetermined round-robin mechanism used for Leader-rotation, however, has an undesirable behavior: crashed parties become designated leaders infinitely often, slowing down overall system performance. In this paper, we provide a new Leader-Aware SMR framework that, among other desirable properties, formalizes a Leader-utilization requirement that bounds the number of rounds whose leaders are faulty in crash-only executions. We introduce Carousel, a novel, reputation-based Leader-rotation solution to achieve Leader-Aware SMR. The challenge in adaptive Leader-rotation is that it cannot rely on consensus to determine a leader, since consensus itself needs a leader. Carousel uses the available on-chain information to determine a leader locally and achieves Liveness despite this difficulty. A HotStuff implementation fitted with Carousel demonstrates drastic performance improvements: it increases throughput by over 2x in faultless settings and provides a 20x throughput increase and 5x latency reduction in the presence of faults.
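A minimal sketch of choosing a leader locally from committed on-chain data. Assumptions (hypothetical, not taken from the paper): the function name `select_leader`, the input `recent_signers` (validators observed active in recently committed blocks), the exclusion of the last `f` leaders, and the SHA-256 round seed. Every validator reading the same committed chain computes the same leader without running consensus.

```python
import hashlib
from typing import Sequence, Set


def select_leader(
    round_no: int,
    validators: Sequence[str],
    recent_signers: Set[str],
    recent_leaders: Sequence[str],
    f: int,
) -> str:
    """Pick a leader from committed data only, so all validators agree without consensus."""
    # Skip the last f leaders so leadership does not concentrate on a few validators.
    excluded = set(recent_leaders[-f:]) if f > 0 else set()
    # Only validators observed as active in recently committed blocks are eligible.
    candidates = sorted(v for v in validators if v in recent_signers and v not in excluded)
    if not candidates:
        # Fallback: plain round-robin over all validators.
        return validators[round_no % len(validators)]
    # Deterministic pseudo-random choice seeded by the round number.
    digest = hashlib.sha256(str(round_no).encode()).digest()
    return candidates[int.from_bytes(digest[:8], "big") % len(candidates)]
```

A crashed validator stops appearing among the recent signers and therefore stops being designated leader, which is the behaviour the abstract contrasts with plain round-robin.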
Submitted 3 October, 2021;
originally announced October 2021.
-
Lower Bounds for Shared-Memory Leader Election under Bounded Write Contention
Authors:
Dan Alistarh,
Rati Gelashvili,
Giorgi Nadiradze
Abstract:
This paper gives tight logarithmic lower bounds on the solo step complexity of leader election in an asynchronous shared-memory model with single-writer multi-reader (SWMR) registers, for randomized obstruction-free algorithms.
The approach extends to lower bounds for randomized obstruction-free algorithms using multi-writer registers under bounded write concurrency, showing a trade-off between the solo step complexity of a leader election algorithm, and the worst-case contention incurred by a processor in an execution.
Submitted 25 March, 2022; v1 submitted 5 August, 2021;
originally announced August 2021.
-
Jolteon and Ditto: Network-Adaptive Efficient Consensus with Asynchronous Fallback
Authors:
Rati Gelashvili,
Lefteris Kokoris-Kogias,
Alberto Sonnino,
Alexander Spiegelman,
Zhuolun Xiang
Abstract:
Existing committee-based Byzantine state machine replication (SMR) protocols, typically deployed in production blockchains, face a clear trade-off: (1) they either achieve linear communication cost in the happy path, but sacrifice liveness during periods of asynchrony, or (2) they are robust (progress with probability one) but pay quadratic communication cost. We believe this trade-off is unwarranted since existing linear protocols still have asymptotic quadratic cost in the worst case. We design Ditto, a Byzantine SMR protocol that enjoys the best of both worlds: optimal communication on and off the happy path (linear and quadratic, respectively) and progress guarantee under asynchrony and DDoS attacks. We achieve this by replacing the view-synchronization of partially synchronous protocols with an asynchronous fallback mechanism at no extra asymptotic cost. Specifically, we start from HotStuff, a state-of-the-art linear protocol, and gradually build Ditto. As a separate contribution and an intermediate step, we design a 2-chain version of HotStuff, Jolteon, which leverages a quadratic view-change mechanism to reduce the latency of the standard 3-chain HotStuff. We implement and experimentally evaluate all our systems. Notably, Jolteon's commit latency outperforms HotStuff by 200-300ms with varying system size. Additionally, Ditto adapts to the network and provides better performance than Jolteon under faulty conditions and better performance than VABA (a state-of-the-art asynchronous protocol) under faultless conditions. This proves our case that breaking the robustness-efficiency trade-off is in the realm of practicality.
Submitted 9 July, 2024; v1 submitted 18 June, 2021;
originally announced June 2021.
-
Be Prepared When Network Goes Bad: An Asynchronous View-Change Protocol
Authors:
Rati Gelashvili,
Lefteris Kokoris-Kogias,
Alexander Spiegelman,
Zhuolun Xiang
Abstract:
The popularity of permissioned blockchain systems demands BFT SMR protocols that are efficient under good network conditions (synchrony) and robust under bad network conditions (asynchrony). The state-of-the-art partially synchronous BFT SMR protocols provide optimal linear communication cost per decision under synchrony and good leaders, but lose liveness under asynchrony. On the other hand, the state-of-the-art asynchronous BFT SMR protocols are live even under asynchrony, but always pay quadratic cost even under synchrony. In this paper, we propose a BFT SMR protocol that achieves the best of both worlds -- optimal linear cost per decision under good networks and leaders, optimal quadratic cost per decision under bad networks, and remains always live.
Submitted 4 March, 2021;
originally announced March 2021.
-
Fast Graphical Population Protocols
Authors:
Dan Alistarh,
Rati Gelashvili,
Joel Rybicki
Abstract:
Let $G$ be a graph on $n$ nodes. In the stochastic population protocol model, a collection of $n$ indistinguishable, resource-limited nodes collectively solve tasks via pairwise interactions. In each interaction, two randomly chosen neighbors first read each other's states, and then update their local states. A rich line of research has established tight upper and lower bounds on the complexity of fundamental tasks, such as majority and leader election, in this model, when $G$ is a clique. Specifically, in the clique, these tasks can be solved fast, i.e., in $n \operatorname{polylog} n$ pairwise interactions, with high probability, using at most $\operatorname{polylog} n$ states per node.
In this work, we consider the more general setting where $G$ is an arbitrary graph, and present a technique for simulating protocols designed for fully-connected networks in any connected regular graph. Our main result is a simulation that is efficient on many interesting graph families: roughly, the simulation overhead is polylogarithmic in the number of nodes, and quadratic in the inverse of the conductance of the graph. As a sample application, we show that, in any regular graph with conductance $φ$, both leader election and exact majority can be solved in $φ^{-2} \cdot n \operatorname{polylog} n$ pairwise interactions, with high probability, using at most $φ^{-2} \cdot \operatorname{polylog} n$ states per node. This shows that there are fast and space-efficient population protocols for leader election and exact majority on graphs with good expansion properties. We believe our results will prove generally useful, as they allow efficient technology transfer between the well-mixed (clique) case, and the under-explored spatial setting.
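For concreteness, the interaction model on a graph $G$ can be simulated by repeatedly picking a uniformly random edge and letting its two endpoints update their states. The sketch below uses a cycle (a connected 2-regular graph) and a simple max-spreading rule as a stand-in protocol. Both are illustrative choices, and `simulate`, `cycle_graph`, and `max_rule` are hypothetical names; this is not the simulation technique from the paper.

```python
import random
from typing import Callable, List, Sequence, Tuple

Edge = Tuple[int, int]


def cycle_graph(n: int) -> List[Edge]:
    return [(i, (i + 1) % n) for i in range(n)]


def simulate(
    edges: Sequence[Edge],
    states: Sequence[int],
    rule: Callable[[int, int], Tuple[int, int]],
    steps: int,
    seed: int = 0,
) -> List[int]:
    rng = random.Random(seed)
    states = list(states)
    for _ in range(steps):
        u, v = rng.choice(edges)  # a uniformly random pair of neighbours interacts
        if rng.random() < 0.5:
            u, v = v, u  # random initiator/responder roles
        # Both nodes read each other's state, then update their own.
        states[u], states[v] = rule(states[u], states[v])
    return states


def max_rule(a: int, b: int) -> Tuple[int, int]:
    m = max(a, b)
    return m, m
```

On a clique every pair of nodes is an edge, which recovers the well-mixed model; on sparser graphs, information can only travel along edges, which is why the graph's expansion enters the running time.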
Submitted 11 May, 2021; v1 submitted 17 February, 2021;
originally announced February 2021.
-
L3 Fusion: Fast Transformed Convolutions on CPUs
Authors:
Rati Gelashvili,
Nir Shavit,
Aleksandar Zlateski
Abstract:
Fast convolutions via transforms, either Winograd or FFT, have emerged as a preferred way of computing convolutional layers, as they greatly reduce the number of required operations. Recent work shows that, for many layer structures, a well-designed implementation of fast convolutions can efficiently utilize modern CPUs, significantly reducing compute time. However, the generous amount of shared L3 cache present on modern CPUs is often neglected, and the algorithms are optimized solely for the private L2 cache. In this paper we propose an efficient `L3 Fusion` algorithm that is specifically designed for CPUs with a significant amount of shared L3 cache. Using the hierarchical roofline model, we show that in many cases, especially for layers with fewer channels, the `L3 fused` approach can greatly outperform the standard three-stage approach provided by major vendors such as Intel. We validate our theoretical findings by benchmarking our `L3 fused` implementation against publicly available state-of-the-art implementations.
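As a concrete instance of the operation-count savings mentioned above, the standard 1-D Winograd transform $F(2,3)$ computes two outputs of a 3-tap filter with 4 multiplications instead of the 6 needed by the direct method. Like most CNN frameworks, it computes a correlation (no filter flip). This illustrates only the transform itself, not the L3 Fusion scheduling; `winograd_f23` and `direct` are hypothetical names.

```python
from typing import List, Sequence


def winograd_f23(d: Sequence[float], g: Sequence[float]) -> List[float]:
    """Two outputs of a 3-tap correlation over 4 inputs using 4 multiplications."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform: computed once per filter and reused across input tiles.
    gp = (g0 + g1 + g2) / 2
    gm = (g0 - g1 + g2) / 2
    # The 4 element-wise multiplications.
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * gp
    m3 = (d2 - d1) * gm
    m4 = (d1 - d3) * g2
    # Output transform: additions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct(d: Sequence[float], g: Sequence[float]) -> List[float]:
    """Reference: 6 multiplications."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]
```

Expanding the products shows $m_1 + m_2 + m_3 = d_0 g_0 + d_1 g_1 + d_2 g_2$ and $m_2 - m_3 - m_4 = d_1 g_0 + d_2 g_1 + d_3 g_2$; the extra transform additions are what make memory traffic, and hence cache placement, matter.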
Submitted 4 December, 2019;
originally announced December 2019.
-
Why Extension-Based Proofs Fail
Authors:
Dan Alistarh,
James Aspnes,
Faith Ellen,
Rati Gelashvili,
Leqi Zhu
Abstract:
We introduce extension-based proofs, a class of impossibility proofs that includes valency arguments. They are modelled as an interaction between a prover and a protocol. Using proofs based on combinatorial topology, it has been shown that it is impossible to deterministically solve k-set agreement among n > k > 1 processes in a wait-free manner in certain asynchronous models. However, it was unknown whether proofs based on simpler techniques were possible. We show that this impossibility result cannot be obtained for one of these models by an extension-based proof and, hence, extension-based proofs are limited in power.
Submitted 2 August, 2020; v1 submitted 4 November, 2018;
originally announced November 2018.
-
Revisionist Simulations: A New Approach to Proving Space Lower Bounds
Authors:
Faith Ellen,
Rati Gelashvili,
Leqi Zhu
Abstract:
Determining the space complexity of $x$-obstruction-free $k$-set agreement for $x\leq k$ is an open problem. In $x$-obstruction-free protocols, processes are required to return in executions where at most $x$ processes take steps. The best known upper bound on the number of registers needed to solve this problem among $n>k$ processes is $n-k+x$ registers. No general lower bound better than $2$ was known.
We prove that any $x$-obstruction-free protocol solving $k$-set agreement among $n>k$ processes uses at least $\lfloor(n-x)/(k+1-x)\rfloor+1$ registers. Our main tool is a simulation that serves as a reduction from the impossibility of deterministic wait-free $k$-set agreement: if a protocol uses fewer registers, then it is possible for $k+1$ processes to simulate the protocol and deterministically solve $k$-set agreement in a wait-free manner, which is impossible. A critical component of the simulation is the ability of simulating processes to revise the past of simulated processes. We introduce a new augmented snapshot object, which facilitates this.
We also prove that any space lower bound on the number of registers used by obstruction-free protocols applies to protocols that satisfy nondeterministic solo termination. Hence, our lower bound of $\lfloor(n-1)/k\rfloor+1$ for the obstruction-free ($x=1$) case also holds for randomized wait-free protocols. In particular, this gives a tight lower bound of exactly $n$ registers for solving obstruction-free and randomized wait-free consensus.
Finally, our new techniques can be applied to get a space lower bound of $\lfloor n/2\rfloor+1$ for $ε$-approximate agreement, for sufficiently small $ε$. This task requires participating processes to return values within $ε$ of each other. The best known upper bounds are $\lceil\log(1/ε)\rceil$ and $n$, while no general lower bounds were known.
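The register bounds stated above are easy to tabulate. The helper functions below (hypothetical names) evaluate the lower bound $\lfloor(n-x)/(k+1-x)\rfloor+1$ and the known upper bound $n-k+x$ for $1 \le x \le k < n$, and can be used to check the special cases mentioned in the abstract.

```python
def register_lower_bound(n: int, k: int, x: int) -> int:
    """Lower bound floor((n - x) / (k + 1 - x)) + 1 for x-obstruction-free k-set agreement."""
    if not (1 <= x <= k < n):
        raise ValueError("need 1 <= x <= k < n")
    return (n - x) // (k + 1 - x) + 1


def register_upper_bound(n: int, k: int, x: int) -> int:
    """Best known upper bound n - k + x."""
    if not (1 <= x <= k < n):
        raise ValueError("need 1 <= x <= k < n")
    return n - k + x


# Consensus (k = 1) with x = 1: both bounds equal n, so the lower bound is tight.
print(register_lower_bound(10, 1, 1), register_upper_bound(10, 1, 1))
```

For $x = 1$ the formula specializes to $\lfloor(n-1)/k\rfloor+1$, and for $k = 1$ both bounds equal $n$, matching the tight consensus result.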
Submitted 10 October, 2018; v1 submitted 7 November, 2017;
originally announced November 2017.
-
Towards Reduced Instruction Sets for Synchronization
Authors:
Rati Gelashvili,
Idit Keidar,
Alexander Spiegelman,
Roger Wattenhofer
Abstract:
Contrary to common belief, a recent work by Ellen, Gelashvili, Shavit, and Zhu has shown that computability does not require multicore architectures to support "strong" synchronization instructions like compare-and-swap, as opposed to combinations of "weaker" instructions like decrement and multiply. However, this is the status quo, and in turn, most efficient concurrent data-structures heavily rely on compare-and-swap (e.g. for swinging pointers and in general, conflict resolution).
We show that this need not be the case, by designing and implementing a concurrent linearizable Log data-structure (also known as a History object), supporting two operations: append(item), which appends the item to the log, and get-log(), which returns the appended items so far, in order. Readers are wait-free and writers are lock-free, and this data-structure can be used in a lock-free universal construction to implement any concurrent object with a given sequential specification. Our implementation uses atomic read, xor, decrement, and fetch-and-increment instructions supported on x86 architectures, and provides performance similar to a compare-and-swap-based solution on today's hardware. This raises a fundamental question about the minimal set of synchronization instructions that architectures have to support.
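To show how a Log (History) object yields a universal construction, the sketch below appends each operation to the log, reads the log back, and replays the sequential specification up to its own entry to compute its response. Caveat: this `Log` is protected by a `threading.Lock`, so it illustrates only the sequential specification and the replay idea, not the lock-free implementation built from read, xor, decrement, and fetch-and-increment. `Log`, `Universal`, and `counter_spec` are hypothetical names.

```python
import threading
from typing import Any, Callable, List, Tuple

Spec = Callable[[Any, Any], Tuple[Any, Any]]  # (state, op) -> (new_state, response)


class Log:
    """Lock-based stand-in for the linearizable Log: append(item) and get_log()."""

    def __init__(self) -> None:
        self._items: List[Any] = []
        self._lock = threading.Lock()

    def append(self, item: Any) -> None:
        with self._lock:
            self._items.append(item)

    def get_log(self) -> Tuple[Any, ...]:
        with self._lock:
            return tuple(self._items)  # the appended items so far, in order


class Universal:
    """Any object with a sequential specification, implemented on top of a Log."""

    def __init__(self, initial: Any, spec: Spec) -> None:
        self._initial = initial
        self._spec = spec
        self._log = Log()

    def invoke(self, op: Any) -> Any:
        token = object()  # unique marker identifying this invocation's log entry
        self._log.append((token, op))
        state = self._initial
        # Replay the log in order; our response is the one produced at our own entry.
        for entry_token, entry_op in self._log.get_log():
            state, response = self._spec(state, entry_op)
            if entry_token is token:
                return response
        raise RuntimeError("own entry missing from log")


def counter_spec(state: int, op: str) -> Tuple[int, int]:
    if op == "inc":
        return state + 1, state + 1
    return state, state  # "read"
```

Replaying the whole log on every call is quadratic overall; it keeps the example short and makes linearizability follow directly from the log order.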
Submitted 8 May, 2017;
originally announced May 2017.
-
Space-Optimal Majority in Population Protocols
Authors:
Dan Alistarh,
James Aspnes,
Rati Gelashvili
Abstract:
Population protocols are a model of distributed computing, in which $n$ agents with limited local state interact randomly, and cooperate to collectively compute global predicates. An extensive series of papers, across different communities, has examined the computability and complexity characteristics of this model. Majority, or consensus, is a central task, in which agents need to collectively reach a decision as to which one of two states $A$ or $B$ had a higher initial count. Two complexity metrics are important: the time that a protocol requires to stabilize to an output decision, and the state space size that each agent requires.
It is known that majority requires $Ω(\log \log n)$ states per agent to allow for poly-logarithmic time stabilization, and that $O(\log^2 n)$ states are sufficient. Thus, there is an exponential gap between the upper and lower bounds.
We address this question. We provide a new lower bound of $Ω(\log n)$ states for any protocol which stabilizes in $O( n^{1-c} )$ time, for any $c > 0$ constant. This result is conditional on basic monotonicity and output assumptions, satisfied by all known protocols. Technically, it represents a significant departure from previous lower bounds. Instead of relying on dense configurations, we introduce a new surgery technique to construct executions which contradict the correctness of algorithms that stabilize too fast. As a result, our lower bound applies to general initial configurations.
We give an algorithm for majority which uses $O(\log n)$ states, and stabilizes in $O(\log^2 n)$ time. Central to the algorithm is a new leaderless phase clock, which allows nodes to synchronize in phases of $Θ(n \log{n})$ consecutive interactions using $O(\log n)$ states per node. We also employ our phase clock to build a leader election algorithm with $O(\log n )$ states, which stabilizes in $O(\log^2 n)$ time.
Submitted 13 July, 2017; v1 submitted 17 April, 2017;
originally announced April 2017.
-
A Complexity-Based Hierarchy for Multiprocessor Synchronization
Authors:
Faith Ellen,
Rati Gelashvili,
Nir Shavit,
Leqi Zhu
Abstract:
For many years, Herlihy's elegant computability-based Consensus Hierarchy has been our best explanation of the relative power of various types of multiprocessor synchronization objects when used in deterministic algorithms. However, key to this hierarchy is treating synchronization instructions as distinct objects, an approach that is far from the real world, where multiprocessor programs apply synchronization instructions to collections of arbitrary memory locations. We were surprised to realize that, when considering instructions applied to memory locations, the computability-based hierarchy collapses. This leaves open the question of how to better capture the power of various synchronization instructions.
In this paper, we provide an approach to answering this question. We present a hierarchy of synchronization instructions, classified by their space complexity in solving obstruction-free consensus. Our hierarchy provides a classification of combinations of known instructions that seems to fit with our intuition of how useful some are in practice, while questioning the effectiveness of others. We prove an essentially tight characterization of the power of buffered read and write instructions. Interestingly, we show a similar result for multi-location atomic assignments.
Submitted 3 May, 2018; v1 submitted 20 July, 2016;
originally announced July 2016.
-
Time-Space Trade-offs in Population Protocols
Authors:
Dan Alistarh,
James Aspnes,
David Eisenstat,
Rati Gelashvili,
Ronald L. Rivest
Abstract:
Population protocols are a popular model of distributed computing, in which randomly-interacting agents with little computational power cooperate to jointly perform computational tasks. Inspired by developments in molecular computation, and in particular DNA computing, recent algorithmic work has focused on the complexity of solving simple yet fundamental tasks in the population model, such as leader election (which requires stabilization to a single agent in a special "leader" state), and majority (in which agents must stabilize to a decision as to which of two possible initial states had higher initial count). Known results point towards an inherent trade-off between the time complexity of such algorithms, and the space complexity, i.e. size of the memory available to each agent.
In this paper, we explore this trade-off and provide new upper and lower bounds for majority and leader election. First, we prove a unified lower bound, which relates the space available per node with the time complexity achievable by a protocol: for instance, our result implies that any protocol solving either of these tasks for $n$ agents using $O( \log \log n )$ states must take $Ω( n / \operatorname{polylog} n )$ expected time. This is the first result to characterize time complexity for protocols which employ a super-constant number of states per node, and proves that fast, poly-logarithmic running times require protocols to have relatively large space costs.
On the positive side, we give algorithms showing that fast, poly-logarithmic stabilization time can be achieved using $O( \log^2 n )$ space per node, in the case of both tasks. Overall, our results highlight a time complexity separation between $O(\log \log n)$ and $Θ( \log^2 n )$ state space size for both majority and leader election in population protocols, and introduce new techniques, which should be applicable more broadly.
Submitted 17 April, 2017; v1 submitted 25 February, 2016;
originally announced February 2016.
-
On the Optimal Space Complexity of Consensus for Anonymous Processes
Authors:
Rati Gelashvili
Abstract:
The optimal space complexity of consensus in shared memory is a decades-old open problem. For a system of $n$ processes, no algorithm is known that uses a sublinear number of registers. However, the best known lower bound due to Fich, Herlihy, and Shavit requires $Ω(\sqrt{n})$ registers.
The special symmetric case of the problem where processes are anonymous (run the same algorithm) has also attracted attention. Even in this case, the best lower and upper bounds are still $Ω(\sqrt{n})$ and $O(n)$. Moreover, Fich, Herlihy, and Shavit first proved their lower bound for anonymous processes, and then extended it to the general case. As such, resolving the anonymous case might be a significant step towards understanding and solving the general problem.
In this work, we show that in a system of anonymous processes, any consensus algorithm satisfying nondeterministic solo termination has to use $Ω(n)$ read-write registers in some execution. This implies an $Ω(n)$ lower bound on the space complexity of deterministic obstruction-free and randomized wait-free consensus, matching the upper bound and closing the symmetric case of the open problem.
Submitted 16 August, 2015; v1 submitted 22 June, 2015;
originally announced June 2015.
-
Polylogarithmic-Time Leader Election in Population Protocols Using Polylogarithmic States
Authors:
Dan Alistarh,
Rati Gelashvili
Abstract:
Population protocols are networks of finite-state agents, interacting randomly, and updating their states using simple rules. Despite their extreme simplicity, these systems have been shown to cooperatively perform complex computational tasks, such as simulating register machines to compute standard arithmetic functions. The election of a unique leader agent is a key requirement in such computational constructions. Yet, the fastest currently known population protocol for electing a leader only has linear stabilization time, and it has recently been shown that no population protocol using a constant number of states per node may overcome this linear bound.
In this paper, we give the first population protocol for leader election with polylogarithmic stabilization time, using a polylogarithmic number of memory states per node. The protocol structure is quite simple: each node has an associated value, and is either a leader (still in contention) or a minion (following some leader). A leader keeps incrementing its value and "defeats" other leaders in one-to-one interactions; it drops from contention and becomes a minion if it meets a leader with a higher value. Importantly, a leader also drops out if it meets a minion with a higher absolute value. While these rules are quite simple, the proof that this algorithm achieves polylogarithmic stabilization time is non-trivial. In particular, the argument combines careful use of concentration inequalities with anti-concentration bounds, showing that the leaders' values become spread apart as the execution progresses, which in turn implies that straggling leaders are quickly eliminated. We complement our analysis with empirical results showing that our protocol stabilizes extremely fast, even for large network sizes.
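The leader/minion rules described above can be simulated in a few lines. The Python sketch below is a simplification under stated assumptions, not the paper's exact protocol: the function name `simulate_leader_election`, the value cap of roughly $\log_2^2 n$, the rule that minions copy the larger value they see, and the tie-break in which the responder drops out are all hypothetical choices, and an explicit role flag replaces whatever sign encoding the phrase "absolute value" implies. It illustrates the rules only and makes no claim about the polylogarithmic stabilization time proved in the paper.

```python
import math
import random

LEADER, MINION = "leader", "minion"


def simulate_leader_election(n, seed=0, cap=None, max_steps=10_000_000):
    """Run simplified leader/minion rules on n agents.

    Returns (interactions, leaders_remaining). Illustrative only: the
    tie-break, the value cap, and the minion copying rule are assumptions.
    """
    rng = random.Random(seed)
    if cap is None:
        # Hypothetical polylogarithmic cap: each agent stores one of
        # O(log^2 n) possible values.
        cap = max(1, math.ceil(math.log2(n)) ** 2)
    role = [LEADER] * n
    value = [0] * n
    leaders = n
    steps = 0
    while leaders > 1 and steps < max_steps:
        i, j = rng.sample(range(n), 2)  # random pair: initiator i, responder j
        steps += 1
        if role[i] == LEADER and role[j] == LEADER:
            # The lower-valued leader becomes a minion; on a tie the
            # responder drops out (assumed tie-break).
            loser, winner = (i, j) if value[i] < value[j] else (j, i)
            role[loser] = MINION
            value[loser] = value[winner]
            leaders -= 1
        else:
            for a, b in ((i, j), (j, i)):
                # A leader meeting a minion with a higher value drops out.
                if role[a] == LEADER and role[b] == MINION and value[b] > value[a]:
                    role[a] = MINION
                    leaders -= 1
                # Minions adopt the larger value they see (assumed spreading rule).
                if role[a] == MINION and value[b] > value[a]:
                    value[a] = value[b]
        # Surviving leaders keep incrementing their values, up to the cap.
        for a in (i, j):
            if role[a] == LEADER:
                value[a] = min(value[a] + 1, cap)
    return steps, leaders
```

Under these rules no minion ever holds a value above the largest leader value, so the leader holding that value cannot be eliminated by a minion and at least one leader always survives, while every leader-leader meeting removes one contender. Calling `simulate_leader_election(200, seed=1)` should therefore end with exactly one leader after at least 199 interactions.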
Submitted 16 April, 2017; v1 submitted 19 February, 2015;
originally announced February 2015.
-
Johnson-Lindenstrauss Compression with Neuroscience-Based Constraints
Authors:
Zeyuan Allen-Zhu,
Rati Gelashvili,
Silvio Micali,
Nir Shavit
Abstract:
Johnson-Lindenstrauss (JL) matrices implemented by sparse random synaptic connections are thought to be a prime candidate for how convergent pathways in the brain compress information. However, to date, there is no complete mathematical support for such implementations given the constraints of real neural tissue. Because each neuron is either excitatory or inhibitory, every JL matrix implemented this way must be sign-consistent (i.e., all entries in a single column must be either all non-negative or all non-positive), and because any given neuron connects to a relatively small subset of other neurons, the JL matrix must also be sparse.
We construct sparse JL matrices that are sign-consistent, and prove that our construction is essentially optimal. Our work answers a mathematical question that was triggered by earlier work and is necessary to justify the existence of JL compression in the brain, and emphasizes that inhibition is crucial if neurons are to perform efficient, correlation-preserving compression.
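To make the two constraints concrete, the Python sketch below builds one natural candidate for a sparse sign-consistent matrix: each column gets exactly $s$ non-zero entries that share a single random sign, scaled by $1/\sqrt{s}$. This is an assumption for illustration, not a reproduction of the paper's construction or of the optimal parameters it proves; the helper names and the sizes $m=256$, $n=1000$, $s=8$ are hypothetical. The random per-column sign is what makes norm preservation plausible: with independent signs $\sigma_j$, the cross terms satisfy $\mathbb{E}[\sigma_j\sigma_k]=0$ for $j \neq k$, so $\mathbb{E}\|Ax\|_2^2 = \|x\|_2^2$.

```python
import math
import random


def sparse_sign_consistent(m, n, s, seed=0):
    """Hypothetical helper: m x n matrix (list of rows) whose every column has
    exactly s non-zero entries, all with the same randomly chosen sign."""
    rng = random.Random(seed)
    A = [[0.0] * n for _ in range(m)]
    for j in range(n):
        sign = rng.choice((1.0, -1.0))      # one sign for the whole column
        for i in rng.sample(range(m), s):   # s distinct rows per column
            A[i][j] = sign / math.sqrt(s)   # column has unit l2 norm
    return A


def is_sign_consistent(A):
    """True if every column is entirely non-negative or entirely non-positive."""
    return all(all(v >= 0 for v in col) or all(v <= 0 for v in col)
               for col in zip(*A))


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def l2(v):
    return math.sqrt(sum(t * t for t in v))


A = sparse_sign_consistent(m=256, n=1000, s=8, seed=0)
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
print(is_sign_consistent(A))        # True by construction
print(l2(matvec(A, x)) / l2(x))     # should be close to 1 for this sample vector
```

Sign consistency holds by construction, and each column has unit norm, so standard basis vectors are mapped without distortion; the norm ratio for a dense vector is only a single empirical sample, not a JL guarantee.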
Submitted 19 November, 2014;
originally announced November 2014.
-
How to Elect a Leader Faster than a Tournament
Authors:
Dan Alistarh,
Rati Gelashvili,
Adrian Vladu
Abstract:
The problem of electing a leader from among $n$ contenders is one of the fundamental questions in distributed computing. In its simplest formulation, the task is as follows: given $n$ processors, all participants must eventually return a win or lose indication, such that only a single contender wins. Despite a considerable amount of work on leader election, the following question is still open: can we elect a leader in an asynchronous fault-prone system faster than just running a $Θ(\log n)$-time tournament, against a strong adaptive adversary?
In this paper, we answer this question in the affirmative, improving on a decades-old upper bound. We introduce two new algorithmic ideas to reduce the time complexity of electing a leader to $O(\log^* n)$, using $O(n^2)$ point-to-point messages. A non-trivial application of our algorithm is a new upper bound for the tight renaming problem, assigning $n$ items to the $n$ participants in expected $O(\log^2 n)$ time and $O(n^2)$ messages. We complement our results with a lower bound of $Ω(n^2)$ messages for solving these two problems, closing the question of their message complexity.
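To give a sense of the gap between the $Θ(\log n)$ tournament baseline and the $O(\log^* n)$ bound, the Python sketch below counts the rounds of a binary knockout bracket and evaluates the iterated logarithm. It compares only the growth of the two functions (ignoring constants); it does not implement the paper's algorithm, and the function names are illustrative.

```python
import math


def tournament_rounds(n: int) -> int:
    """Rounds of a binary knockout bracket with n contenders (= ceil(log2 n))."""
    rounds = 0
    while n > 1:
        n = (n + 1) // 2   # winners of this round (an odd contender gets a bye)
        rounds += 1
    return rounds


def log_star(x) -> int:
    """Iterated logarithm: how many times log2 must be applied until x <= 1."""
    count = 0
    while x > 1:
        x = math.log2(x)   # math.log2 also accepts arbitrarily large ints
        count += 1
    return count


for n in (1_000, 2**20, 2**64):
    print(n, tournament_rounds(n), log_star(n))
```

For $n = 2^{64}$ contenders the bracket needs 64 rounds, while $\log^* n$ is only 5.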
Submitted 15 February, 2015; v1 submitted 4 November, 2014;
originally announced November 2014.
-
On the Importance of Registers for Computability
Authors:
Rati Gelashvili,
Mohsen Ghaffari,
Jerry Li,
Nir Shavit
Abstract:
All consensus hierarchies in the literature assume that we have, in addition to copies of a given object, an unbounded number of registers. But why do we really need these registers?
This paper considers what would happen if one attempts to solve consensus using various objects but without any registers. We show that under a reasonable assumption, objects like queues and stacks cannot emulate the missing registers. We also show that, perhaps surprisingly, initialization, shown to have no computational consequences when registers are readily available, is crucial in determining the synchronization power of objects when no registers are allowed. Finally, we show that without registers, the number of available objects affects the level of consensus that can be solved.
Our work thus raises the question of whether consensus hierarchies which assume an unbounded number of registers truly capture synchronization power, and begins a line of research aimed at better understanding the interaction between read-write memory and the powerful synchronization operations available on modern architectures.
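For background on the role registers usually play, the Python sketch below shows the standard textbook construction of two-process consensus from a FIFO queue. The queue is initialized with a winner token and a loser token, and each process first writes its input to a read-write register before dequeuing. Both ingredients the abstract examines appear here: the queue's initial contents decide who wins, and in this construction the loser can learn the winner's input only through the `proposal` registers. This is context, not the paper's construction; `queue.Queue` and Python threads stand in for the shared objects, and the function name is hypothetical.

```python
import queue
import threading


def queue_consensus(inputs):
    """Two-process consensus from a FIFO queue plus two read-write registers."""
    q = queue.Queue()
    q.put("WIN")                   # initial contents are part of the protocol
    q.put("LOSE")
    proposal = [None, None]        # one single-writer register per process
    decision = [None, None]

    def run(pid):
        proposal[pid] = inputs[pid]            # announce own input first
        token = q.get()                        # atomic dequeue
        if token == "WIN":
            decision[pid] = inputs[pid]        # first to dequeue keeps its input
        else:
            decision[pid] = proposal[1 - pid]  # loser adopts the winner's input

    threads = [threading.Thread(target=run, args=(pid,)) for pid in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decision
```

Agreement holds because the winner writes its proposal before dequeuing, so by the time the loser dequeues the second token, the winner's register is already set.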
Submitted 1 November, 2014;
originally announced November 2014.
-
Restricted Isometry Property for General p-Norms
Authors:
Zeyuan Allen-Zhu,
Rati Gelashvili,
Ilya Razenshteyn
Abstract:
The Restricted Isometry Property (RIP) is a fundamental property of a matrix which enables sparse recovery. Informally, an $m \times n$ matrix $A$ satisfies RIP of order $k$ for the $\ell_p$ norm if $\|Ax\|_p \approx \|x\|_p$ for every vector $x$ with at most $k$ non-zero coordinates.
For every $1 \leq p < \infty$ we obtain almost tight bounds on the minimum number of rows $m$ necessary for RIP to hold. Prior to this work, only the cases $p = 1$, $1 + 1 / \log k$, and $2$ were studied. Interestingly, our results show that the case $p = 2$ is a "singularity" point: the optimal number of rows $m$ is $\widetilde{\Theta}(k^{p})$ for all $p\in [1,\infty)\setminus \{2\}$, as opposed to $\widetilde{\Theta}(k)$ for $p=2$.
We also obtain almost tight bounds for the column sparsity of RIP matrices and discuss implications of our results for the Stable Sparse Recovery problem.
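The informal definition of RIP can be probed empirically by measuring $\|Ax\|_p / \|x\|_p$ over random $k$-sparse vectors; a common formalization asks that every such ratio lie in $[1-\varepsilon, 1+\varepsilon]$, up to normalization. The Python sketch below assumes a hypothetical helper `sampled_ratio_range` and Gaussian values on a random support. Sampling can only exhibit distortion; it cannot certify RIP, which requires the bound for every $k$-sparse vector.

```python
import math
import random


def lp_norm(v, p):
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def sampled_ratio_range(A, k, p, trials=100, seed=0):
    """Min and max of ||Ax||_p / ||x||_p over random k-sparse vectors x."""
    rng = random.Random(seed)
    n = len(A[0])
    ratios = []
    for _ in range(trials):
        x = [0.0] * n
        for i in rng.sample(range(n), k):   # random support of size k
            x[i] = rng.gauss(0.0, 1.0)
        Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
        ratios.append(lp_norm(Ax, p) / lp_norm(x, p))
    return min(ratios), max(ratios)
```

As a sanity check, the identity matrix returns a range of exactly (1.0, 1.0) for any $p$ and $k$, and a matrix scaled by 2 returns ratios of 2; for a random matrix, the width of the returned range gives a lower bound on the distortion $\varepsilon$.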
Submitted 22 February, 2015; v1 submitted 8 July, 2014;
originally announced July 2014.