-
Online Payments by Merely Broadcasting Messages (Extended Version)
Authors:
Daniel Collins,
Rachid Guerraoui,
Jovan Komatovic,
Matteo Monti,
Athanasios Xygkis,
Matej Pavlovic,
Petr Kuznetsov,
Yvonne-Anne Pignolet,
Dragos-Adrian Seredinschi,
Andrei Tonkikh
Abstract:
We address the problem of online payments, where users can transfer funds among themselves. We introduce Astro, a system solving this problem efficiently in a decentralized, deterministic, and completely asynchronous manner. Astro builds on the insight that consensus is unnecessary to prevent double-spending. Instead of consensus, Astro relies on a weaker primitive---Byzantine reliable broadcast--…
▽ More
We address the problem of online payments, where users can transfer funds among themselves. We introduce Astro, a system solving this problem efficiently in a decentralized, deterministic, and completely asynchronous manner. Astro builds on the insight that consensus is unnecessary to prevent double-spending. Instead of consensus, Astro relies on a weaker primitive---Byzantine reliable broadcast---enabling a simpler and more efficient implementation than consensus-based payment systems.
In terms of efficiency, Astro executes a payment by merely broadcasting a message. The distinguishing feature of Astro is that it can maintain performance robustly, i.e., remain unaffected by a fraction of replicas being compromised or slowed down by an adversary. Our experiments on a public cloud network show that Astro can achieve near-linear scalability in a sharded setup, going from $10K$ payments/sec (2 shards) to $20K$ payments/sec (4 shards). In a nutshell, Astro can match VISA-level average payment throughput, and achieves a $5\times$ improvement over a state-of-the-art consensus-based solution, while exhibiting sub-second $95^{th}$ percentile latency.
△ Less
Submitted 27 April, 2020;
originally announced April 2020.
-
Dynamic Byzantine Reliable Broadcast [Technical Report]
Authors:
Rachid Guerraoui,
Jovan Komatovic,
Petr Kuznetsov,
Yvonne-Anne Pignolet,
Dragos-Adrian Seredinschi,
Andrei Tonkikh
Abstract:
Reliable broadcast is a communication primitive guaranteeing, intuitively, that all processes in a distributed system deliver the same set of messages. The reason why this primitive is appealing is twofold: (i) we can implement it deterministically in a completely asynchronous environment, unlike stronger primitives like consensus and total-order broadcast, and yet (ii) reliable broadcast is power…
▽ More
Reliable broadcast is a communication primitive guaranteeing, intuitively, that all processes in a distributed system deliver the same set of messages. The reason why this primitive is appealing is twofold: (i) we can implement it deterministically in a completely asynchronous environment, unlike stronger primitives like consensus and total-order broadcast, and yet (ii) reliable broadcast is powerful enough to implement important applications like payment systems.
The problem we tackle in this paper is that of dynamic reliable broadcast, i.e., enabling processes to join or leave the system. This property is desirable for long-lived applications (aiming to be highly available), yet has been precluded in previous asynchronous reliable broadcast protocols. We study this property in a general adversarial (i.e., Byzantine) environment.
We introduce the first specification of a dynamic Byzantine reliable broadcast (DBRB) primitive that is amenable to an asynchronous implementation. We then present an algorithm implementing this specification in an asynchronous network. Our DBRB algorithm ensures that if any correct process in the system broadcasts a message, then every correct process delivers that message unless it leaves the system. Moreover, if a correct process delivers a message, then every correct process that has not expressed its will to leave the system delivers that message. We assume that more than $2/3$ of processes in the system are correct at all times, which is tight in our context.
We also show that if only one process in the system can fail---and it can fail only by crashing---then it is impossible to implement a stronger primitive, ensuring that if any correct process in the system broadcasts or delivers a message, then every correct process in the system delivers that message---including those that leave.
△ Less
Submitted 20 November, 2020; v1 submitted 17 January, 2020;
originally announced January 2020.
-
Can 100 Machines Agree?
Authors:
Rachid Guerraoui,
Jad Hamza,
Dragos-Adrian Seredinschi,
Marko Vukolic
Abstract:
Agreement protocols have been typically deployed at small scale, e.g., using three to five machines. This is because these protocols seem to suffer from a sharp performance decay. More specifically, as the size of a deployment---i.e., degree of replication---increases, the protocol performance greatly decreases. There is not much experimental evidence for this decay in practice, however, notably f…
▽ More
Agreement protocols have been typically deployed at small scale, e.g., using three to five machines. This is because these protocols seem to suffer from a sharp performance decay. More specifically, as the size of a deployment---i.e., degree of replication---increases, the protocol performance greatly decreases. There is not much experimental evidence for this decay in practice, however, notably for larger system sizes, e.g., beyond a handful of machines.
In this paper we execute agreement protocols on up to 100 machines and observe on their performance decay. We consider well-known agreement protocols part of mature systems, such as Apache ZooKeeper, etcd, and BFT-Smart, as well as a chain and a novel ring-based agreement protocol which we implement ourselves.
We provide empirical evidence that current agreement protocols execute gracefully on 100 machines. We observe that throughput decay is initially sharp (consistent with previous observations); but intriguingly---as each system grows beyond a few tens of replicas---the decay dampens. For chain- and ring-based replication, this decay is slower than for the other systems. The positive takeaway from our evaluation is that mature agreement protocol implementations can sustain out-of-the-box 300 to 500 requests per second when executing on 100 replicas on a wide-area public cloud platform. Chain- and ring-based replication can reach between 4K and 11K (up to 20x improvements) depending on the fault assumptions.
△ Less
Submitted 18 November, 2019;
originally announced November 2019.
-
Scalable Byzantine Reliable Broadcast (Extended Version)
Authors:
Rachid Guerraoui,
Petr Kuznetsov,
Matteo Monti,
Matej Pavlovic,
Dragos-Adrian Seredinschi,
Yann Vonlanthen
Abstract:
Byzantine reliable broadcast is a powerful primitive that allows a set of processes to agree on a message from a designated sender, even if some processes (including the sender) are Byzantine. Existing broadcast protocols for this setting scale poorly, as they typically build on quorum systems with strong intersection guarantees, which results in linear per-process communication and computation co…
▽ More
Byzantine reliable broadcast is a powerful primitive that allows a set of processes to agree on a message from a designated sender, even if some processes (including the sender) are Byzantine. Existing broadcast protocols for this setting scale poorly, as they typically build on quorum systems with strong intersection guarantees, which results in linear per-process communication and computation complexity.
We generalize the Byzantine reliable broadcast abstraction to the probabilistic setting, allowing each of its properties to be violated with a fixed, arbitrarily small probability. We leverage these relaxed guarantees in a protocol where we replace quorums with stochastic samples. Compared to quorums, samples are significantly smaller in size, leading to a more scalable design. We obtain the first Byzantine reliable broadcast protocol with logarithmic per-process communication and computation complexity.
We conduct a complete and thorough analysis of our protocol, deriving bounds on the probability of each of its properties being compromised. During our analysis, we introduce a novel general technique we call adversary decorators. Adversary decorators allow us to make claims about the optimal strategy of the Byzantine adversary without having to make any additional assumptions. We also introduce Threshold Contagion, a model of message propagation through a system with Byzantine processes. To the best of our knowledge, this is the first formal analysis of a probabilistic broadcast protocol in the Byzantine fault model. We show numerically that practically negligible failure probabilities can be achieved with realistic security parameters.
△ Less
Submitted 19 February, 2020; v1 submitted 5 August, 2019;
originally announced August 2019.
-
The Consensus Number of a Cryptocurrency (Extended Version)
Authors:
Rachid Guerraoui,
Petr Kuznetsov,
Matteo Monti,
Matej Pavlovic,
Dragos-Adrian Seredinschi
Abstract:
Many blockchain-based algorithms, such as Bitcoin, implement a decentralized asset transfer system, often referred to as a cryptocurrency. As stated in the original paper by Nakamoto, at the heart of these systems lies the problem of preventing double-spending; this is usually solved by achieving consensus on the order of transfers among the participants. In this paper, we treat the asset transfer…
▽ More
Many blockchain-based algorithms, such as Bitcoin, implement a decentralized asset transfer system, often referred to as a cryptocurrency. As stated in the original paper by Nakamoto, at the heart of these systems lies the problem of preventing double-spending; this is usually solved by achieving consensus on the order of transfers among the participants. In this paper, we treat the asset transfer problem as a concurrent object and determine its consensus number, showing that consensus is, in fact, not necessary to prevent double-spending. We first consider the problem as defined by Nakamoto, where only a single process---the account owner---can withdraw from each account. Safety and liveness need to be ensured for correct account owners, whereas misbehaving account owners might be unable to perform transfers. We show that the consensus number of an asset transfer object is $1$. We then consider a more general $k$-shared asset transfer object where up to $k$ processes can atomically withdraw from the same account, and show that this object has consensus number $k$. We establish our results in the context of shared memory with benign faults, allowing us to properly understand the level of difficulty of the asset transfer problem. We also translate these results in the message passing setting with Byzantine players, a model that is more relevant in practice. In this model, we describe an asynchronous Byzantine fault-tolerant asset transfer implementation that is both simpler and more efficient than state-of-the-art consensus-based solutions. Our results are applicable to both the permissioned (private) and permissionless (public) setting, as normally their differentiation is hidden by the abstractions on top of which our algorithms are based.
△ Less
Submitted 13 June, 2019;
originally announced June 2019.
-
AT2: Asynchronous Trustworthy Transfers
Authors:
Rachid Guerraoui,
Petr Kuznetsov,
Matteo Monti,
Matej Pavlovic,
Dragos-Adrian Seredinschi
Abstract:
Many blockchain-based protocols, such as Bitcoin, implement a decentralized asset transfer (or exchange) system. As clearly stated in the original paper by Nakamoto, the crux of this problem lies in prohibiting any participant from engaging in double-spending. There seems to be a common belief that consensus is necessary for solving the double-spending problem. Indeed, whether it is for a permissi…
▽ More
Many blockchain-based protocols, such as Bitcoin, implement a decentralized asset transfer (or exchange) system. As clearly stated in the original paper by Nakamoto, the crux of this problem lies in prohibiting any participant from engaging in double-spending. There seems to be a common belief that consensus is necessary for solving the double-spending problem. Indeed, whether it is for a permissionless or a permissioned environment, the typical solution uses consensus to build a totally ordered ledger of submitted transfers. In this paper we show that this common belief is false: consensus is not needed to implement of a decentralized asset transfer system. We do so by introducing AT2 (Asynchronous Trustworthy Transfers), a class of consensusless algorithms. To show formally that consensus is unnecessary for asset transfers, we consider this problem first in the shared-memory context. We introduce AT2$_{SM}$, a wait-free algorithm that asynchronously implements asset transfer in the read-write shared-memory model. In other words, we show that the consensus number of an asset-transfer object is one. In the message passing model with Byzantine faults, we introduce a generic asynchronous algorithm called AT2$_{MP}$ and discuss two instantiations of this solution. First, AT2$_{D}$ ensures deterministic guarantees and consequently targets a small scale deployment (tens to hundreds of nodes), typically for a permissioned environment. Second, AT2$_{P}$ provides probabilistic guarantees and scales well to a very large system size (tens of thousands of nodes), ensuring logarithmic latency and communication complexity. Instead of consensus, we construct AT2$_{D}$ and AT2$_{P}$ on top of a broadcast primitive with causal ordering guarantees offering deterministic and probabilistic properties, respectively.
△ Less
Submitted 5 March, 2019; v1 submitted 27 December, 2018;
originally announced December 2018.
-
SBFT: a Scalable and Decentralized Trust Infrastructure
Authors:
Guy Golan Gueta,
Ittai Abraham,
Shelly Grossman,
Dahlia Malkhi,
Benny Pinkas,
Michael K. Reiter,
Dragos-Adrian Seredinschi,
Orr Tamir,
Alin Tomescu
Abstract:
SBFT is a state of the art Byzantine fault tolerant permissioned blockchain system that addresses the challenges of scalability, decentralization and world-scale geo-replication. SBFTis optimized for decentralization and can easily handle more than 200 active replicas in a real world-scale deployment. We evaluate \sysname in a world-scale geo-replicated deployment with 209 replicas withstanding f=…
▽ More
SBFT is a state of the art Byzantine fault tolerant permissioned blockchain system that addresses the challenges of scalability, decentralization and world-scale geo-replication. SBFTis optimized for decentralization and can easily handle more than 200 active replicas in a real world-scale deployment. We evaluate \sysname in a world-scale geo-replicated deployment with 209 replicas withstanding f=64 Byzantine failures. We provide experiments that show how the different algorithmic ingredients of \sysname increase its performance and scalability. The results show that SBFT simultaneously provides almost 2x better throughput and about 1.5x better latency relative to a highly optimized system that implements the PBFT protocol. To achieve this performance improvement, SBFT uses a combination of four ingredients: using collectors and threshold signatures to reduce communication to linear, using an optimistic fast path, reducing client communication and utilizing redundant servers for the fast path.
△ Less
Submitted 2 January, 2019; v1 submitted 4 April, 2018;
originally announced April 2018.
-
Monotonic Prefix Consistency in Distributed Systems
Authors:
Alain Girault,
Gregor Gössler,
Rachid Guerraoui,
Jad Hamza,
Dragos-Adrian Seredinschi
Abstract:
We study the issue of data consistency in distributed systems. Specifically, we consider a distributed system that replicates its data at multiple sites, which is prone to partitions, and which is assumed to be available (in the sense that queries are always eventually answered). In such a setting, strong consistency, where all replicas of the system apply synchronously every operation, is not pos…
▽ More
We study the issue of data consistency in distributed systems. Specifically, we consider a distributed system that replicates its data at multiple sites, which is prone to partitions, and which is assumed to be available (in the sense that queries are always eventually answered). In such a setting, strong consistency, where all replicas of the system apply synchronously every operation, is not possible to implement. However, many weaker consistency criteria that allow a greater number of behaviors than strong consistency, are implementable in available distributed systems. We focus on determining the strongest consistency criterion that can be implemented in a convergent and available distributed system that tolerates partitions. We focus on objects where the set of operations can be split into updates and queries. We show that no criterion stronger than Monotonic Prefix Consistency (MPC) can be implemented.
△ Less
Submitted 21 July, 2018; v1 submitted 25 October, 2017;
originally announced October 2017.
-
Incremental Consistency Guarantees for Replicated Objects
Authors:
Rachid Guerraoui,
Matej Pavlovic,
Dragos-Adrian Seredinschi
Abstract:
Programming with replicated objects is difficult. Developers must face the fundamental trade-off between consistency and performance head on, while struggling with the complexity of distributed storage stacks. We introduce Correctables, a novel abstraction that hides most of this complexity, allowing developers to focus on the task of balancing consistency and performance. To aid developers with t…
▽ More
Programming with replicated objects is difficult. Developers must face the fundamental trade-off between consistency and performance head on, while struggling with the complexity of distributed storage stacks. We introduce Correctables, a novel abstraction that hides most of this complexity, allowing developers to focus on the task of balancing consistency and performance. To aid developers with this task, Correctables provide incremental consistency guarantees, which capture successive refinements on the result of an ongoing operation on a replicated object. In short, applications receive both a preliminary---fast, possibly inconsistent---result, as well as a final---consistent---result that arrives later.
We show how to leverage incremental consistency guarantees by speculating on preliminary values, trading throughput and bandwidth for improved latency. We experiment with two popular storage systems (Cassandra and ZooKeeper) and three applications: a Twissandra-based microblogging service, an ad serving system, and a ticket selling system. Our evaluation on the Amazon EC2 platform with YCSB workloads A, B, and C shows that we can reduce the latency of strongly consistent operations by up to 40% (from 100ms to 60ms) at little cost (10% bandwidth increase, 6% throughput drop) in the ad system. Even if the preliminary result is frequently inconsistent (25% of accesses), incremental consistency incurs a bandwidth overhead of only 27%.
△ Less
Submitted 8 September, 2016;
originally announced September 2016.