-
Byzantine-Tolerant Consensus in GPU-Inspired Shared Memory
Authors:
Chryssis Georgiou,
Manaswini Piduguralla,
Sathya Peri
Abstract:
In this work, we formalize a novel shared memory model inspired by the popular GPU architecture. Within this model, we develop algorithmic solutions to the Byzantine Consensus problem and analyze their fault-resilience.
In this work, we formalize a novel shared memory model inspired by the popular GPU architecture. Within this model, we develop algorithmic solutions to the Byzantine Consensus problem and analyze their fault-resilience.
△ Less
Submitted 16 June, 2025; v1 submitted 16 March, 2025;
originally announced March 2025.
-
Outlier-Robust Training of Machine Learning Models
Authors:
Rajat Talak,
Charis Georgiou,
Jingnan Shi,
Luca Carlone
Abstract:
Robust training of machine learning models in the presence of outliers has garnered attention across various domains. The use of robust losses is a popular approach and is known to mitigate the impact of outliers. We bring to light two literatures that have diverged in their ways of designing robust losses: one using M-estimation, which is popular in robotics and computer vision, and another using…
▽ More
Robust training of machine learning models in the presence of outliers has garnered attention across various domains. The use of robust losses is a popular approach and is known to mitigate the impact of outliers. We bring to light two literatures that have diverged in their ways of designing robust losses: one using M-estimation, which is popular in robotics and computer vision, and another using a risk-minimization framework, which is popular in deep learning. We first show that a simple modification of the Black-Rangarajan duality provides a unifying view. The modified duality brings out a definition of a robust loss kernel $σ$ that is satisfied by robust losses in both the literatures. Secondly, using the modified duality, we propose an Adaptive Alternation Algorithm (AAA) for training machine learning models with outliers. The algorithm iteratively trains the model by using a weighted version of the non-robust loss, while updating the weights at each iteration. The algorithm is augmented with a novel parameter update rule by interpreting the weights as inlier probabilities, and obviates the need for complex parameter tuning. Thirdly, we investigate convergence of the adaptive alternation algorithm to outlier-free optima. Considering arbitrary outliers (i.e., with no distributional assumption on the outliers), we show that the use of robust loss kernels σ increases the region of convergence. We experimentally show the efficacy of our algorithm on regression, classification, and neural scene reconstruction problems. We release our implementation code: https://github.com/MIT-SPARK/ORT.
△ Less
Submitted 30 December, 2024;
originally announced January 2025.
-
Ares II: Tracing the Flaws of a (Storage) God
Authors:
Chryssis Georgiou,
Nicolas Nicolaou,
Andria Trigeorgi
Abstract:
Ares is a modular framework, designed to implement dynamic, reconfigurable, fault-tolerant, read/write and strongly consistent distributed shared memory objects. Recent enhancements of the framework have realized the efficient implementation of large objects, by introducing versioning and data striping techniques. In this work, we identify performance bottlenecks of the Ares's variants by utilizin…
▽ More
Ares is a modular framework, designed to implement dynamic, reconfigurable, fault-tolerant, read/write and strongly consistent distributed shared memory objects. Recent enhancements of the framework have realized the efficient implementation of large objects, by introducing versioning and data striping techniques. In this work, we identify performance bottlenecks of the Ares's variants by utilizing distributed tracing, a popular technique for monitoring and profiling distributed systems. We then propose optimizations across all versions of Ares, aiming in overcoming the identified flaws, while preserving correctness. We refer to the optimized version of Ares as Ares II, which now features a piggyback mechanism, a garbage collection mechanism, and a batching reconfiguration technique for improving the performance and storage efficiency of the original Ares. We rigorously prove the correctness of Ares II, and we demonstrate the performance improvements by an experimental comparison (via distributed tracing) of the Ares II variants with their original counterparts.
△ Less
Submitted 6 March, 2024;
originally announced July 2024.
-
AMECOS: A Modular Event-based Framework for Concurrent Object Specification
Authors:
Timothé Albouy,
Antonio Fernández Anta,
Chryssis Georgiou,
Mathieu Gestin,
Nicolas Nicolaou,
Junlang Wang
Abstract:
In this work, we introduce a modular framework for specifying distributed systems that we call AMECOS. Specifically, our framework departs from the traditional use of sequential specification, which presents limitations both on the specification expressiveness and implementation efficiency of inherently concurrent objects, as documented by Castañeda, Rajsbaum and Raynal in CACM 2023. Our framework…
▽ More
In this work, we introduce a modular framework for specifying distributed systems that we call AMECOS. Specifically, our framework departs from the traditional use of sequential specification, which presents limitations both on the specification expressiveness and implementation efficiency of inherently concurrent objects, as documented by Castañeda, Rajsbaum and Raynal in CACM 2023. Our framework focuses on the interactions between the various system components, specified as concurrent objects. Interactions are described with sequences of object events. This provides a modular way of specifying distributed systems and separates legality (object semantics) from other issues, such as consistency. We demonstrate the usability of our framework by (i) specifying various well-known concurrent objects, such as registers, shared memory, message-passing, reliable broadcast, and consensus, (ii) providing hierarchies of ordering semantics (namely, consistency hierarchy, memory hierarchy, and reliable broadcast hierarchy), and (iii) presenting a novel axiomatic proof of the impossibility of the well-known Consensus problem.
△ Less
Submitted 20 November, 2024; v1 submitted 16 May, 2024;
originally announced May 2024.
-
On HTLC-Based Protocols for Multi-Party Cross-Chain Swaps
Authors:
Emily Clark,
Chloe Georgiou,
Katelyn Poon,
Marek Chrobak
Abstract:
In his 2018 paper, Herlihy introduced an atomic protocol for multi-party asset swaps across different blockchains. His model represents an asset swap by a directed graph whose nodes are the participating parties and edges represent asset transfers, and rational behavior of the participants is captured by a preference relation between a protocol's outcomes. Asset transfers between parties are achie…
▽ More
In his 2018 paper, Herlihy introduced an atomic protocol for multi-party asset swaps across different blockchains. His model represents an asset swap by a directed graph whose nodes are the participating parties and edges represent asset transfers, and rational behavior of the participants is captured by a preference relation between a protocol's outcomes. Asset transfers between parties are achieved using smart contracts. These smart contracts are quite involved and they require storage and processing of a large number of paths in the swap digraph, limiting practical significance of his protocol. His paper also describes a different protocol that uses only standard hash time-lock contracts (HTLC's), but this simpler protocol applies only to some special types of digraphs. He left open the question whether there is a simple and efficient protocol for cross-chain asset swaps in arbitrary digraphs. Motivated by this open problem, we conducted a comprehensive study of \emph{HTLC-based protocols}, in which all asset transfers are implemented with HTLCs. Our main contribution is a full characterization of swap digraphs that have such protocols.
△ Less
Submitted 14 March, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
-
REACT: Two Datasets for Analyzing Both Human Reactions and Evaluative Feedback to Robots Over Time
Authors:
Kate Candon,
Nicholas C. Georgiou,
Helen Zhou,
Sidney Richardson,
Qiping Zhang,
Brian Scassellati,
Marynel Vázquez
Abstract:
Recent work in Human-Robot Interaction (HRI) has shown that robots can leverage implicit communicative signals from users to understand how they are being perceived during interactions. For example, these signals can be gaze patterns, facial expressions, or body motions that reflect internal human states. To facilitate future research in this direction, we contribute the REACT database, a collecti…
▽ More
Recent work in Human-Robot Interaction (HRI) has shown that robots can leverage implicit communicative signals from users to understand how they are being perceived during interactions. For example, these signals can be gaze patterns, facial expressions, or body motions that reflect internal human states. To facilitate future research in this direction, we contribute the REACT database, a collection of two datasets of human-robot interactions that display users' natural reactions to robots during a collaborative game and a photography scenario. Further, we analyze the datasets to show that interaction history is an important factor that can influence human reactions to robots. As a result, we believe that future models for interpreting implicit feedback in HRI should explicitly account for this history. REACT opens up doors to this possibility in the future.
△ Less
Submitted 31 January, 2024;
originally announced February 2024.
-
Self-stabilizing Byzantine-tolerant Recycling
Authors:
Chryssis Georgiou,
Michel Raynal,
Elad M. Schiller
Abstract:
Numerous distributed applications, such as cloud computing and distributed ledgers, necessitate the system to invoke asynchronous consensus objects an unbounded number of times, where the completion of one consensus instance is followed by the invocation of another. With only a constant number of objects available, object reuse becomes vital.
We investigate the challenge of object recycling in t…
▽ More
Numerous distributed applications, such as cloud computing and distributed ledgers, necessitate the system to invoke asynchronous consensus objects an unbounded number of times, where the completion of one consensus instance is followed by the invocation of another. With only a constant number of objects available, object reuse becomes vital.
We investigate the challenge of object recycling in the presence of Byzantine processes, which can deviate from the algorithm code in any manner. Our solution must also be self-stabilizing, as it is a powerful notion of fault tolerance. Self-stabilizing systems can recover automatically after the occurrence of arbitrary transient faults, in addition to tolerating communication and (Byzantine or crash) process failures, provided the algorithm code remains intact.
We provide a recycling mechanism for asynchronous objects that enables their reuse once their task has ended, and all non-faulty processes have retrieved the decided values. This mechanism relies on synchrony assumptions and builds on a new self-stabilizing Byzantine-tolerant synchronous multivalued consensus algorithm, along with a novel composition of existing techniques.
△ Less
Submitted 27 July, 2023;
originally announced July 2023.
-
Validated Objects: Specification, Implementation, and Applications
Authors:
Antonio Fernández Anta,
Chryssis Georgiou,
Nicolas Nicolaou,
Antonio Russo
Abstract:
Guaranteeing the validity of concurrent operations on distributed objects is a key property for ensuring reliability and consistency in distributed systems. Usually, the methods for validating these operations, if present, are wired in the object implementation. In this work, we formalize the notion of a {\em validated object}, decoupling the object operations and properties from the validation pr…
▽ More
Guaranteeing the validity of concurrent operations on distributed objects is a key property for ensuring reliability and consistency in distributed systems. Usually, the methods for validating these operations, if present, are wired in the object implementation. In this work, we formalize the notion of a {\em validated object}, decoupling the object operations and properties from the validation procedure. We consider two types of objects, satisfying different levels of consistency: the validated {\em totally-ordered} object, offering a total ordering of its operations, and its weaker variant, the validated {\em regular} object. We provide conditions under which it is possible to implement these objects. In particular, we show that crash-tolerant implementations of validated regular objects are always possible in an asynchronous system with a majority of correct processes. However, for validated totally-ordered objects, consensus is always required if a property of the object we introduce in this work, {\em persistent validity,} does not hold. Persistent validity combined with another new property, {\em persistent execution}, allows consensus-free crash-tolerant implementations of validated totally-ordered objects. We demonstrate the utility of validated objects by considering several applications conforming to our formalism.
△ Less
Submitted 26 May, 2022;
originally announced May 2022.
-
Fragmented ARES: Dynamic Storage for Large Objects
Authors:
Chryssis Georgiou,
Nicolas Nicolaou,
Andria Trigeorgi
Abstract:
Data availability is one of the most important features in distributed storage systems, made possible by data replication. Nowadays data are generated rapidly and the goal to develop efficient, scalable and reliable storage systems has become one of the major challenges for high performance computing. In this work, we develop a dynamic, robust and strongly consistent distributed storage implementa…
▽ More
Data availability is one of the most important features in distributed storage systems, made possible by data replication. Nowadays data are generated rapidly and the goal to develop efficient, scalable and reliable storage systems has become one of the major challenges for high performance computing. In this work, we develop a dynamic, robust and strongly consistent distributed storage implementation suitable for handling large objects (such as files). We do so by integrating an Adaptive, Reconfigurable, Atomic Storage framework, called ARES, with a distributed file system, called COBFS, which relies on a block fragmentation technique to handle large objects. With the addition of ARES, we also enable the use of an erasure-coded algorithm to further split our data and to potentially improve storage efficiency at the replica servers and operation latency. To put the practicality of our outcomes at test, we conduct an in-depth experimental evaluation on the Emulab and AWS EC2 testbeds, illustrating the benefits of our approaches, as well as other interesting tradeoffs.
△ Less
Submitted 31 January, 2022;
originally announced January 2022.
-
Estimating Active Cases of COVID-19
Authors:
Javier Álvarez,
Carlos Baquero,
Elisa Cabana,
Jaya Prakash Champati,
Antonio Fernández Anta,
Davide Frey,
Augusto García-Agúndez,
Chryssis Georgiou,
Mathieu Goessens,
Harold Hernández,
Rosa Lillo,
Raquel Menezes,
Raúl Moreno,
Nicolas Nicolaou,
Oluwasegun Ojo,
Antonio Ortega,
Jesús Rufino,
Efstathios Stavrakis,
Govind Jeevan,
Christin Glorioso
Abstract:
Having accurate and timely data on confirmed active COVID-19 cases is challenging, since it depends on testing capacity and the availability of an appropriate infrastructure to perform tests and aggregate their results. In this paper, we propose methods to estimate the number of active cases of COVID-19 from the official data (of confirmed cases and fatalities) and from survey data. We show that t…
▽ More
Having accurate and timely data on confirmed active COVID-19 cases is challenging, since it depends on testing capacity and the availability of an appropriate infrastructure to perform tests and aggregate their results. In this paper, we propose methods to estimate the number of active cases of COVID-19 from the official data (of confirmed cases and fatalities) and from survey data. We show that the latter is a viable option in countries with reduced testing capacity or suboptimal infrastructures.
△ Less
Submitted 6 August, 2021;
originally announced August 2021.
-
Loosely-self-stabilizing Byzantine-tolerant Binary Consensus for Signature-free Message-passing Systems
Authors:
Chryssis Georgiou,
Ioannis Marcoullis,
Michel Raynal,
Elad Michael Schiller
Abstract:
At PODC 2014, A. Mostéfaoui, H. Moumen, and M. Raynal presented a new and simple randomized signature-free binary consensus algorithm (denoted here MMR) that copes with the net effect of asynchrony Byzantine behaviors. Assuming message scheduling is fair and independent from random numbers MMR is optimal in several respects: it deals with up to t Byzantine processes where t < n/3 and n is the numb…
▽ More
At PODC 2014, A. Mostéfaoui, H. Moumen, and M. Raynal presented a new and simple randomized signature-free binary consensus algorithm (denoted here MMR) that copes with the net effect of asynchrony Byzantine behaviors. Assuming message scheduling is fair and independent from random numbers MMR is optimal in several respects: it deals with up to t Byzantine processes where t < n/3 and n is the number of processes, O(n\^2) messages and O(1) expected time. The present article presents a non-trivial extension of MMR to an even more fault-prone context, namely, in addition to Byzantine processes, it considers also that the system can experience transient failures. To this end it considers self-stabilization techniques to cope with communication failures and arbitrary transient faults (such faults represent any violation of the assumptions according to which the system was designed to operate).
The proposed algorithm is the first loosely-self-stabilizing Byzantine fault-tolerant binary consensus algorithm suited to asynchronous message-passing systems. This is achieved via an instructive transformation of MMR to a self-stabilizing solution that can violate safety requirements with probability Pr= O(1/(2M)), where M is a predefined constant that can be set to any positive integer at the cost of 3 M n + log M bits of local memory. In addition to making MMR resilient to transient faults, the obtained self-stabilizing algorithm preserves its properties of optimal resilience and termination, (i.e., t < n/3, and O(1) expected time). Furthermore, it only requires a bounded amount of memory.
△ Less
Submitted 21 January, 2023; v1 submitted 26 March, 2021;
originally announced March 2021.
-
Byzantine-tolerant Distributed Grow-only Sets: Specification and Applications
Authors:
Vicent Cholvi,
Antonio Fernández Anta,
Chryssis Georgiou,
Nicolas Nicolaou,
Michel Raynal,
Antonio Russo
Abstract:
In order to formalize Distributed Ledger Technologies and their interconnections, a recent line of research work has formulated the notion of Distributed Ledger Object (DLO), which is a concurrent object that maintains a totally ordered sequence of records, abstracting blockchains and distributed ledgers. Through DLO, the Atomic Appends problem, intended as the need of a primitive able to append m…
▽ More
In order to formalize Distributed Ledger Technologies and their interconnections, a recent line of research work has formulated the notion of Distributed Ledger Object (DLO), which is a concurrent object that maintains a totally ordered sequence of records, abstracting blockchains and distributed ledgers. Through DLO, the Atomic Appends problem, intended as the need of a primitive able to append multiple records to distinct ledgers in an atomic way, is studied as a basic interconnection problem among ledgers.
In this work, we propose the Distributed Grow-only Set object (DSO), which instead of maintaining a sequence of records, as in a DLO, maintains a set of records in an immutable way: only Add and Get operations are provided. This object is inspired by the Grow-only Set (G-Set) data type which is part of the Conflict-free Replicated Data Types. We formally specify the object and we provide a consensus-free Byzantine-tolerant implementation that guarantees eventual consistency. We then use our Byzantine-tolerant DSO (BDSO) implementation to provide consensus-free algorithmic solutions to the Atomic Appends and Atomic Adds (the analogous problem of atomic appends applied on G-Sets) problems, as well as to construct consensus-free Single-Writer BDLOs. We believe that the BDSO has applications beyond the above-mentioned problems.
△ Less
Submitted 16 March, 2021;
originally announced March 2021.
-
Fragmented Objects: Boosting Concurrency of Shared Large Objects
Authors:
Antonio Fernandez Anta,
Chryssis Georgiou,
Theophanis Hadjistasi,
Nicolas Nicolaou,
Efstathios Stavrakis,
Andria Trigeorgi
Abstract:
This work examines strategies to handle large shared data objects in distributed storage systems (DSS), while boosting the number of concurrent accesses, maintaining strong consistency guarantees, and ensuring good operation performance. To this respect, we define the notion of fragmented objects:con-current objects composed of a list of fragments (or blocks) that allow operations to manipulate ea…
▽ More
This work examines strategies to handle large shared data objects in distributed storage systems (DSS), while boosting the number of concurrent accesses, maintaining strong consistency guarantees, and ensuring good operation performance. To this respect, we define the notion of fragmented objects:con-current objects composed of a list of fragments (or blocks) that allow operations to manipulate each of their fragments individually. As the fragments belong to the same object, it is not enough that each fragment is linearizable to have useful consistency guarantees in the composed object. Hence, we capture the consistency semantic of the whole object with the notion of fragmented linearizability. Then, considering that a variance of linearizability, coverability, is more suited for versioned objects like files, we provide an implementation of a distributed file system, called COBFS, that utilizes coverable fragmented objects (i.e., files).In COBFS, each file is a linked-list of coverable block objects. Preliminary emulation of COBFS demonstrates the potential of our approach in boosting the concurrency of strongly consistent large objects.
△ Less
Submitted 7 March, 2021; v1 submitted 25 February, 2021;
originally announced February 2021.
-
A Self-stabilizing Control Plane for the Edge and Fog Ecosystems
Authors:
Zacharias Georgiou,
Chryssis Georgiou,
George Pallis,
Elad Michael Schiller,
Demetris Trihinas
Abstract:
Fog Computing is now emerging as the dominating paradigm bridging the compute and connectivity gap between sensing devices (a.k.a. "things") and latency-sensitive services. However, as fog deployments scale by accumulating numerous devices interconnected over highly dynamic and volatile network fabrics, the need for self-configuration and self-healing in the presence of failures is more evident no…
▽ More
Fog Computing is now emerging as the dominating paradigm bridging the compute and connectivity gap between sensing devices (a.k.a. "things") and latency-sensitive services. However, as fog deployments scale by accumulating numerous devices interconnected over highly dynamic and volatile network fabrics, the need for self-configuration and self-healing in the presence of failures is more evident now than ever. Using the prevailing methodology of self-stabilization, we propose a fault-tolerant framework for distributed control planes that enables fog services to cope and recover from a very broad fault model. Specifically, our model considers network uncertainties, packet drops, node fail-stop failures, and violations of the assumptions according to which the system was designed to operate, such as an arbitrary corruption of the system state. Our self-stabilizing algorithms guarantee automatic recovery within a constant number of communication rounds without the need for external (human) intervention. To showcase the framework's effectiveness, the correctness proof of the proposed self-stabilizing algorithmic process is accompanied by a comprehensive evaluation featuring an open and reproducible testbed utilizing real-world data from the intelligent transportation domain. Results show that our framework ensures a fog ecosystem recovery from faults in constant time, analytics are computed correctly, while the overhead to the system's control plane scales linearly towards the IoT load.
△ Less
Submitted 4 November, 2020;
originally announced November 2020.
-
(In)Existence of Equilibria for 2-Players, 2-Values Games with Concave Valuations
Authors:
Chryssis Georgiou,
Marios Mavronicolas,
Burkhard Monien
Abstract:
We consider 2-players, 2-values minimization games where the players' costs take on two values, $a,b$, $a<b$. The players play mixed strategies and their costs are evaluated by unimodal valuations. This broad class of valuations includes all concave, one-parameter functions $\mathsf{F}: [0,1]\rightarrow \mathbb{R}$ with a unique maximum point. Our main result is an impossibility result stating tha…
▽ More
We consider 2-players, 2-values minimization games where the players' costs take on two values, $a,b$, $a<b$. The players play mixed strategies and their costs are evaluated by unimodal valuations. This broad class of valuations includes all concave, one-parameter functions $\mathsf{F}: [0,1]\rightarrow \mathbb{R}$ with a unique maximum point. Our main result is an impossibility result stating that: If the maximum is obtained in $(0,1)$ and $\mathsf{F}\left(\frac{1}{2}\right)\ne b$, then there exists a 2-players, 2-values game without $\mathsf{F}$-equilibrium.
The counterexample game used for the impossibility result belongs to a new class of very sparse 2-players, 2-values bimatrix games which we call normal games. In an attempt to investigate the remaining case $\mathsf{F}\left(\frac{1}{2}\right) = b$, we show that:
- Every normal, $n$-strategies game has an ${\mathsf{F}}$-equilibrium when ${\mathsf{F}}\left( \frac{1}{2} \right) = b$. We present a linear time algorithm for computing such an equilibrium.
- For 2-players, 2-values games with 3 strategies we have that if $\mathsf{F}\left(\frac{1}{2}\right) \le b$, then every 2-players, 2-values, 3-strategies game has an $\mathsf{F}$-equilibrium; if $\mathsf{F}\left(\frac{1}{2}\right) > b$, then there exists a normal 2-players, 2-values, 3-strategies game without $\mathsf{F}$-equilibrium.
To the best of our knowledge, this work is the first to provide an (almost complete) answer on whether there is, for a given concave function $\mathsf{F}$, a counterexample game without $\mathsf{F}$-equilibrium.
△ Less
Submitted 9 September, 2020;
originally announced September 2020.
-
CoronaSurveys: Using Surveys with Indirect Reporting to Estimate the Incidence and Evolution of Epidemics
Authors:
Oluwasegun Ojo,
Augusto García-Agundez,
Benjamin Girault,
Harold Hernández,
Elisa Cabana,
Amanda García-García,
Payman Arabshahi,
Carlos Baquero,
Paolo Casari,
Ednaldo José Ferreira,
Davide Frey,
Chryssis Georgiou,
Mathieu Goessens,
Anna Ishchenko,
Ernesto Jiménez,
Oleksiy Kebkal,
Rosa Lillo,
Raquel Menezes,
Nicolas Nicolaou,
Antonio Ortega,
Paul Patras,
Julian C Roberts,
Efstathios Stavrakis,
Yuichi Tanaka,
Antonio Fernández Anta
Abstract:
The world is suffering from a pandemic called COVID-19, caused by the SARS-CoV-2 virus. National governments have problems evaluating the reach of the epidemic, due to having limited resources and tests at their disposal. This problem is especially acute in low and middle-income countries (LMICs). Hence, any simple, cheap and flexible means of evaluating the incidence and evolution of the epidemic…
▽ More
The world is suffering from a pandemic called COVID-19, caused by the SARS-CoV-2 virus. National governments have problems evaluating the reach of the epidemic, due to having limited resources and tests at their disposal. This problem is especially acute in low and middle-income countries (LMICs). Hence, any simple, cheap and flexible means of evaluating the incidence and evolution of the epidemic in a given country with a reasonable level of accuracy is useful. In this paper, we propose a technique based on (anonymous) surveys in which participants report on the health status of their contacts. This indirect reporting technique, known in the literature as network scale-up method, preserves the privacy of the participants and their contacts, and collects information from a larger fraction of the population (as compared to individual surveys). This technique has been deployed in the CoronaSurveys project, which has been collecting reports for the COVID-19 pandemic for more than two months. Results obtained by CoronaSurveys show the power and flexibility of the approach, suggesting that it could be an inexpensive and powerful tool for LMICs.
△ Less
Submitted 26 June, 2020; v1 submitted 24 May, 2020;
originally announced May 2020.
-
Appending Atomically in Byzantine Distributed Ledgers
Authors:
Vicent Cholvi,
Antonio Fernandez Anta,
Chryssis Georgiou,
Nicolas Nicolaou,
Michel Raynal
Abstract:
A Distributed Ledger Object (DLO) is a concurrent object that maintains a totally ordered sequence of records, and supports two basic operations: append, which appends a record at the end of the sequence, and get, which returns the sequence of records. In this work we provide a proper formalization of a Byzantine-tolerant Distributed Ledger Object (BDLO), which is a DLO in a distributed system in…
▽ More
A Distributed Ledger Object (DLO) is a concurrent object that maintains a totally ordered sequence of records, and supports two basic operations: append, which appends a record at the end of the sequence, and get, which returns the sequence of records. In this work we provide a proper formalization of a Byzantine-tolerant Distributed Ledger Object (BDLO), which is a DLO in a distributed system in which processes may deviate arbitrarily from their indented behavior, i.e. they may be Byzantine. Our formal definition is accompanied by algorithms to implement BDLOs by utilizing an underlying Byzantine Atomic Broadcast service.
We then utilize the BDLO implementations to solve the Atomic Appends problem against Byzantine processes. The Atomic Appends problem emerges when several clients have records to append, the record of each client has to be appended to a different BDLO, and it must be guaranteed that either all records are appended or none. We present distributed algorithms implementing solutions for the Atomic Appends problem when the clients (which are involved in the appends) and the servers (which maintain the BDLOs) may be Byzantine.
△ Less
Submitted 26 February, 2020;
originally announced February 2020.
-
Self-Stabilizing Snapshot Objects for Asynchronous Fail-Prone Network Systems
Authors:
Chryssis Georgiou,
Oskar Lundström,
Elad Michael Schiller
Abstract:
A snapshot object simulates the behavior of an array of single-writer/multi-reader shared registers that can be read atomically. Delporte-Gallet et al. proposed two fault-tolerant algorithms for snapshot objects in asynchronous crash-prone message-passing systems. Their first algorithm is \emph{non-blocking}; it allows snapshot operations to terminate once all write operations have ceased. It uses…
▽ More
A snapshot object simulates the behavior of an array of single-writer/multi-reader shared registers that can be read atomically. Delporte-Gallet et al. proposed two fault-tolerant algorithms for snapshot objects in asynchronous crash-prone message-passing systems. Their first algorithm is \emph{non-blocking}; it allows snapshot operations to terminate once all write operations have ceased. It uses $O(n)$ messages of $O(n ν)$ bits, where $n$ is the number of nodes and $ν$ is the number of bits it takes to represent the object. Their second algorithm allows snapshot operations to always terminate independently of write operations. It incurs $O(n^2)$ messages.
The fault model of Delporte-Gallet et al. considers node crashes. We aim at the design of even more robust snapshot objects via the lenses of self-stabilization---a very strong notion of fault-tolerance. In addition to Delporte-Gallet et al.'s fault model, our self-stabilizing algorithm can recover after the occurrence of transient faults; these faults represent arbitrary violations of the assumptions according to which the system was designed to operate.
We propose self-stabilizing variations of Delporte-Gallet et al.'s non-blocking algorithm and always-terminating algorithm. Our algorithms have similar communication costs to the ones by Delporte-Gallet et al. and $O(1)$ recovery time from transient faults. The main differences are that our proposal considers repeated gossiping of $O(ν)$ bit messages and deals with bounded space. We also consider an input parameter, $δ$, for which we claim an ability to balance the costs of snapshot operations. We validate our correctness proof, evaluate the performance of Delporte-Gallet et al.'s algorithms and our proposed variations and investigate the properties of $δ$ via PlanetLab experiments, where significant latency and communication costs reduction are observed.
△ Less
Submitted 28 February, 2020; v1 submitted 14 June, 2019;
originally announced June 2019.
-
Utilizing Mobile Nodes for Congestion Control in Wireless Sensor Networks
Authors:
Antonia Nicolaou,
Natalie Temene,
Charalampos Sergiou,
Chryssis Georgiou,
Vasos Vassiliou
Abstract:
Congestion control and avoidance in Wireless Sensor Networks (WSNs) is a subject that has attracted a lot of research attention in the last decade. Besides rate and resource control, the utilization of mobile nodes has also been suggested as a way to control congestion. In this work, we present a Mobile Congestion Control (MobileCC) algorithm with two variations, to assist existing congestion cont…
▽ More
Congestion control and avoidance in Wireless Sensor Networks (WSNs) is a subject that has attracted a lot of research attention in the last decade. Besides rate and resource control, the utilization of mobile nodes has also been suggested as a way to control congestion. In this work, we present a Mobile Congestion Control (MobileCC) algorithm with two variations, to assist existing congestion control algorithms in facing congestion in WSNs. The first variation employs mobile nodes that create locally-significant alternative paths leading to the sink. The second variation employs mobile nodes that create completely individual (disjoint) paths to the sink. Simulation results show that both variations can significantly contribute to the alleviation of congestion in WSNs.
△ Less
Submitted 21 March, 2019;
originally announced March 2019.
-
Atomic Appends: Selling Cars and Coordinating Armies with Multiple Distributed Ledgers
Authors:
Antonio Fernandez Anta,
Chryssis Georgiou,
Nicolas Nicolaou
Abstract:
The various applications using Distributed Ledger Technologies (DLT) or blockchains, have led to the introduction of a new `marketplace' where multiple types of digital assets may be exchanged. As each blockchain is designed to support specific types of assets and transactions, and no blockchain will prevail, the need to perform interblockchain transactions is already pressing.
In this work we e…
▽ More
The various applications using Distributed Ledger Technologies (DLT) or blockchains, have led to the introduction of a new `marketplace' where multiple types of digital assets may be exchanged. As each blockchain is designed to support specific types of assets and transactions, and no blockchain will prevail, the need to perform interblockchain transactions is already pressing.
In this work we examine the fundamental problem of interoperable and interconnected blockchains. In particular, we begin by introducing the Multi-Distributed Ledger Objects (MDLO), which is the result of aggregating multiple Distributed Ledger Objects -- DLO (a DLO is a formalization of the blockchain) and that supports append and get operations of records (e.g., transactions) in them from multiple clients concurrently. Next, we define the AtomicAppends problem, which emerges when the exchange of digital assets between multiple clients may involve appending records in more than one DLO. Specifically, AtomicAppend requires that either all records will be appended on the involved DLOs or none. We examine the solvability of this problem assuming rational and risk-averse clients that may fail by crashing, and under different client utility and append models, timing models, and client failure scenarios. We show that for some cases the existence of an intermediary is necessary for the problem solution. We propose the implementation of such intermediary over a specialized blockchain, we term Smart DLO (SDLO), and we show how this can be used to solve the AtomicAppends problem even in an asynchronous, client competitive environment, where all the clients may crash.
△ Less
Submitted 20 December, 2018;
originally announced December 2018.
-
Self-stabilization Overhead: an Experimental Case Study on Coded Atomic Storage
Authors:
Chryssis Georgiou,
Robert Gustafsson,
Andreas Lindhe,
Elad M. Schiller
Abstract:
Shared memory emulation can be used as a fault-tolerant and highly available distributed storage solution or as a low-level synchronization primitive. Attiya, Bar-Noy, and Dolev were the first to propose a single-writer, multi-reader linearizable register emulation where the register is replicated to all servers. Recently, Cadambe et al. proposed the Coded Atomic Storage (CAS) algorithm, which use…
▽ More
Shared memory emulation can be used as a fault-tolerant and highly available distributed storage solution or as a low-level synchronization primitive. Attiya, Bar-Noy, and Dolev were the first to propose a single-writer, multi-reader linearizable register emulation where the register is replicated to all servers. Recently, Cadambe et al. proposed the Coded Atomic Storage (CAS) algorithm, which uses erasure coding for achieving data redundancy with much lower communication cost than previous algorithmic solutions.
Although CAS can tolerate server crashes, it was not designed to recover from unexpected, transient faults, without the need of external (human) intervention. In this respect, Dolev, Petig, and Schiller have recently developed a self-stabilizing version of CAS, which we call CASSS. As one would expect, self-stabilization does not come as a free lunch; it introduces, mainly, communication overhead for detecting inconsistencies and stale information. So, one would wonder whether the overhead introduced by self-stabilization would nullify the gain of erasure coding.
To answer this question, we have implemented and experimentally evaluated the CASSS algorithm on PlanetLab; a planetary scale distributed infrastructure. The evaluation shows that our implementation of CASSS scales very well in terms of the number of servers, the number of concurrent clients, as well as the size of the replicated object. More importantly, it shows (a) to have only a constant overhead compared to the traditional CAS algorithm (which we also implement) and (b) the recovery period (after the last occurrence of a transient fault) is as fast as a few client (read/write) operations. Our results suggest that CASSS does not significantly impact efficiency while dealing with automatic recovery from transient faults and bounded size of needed resources.
△ Less
Submitted 26 July, 2018; v1 submitted 20 July, 2018;
originally announced July 2018.
-
Unleashing and Speeding Up Readers in Atomic Object Implementations
Authors:
Chryssis Georgiou,
Theophanis Hadjistasi,
Nicolas Nicolaou,
Alexander A. Schwarzmann
Abstract:
Providing efficient emulations of atomic read/write objects in asynchronous, crash-prone, message-passing systems is an important problem in distributed computing. Communication latency is a factor that typically dominates the performance of message-passing systems, consequently the efficiency of algorithms implementing atomic objects is measured in terms of the number of communication exchanges i…
▽ More
Providing efficient emulations of atomic read/write objects in asynchronous, crash-prone, message-passing systems is an important problem in distributed computing. Communication latency is a factor that typically dominates the performance of message-passing systems, consequently the efficiency of algorithms implementing atomic objects is measured in terms of the number of communication exchanges involved in each read and write operation. The seminal result of Attiya, Bar-Noy, and Dolev established that two pairs of communication exchanges, or equivalently two round-trip communications, are sufficient. Subsequent research examined the possibility of implementations that involve less than four exchanges. The work of Dutta et al. showed that for single-writer/multiple-reader (SWMR) settings two exchanges are sufficient, provided that the number of readers is severely constrained with respect to the number of object replicas in the system and the number of replica failures, and also showed that no two exchange implementations of multiple-writer/multiple-reader (MWMR) objects are possible. Later research focused on providing implementations that remove the constraint on the number of readers, while having read and write operations that use variable number of communication exchanges, specifically two, three, or four exchanges.
This work presents two advances in the state-of-the-art in this area. Specifically, for SWMR and MWMR systems algorithms are given in which read operations take two or three exchanges. This improves on prior works where read operations took either (a) three exchanges, or (b) two or four exchanges. The number of readers in the new algorithms is unconstrained, and write operations take the same number of exchanges as in prior work (two for SWMR and four for MWMR settings). The correctness of algorithms is rigorously argued.
△ Less
Submitted 29 March, 2018;
originally announced March 2018.
-
Formalizing and Implementing Distributed Ledger Objects
Authors:
Antonio Fernández Anta,
Chryssis Georgiou,
Kishori Konwar,
Nicolas Nicolaou
Abstract:
Despite the hype about blockchains and distributed ledgers, no formal abstraction of these objects has been proposed. To face this issue, in this paper we provide a proper formulation of a distributed ledger object. In brief, we define a ledger object as a sequence of records, and we provide the operations and the properties that such an object should support. Implementation of a ledger object on…
▽ More
Despite the hype about blockchains and distributed ledgers, no formal abstraction of these objects has been proposed. To face this issue, in this paper we provide a proper formulation of a distributed ledger object. In brief, we define a ledger object as a sequence of records, and we provide the operations and the properties that such an object should support. Implementation of a ledger object on top of multiple (possibly geographically dispersed) computing devices gives rise to the distributed ledger object. In contrast to the centralized object, distribution allows operations to be applied concurrently on the ledger, introducing challenges on the consistency of the ledger in each participant. We provide the definitions of three well known consistency guarantees in terms of the operations supported by the ledger object: (1) atomic consistency (linearizability), (2) sequential consistency, and (3) eventual consistency. We then provide implementations of distributed ledgers on asynchronous message passing crash-prone systems using an Atomic Broadcast service, and show that they provide eventual, sequential or atomic consistency semantics. We conclude with a variation of the ledger - the validated ledger - which requires that each record in the ledger satisfies a particular validation rule.
△ Less
Submitted 4 May, 2018; v1 submitted 21 February, 2018;
originally announced February 2018.
-
Self-stabilizing Reconfiguration
Authors:
Shlomi Dolev,
Chryssis Georgiou,
Ioannis Marcoullis,
Elad M. Schiller
Abstract:
Current reconfiguration techniques are based on starting the system in a consistent configuration, in which all participating entities are in their initial state. Starting from that state, the system must preserve consistency as long as a predefined churn rate of processors joins and leaves is not violated, and unbounded storage is available. Many working systems cannot control this churn rate and…
▽ More
Current reconfiguration techniques are based on starting the system in a consistent configuration, in which all participating entities are in their initial state. Starting from that state, the system must preserve consistency as long as a predefined churn rate of processors joins and leaves is not violated, and unbounded storage is available. Many working systems cannot control this churn rate and do not have access to unbounded storage. System designers that neglect the outcome of violating the above assumptions may doom the system to exhibit illegal behaviors. We present the first automatically recovering reconfiguration scheme that recovers from transient faults, such as temporal violations of the above assumptions. Our self-stabilizing solutions regain safety automatically by assuming temporal access to reliable failure detectors. Once safety is re-established, the failure detector reliability is no longer needed. Still, liveness is conditioned by the failure detector's unreliable signals. We show that our self-stabilizing reconfiguration techniques can serve as the basis for the implementation of several dynamic services over message passing systems. Examples include self-stabilizing reconfigurable virtual synchrony, which, in turn, can be used for implementing a self-stabilizing reconfigurable state-machine replication and self-stabilizing reconfigurable emulation of shared memory.
△ Less
Submitted 6 December, 2016; v1 submitted 1 June, 2016;
originally announced June 2016.
-
Performance of the finite volume method in solving regularised Bingham flows: inertia effects in the lid-driven cavity flow
Authors:
Alexandros Syrakos,
Georgios C. Georgiou,
Andreas N. Alexandrou
Abstract:
We extend our recent work on the creeping flow of a Bingham fluid in a lid-driven cavity, to the study of inertial effects, using a finite volume method and the Papanastasiou regularisation of the Bingham constitutive model [J. Rheology 31 (1987) 385-404]. The finite volume method used belongs to a very popular class of methods for solving Newtonian flow problems, which use the SIMPLE algorithm to…
▽ More
We extend our recent work on the creeping flow of a Bingham fluid in a lid-driven cavity, to the study of inertial effects, using a finite volume method and the Papanastasiou regularisation of the Bingham constitutive model [J. Rheology 31 (1987) 385-404]. The finite volume method used belongs to a very popular class of methods for solving Newtonian flow problems, which use the SIMPLE algorithm to solve the discretised set of equations, and have matured over the years. By regularising the Bingham constitutive equation it is easy to extend such a solver to Bingham flows since all that this requires is to modify the viscosity function. This is a tempting approach, since it requires minimum programming effort and makes available all the existing features of the mature finite volume solver. On the other hand, regularisation introduces a parameter which controls the error in addition to the grid spacing, and makes it difficult to locate the yield surfaces. Furthermore, the equations become stiffer and more difficult to solve, while the discontinuity at the yield surfaces causes large truncation errors. The present work attempts to investigate the strengths and weaknesses of such a method by applying it to the lid-driven cavity problem for a range of Bingham and Reynolds numbers (up to 100 and 5000 respectively). By employing techniques such as multigrid, local grid refinement, and an extrapolation procedure to reduce the effect of the regularisation parameter on the calculation of the yield surfaces (Liu et al. J. Non-Newtonian Fluid Mech. 102 (2002) 179-191), satisfactory results are obtained, although the weaknesses of the method become more noticeable as the Bingham number is increased.
△ Less
Submitted 2 May, 2016;
originally announced May 2016.
-
Solution of the square lid-driven cavity flow of a Bingham plastic using the finite volume method
Authors:
Alexandros Syrakos,
Georgios C. Georgiou,
Andreas N. Alexandrou
Abstract:
We investigate the performance of the finite volume method in solving viscoplastic flows. The creeping square lid-driven cavity flow of a Bingham plastic is chosen as the test case and the constitutive equation is regularised as proposed by Papanastasiou [J. Rheol. 31 (1987) 385-404]. It is shown that the convergence rate of the standard SIMPLE pressure-correction algorithm, which is used to solve…
▽ More
We investigate the performance of the finite volume method in solving viscoplastic flows. The creeping square lid-driven cavity flow of a Bingham plastic is chosen as the test case and the constitutive equation is regularised as proposed by Papanastasiou [J. Rheol. 31 (1987) 385-404]. It is shown that the convergence rate of the standard SIMPLE pressure-correction algorithm, which is used to solve the algebraic equation system that is produced by the finite volume discretisation, severely deteriorates as the Bingham number increases, with a corresponding increase in the non-linearity of the equations. It is shown that using the SIMPLE algorithm in a multigrid context dramatically improves convergence, although the multigrid convergence rates are much worse than for Newtonian flows. The numerical results obtained for Bingham numbers as high as 1000 compare favourably with reported results of other methods.
△ Less
Submitted 28 April, 2016;
originally announced April 2016.
-
Internet Computing: Using Reputation to Select Workers from a Pool
Authors:
Evgenia Christoforou,
Antonio Fernández Anta,
Chryssis Georgiou,
Miguel A. Mosteiro
Abstract:
The assignment and execution of tasks over the Internet is an inexpensive solution in contrast with supercomputers. We consider an Internet-based Master-Worker task computing approach, such as SETI@home. A master process sends tasks, across the Internet, to worker processors. Workers execute, and report back a result. Unfortunately, the disadvantage of this approach is the unreliable nature of the…
▽ More
The assignment and execution of tasks over the Internet is an inexpensive solution in contrast with supercomputers. We consider an Internet-based Master-Worker task computing approach, such as SETI@home. A master process sends tasks, across the Internet, to worker processors. Workers execute, and report back a result. Unfortunately, the disadvantage of this approach is the unreliable nature of the worker processes. Through different studies, workers have been categorized as either malicious (always report an incorrect result), altruistic (always report a correct result), or rational (report whatever result maximizes their benefit). We develop a reputation-based mechanism that guarantees that, eventually, the master will always be receiving the correct task result. We model the behavior of the rational workers through reinforcement learning, and we present three different reputation types to choose, for each computational round, the most reputable from a pool of workers. As workers are not always available, we enhance our reputation scheme to select the most responsive workers. We prove sufficient conditions for eventual correctness under the different reputation types. Our analysis is complemented by simulations exploring various scenarios. Our simulation results expose interesting trade-offs among the different reputation types, workers availability, and cost.
△ Less
Submitted 30 March, 2016; v1 submitted 14 March, 2016;
originally announced March 2016.
-
CoVer-ability: Consistent Versioning for Concurrent Objects
Authors:
Nicolas Nicolaou,
Antonio Fernández Anta,
Chryssis Georgiou
Abstract:
An object type characterizes the domain space and the operations that can be invoked on an object of that type. In this paper we introduce a new property for concurrent objects, we call coverability, that aims to provide precise guarantees on the consistent evolution of an object. This new property is suitable for a variety of distributed objects including concurrent file objects that demand opera…
▽ More
An object type characterizes the domain space and the operations that can be invoked on an object of that type. In this paper we introduce a new property for concurrent objects, we call coverability, that aims to provide precise guarantees on the consistent evolution of an object. This new property is suitable for a variety of distributed objects including concurrent file objects that demand operations to manipulate the latest version of the object. We propose two levels of coverability: (i) strong coverability and (ii) weak coverability. Strong coverability requires that only a single operation can modify the latest version of the object, i.e. "covers" the latest version with a new version, imposing a total order on object modifications. Weak coverability relaxes the strong requirements of strong coverability and allows multiple operations to modify the same version of an object, where each modification leads to a different version. Weak coverability preserves consistent evolution of the object, by demanding any subsequent operation to only modify one of the newly introduced versions. Coverability combined with atomic guarantees yield to coverable atomic read/write registers. We also show that strongly coverable atomic registers are equivalent in power to consensus. Thus, we focus on weakly coverable registers, and we demonstrate their importance by showing that they cannot be implemented using similar types of registers, like ranked-registers. Furthermore we show that weakly coverable registers may be used to implement basic (weak) read-modify-write and file objects. Finally, we implement weakly coverable registers by modifying an existing MWMR atomic register implementation.
△ Less
Submitted 11 March, 2016; v1 submitted 27 January, 2016;
originally announced January 2016.
-
Multi-round Master-Worker Computing: a Repeated Game Approach
Authors:
Antonio Fernández Anta,
Chryssis Georgiou,
Miguel A. Mosteiro,
Daniel Pareja
Abstract:
We consider a computing system where a master processor assigns tasks for execution to worker processors through the Internet. We model the workers decision of whether to comply (compute the task) or not (return a bogus result to save the computation cost) as a mixed extension of a strategic game among workers. That is, we assume that workers are rational in a game-theoretic sense, and that they r…
▽ More
We consider a computing system where a master processor assigns tasks for execution to worker processors through the Internet. We model the workers decision of whether to comply (compute the task) or not (return a bogus result to save the computation cost) as a mixed extension of a strategic game among workers. That is, we assume that workers are rational in a game-theoretic sense, and that they randomize their strategic choice. Workers are assigned multiple tasks in subsequent rounds. We model the system as an infinitely repeated game of the mixed extension of the strategic game. In each round, the master decides stochastically whether to accept the answer of the majority or verify the answers received, at some cost. Incentives and/or penalties are applied to workers accordingly. Under the above framework, we study the conditions in which the master can reliably obtain tasks results, exploiting that the repeated games model captures the effect of long-term interaction. That is, workers take into account that their behavior in one computation will have an effect on the behavior of other workers in the future. Indeed, should a worker be found to deviate from some agreed strategic choice, the remaining workers would change their own strategy to penalize the deviator. Hence, being rational, workers do not deviate. We identify analytically the parameter conditions to induce a desired worker behavior, and we evaluate experi- mentally the mechanisms derived from such conditions. We also compare the performance of our mechanisms with a previously known multi-round mechanism based on reinforcement learning.
△ Less
Submitted 24 August, 2015;
originally announced August 2015.
-
Practically-Self-Stabilizing Virtual Synchrony
Authors:
Shlomi Dolev,
Chryssis Georgiou,
Ioannis Marcoullis,
Elad Michael Schiller
Abstract:
Virtual synchrony is an important abstraction that is proven to be extremely useful when implemented over asynchronous, typically large, message-passing distributed systems. Fault tolerant design is a key criterion for the success of such implementations. This is because large distributed systems can be highly available as long as they do not depend on the full operational status of every system p…
▽ More
Virtual synchrony is an important abstraction that is proven to be extremely useful when implemented over asynchronous, typically large, message-passing distributed systems. Fault tolerant design is a key criterion for the success of such implementations. This is because large distributed systems can be highly available as long as they do not depend on the full operational status of every system participant. Namely, they employ redundancy in numbers to overcome non-optimal behavior of participants and to gain global robustness and high availability.
Self-stabilizing systems can tolerate transient faults that drive the system to an arbitrary unpredicted configuration. Such systems automatically regain consistency from any such arbitrary configuration, and then produce the desired system behavior. Practically self-stabilizing systems ensure the desired system behavior for practically infinite number of successive steps e.g., $2^{64}$ steps.
We present the first practically self-stabilizing virtual synchrony algorithm. The algorithm is a combination of several new techniques that may be of independent interest. In particular, we present a new counter algorithm that establishes an efficient practically unbounded counter, that in turn can be directly used to implement a self-stabilizing Multiple-Writer Multiple-Reader (MWMR) register emulation. Other components include self-stabilizing group membership, self-stabilizing multicast, and self-stabilizing emulation of replicated state machine. As we base the replicated state machine implementation on virtual synchrony, rather than consensus, the system progresses in more extreme asynchronous executions in relation to consensus-based replicated state machine.
△ Less
Submitted 25 April, 2018; v1 submitted 18 February, 2015;
originally announced February 2015.
-
Coping with Unreliable Workers in Internet-based Computing: An Evaluation of Reputation Mechanisms
Authors:
Evgenia Christoforou,
Antonio Fernandez Anta,
Chryssis Georgiou,
Miguel A. Mosteiro,
Angel Sanchez
Abstract:
We present reputation-based mechanisms for building reliable task computing systems over the Internet. The most characteristic examples of such systems are the volunteer computing and the crowdsourcing platforms. In both examples end users are offering over the Internet their computing power or their human intelligence to solve tasks either voluntarily or under payment. While the main advantage of…
▽ More
We present reputation-based mechanisms for building reliable task computing systems over the Internet. The most characteristic examples of such systems are the volunteer computing and the crowdsourcing platforms. In both examples end users are offering over the Internet their computing power or their human intelligence to solve tasks either voluntarily or under payment. While the main advantage of these systems is the inexpensive computational power provided, the main drawback is the untrustworthy nature of the end users. Generally, this type of systems are modeled under the "master-worker" setting. A "master" has a set of tasks to compute and instead of computing them locally she sends these tasks to available "workers" that compute and report back the task results. We categorize these workers in three generic types: altruistic, malicious and rational. Altruistic workers that always return the correct result, malicious workers that always return an incorrect result, and rational workers that decide to reply or not truthfully depending on what increases their benefit. We design a reinforcement learning mechanism to induce a correct behavior to rational workers, while the mechanism is complemented by four reputation schemes that cope with malice. The goal of the mechanism is to reach a state of eventual correctness, that is, a stable state of the system in which the master always obtains the correct task results. Analysis of the system gives provable guarantees under which truthful behavior can be ensured. Finally, we observe the behavior of the mechanism through simulations that use realistic system parameters values. Simulations not only agree with the analysis but also reveal interesting trade-offs between various metrics and parameters. Finally, the four reputation schemes are assessed against the tolerance to cheaters.
△ Less
Submitted 19 March, 2018; v1 submitted 10 July, 2013;
originally announced July 2013.
-
Algorithmic Mechanisms for Reliable Internet-based Computing under Collusion
Authors:
Antonio Fernandez Anta,
Chryssis Georgiou,
Miguel A. Mosteiro
Abstract:
In this work, using a game-theoretic approach, cost-sensitive mechanisms that lead to reliable Internet-based computing are designed. In particular, we consider Internet-based master-worker computations, where a master processor assigns, across the Internet, a computational task to a set of potentially untrusted worker processors and collects their responses. Workers may collude in order to increa…
▽ More
In this work, using a game-theoretic approach, cost-sensitive mechanisms that lead to reliable Internet-based computing are designed. In particular, we consider Internet-based master-worker computations, where a master processor assigns, across the Internet, a computational task to a set of potentially untrusted worker processors and collects their responses. Workers may collude in order to increase their benefit. Several game-theoretic models that capture the nature of the problem are analyzed, and algorithmic mechanisms that, for each given set of cost and system parameters, achieve high reliability are designed. Additionally, two specific realistic system scenarios are studied. These scenarios are a system of volunteer computing like SETI, and a company that buys computing cycles from Internet computers and sells them to its customers in the form of a task- computation service. Notably, under certain conditions, non redundant allocation yields the best trade-off between cost and reliability.
△ Less
Submitted 5 July, 2013;
originally announced July 2013.
-
Online Parallel Scheduling of Non-uniform Tasks: Trading Failures for Energy
Authors:
Antonio Fernández Anta,
Chryssis Georgiou,
Dariusz R. Kowalski,
Elli Zavou
Abstract:
Consider a system in which tasks of different execution times arrive continuously and have to be executed by a set of processors that are prone to crashes and restarts. In this paper we model and study the impact of parallelism and failures on the competitiveness of such an online system. In a fault-free environment, a simple Longest-in-System scheduling policy, enhanced by a redundancy-avoidance…
▽ More
Consider a system in which tasks of different execution times arrive continuously and have to be executed by a set of processors that are prone to crashes and restarts. In this paper we model and study the impact of parallelism and failures on the competitiveness of such an online system. In a fault-free environment, a simple Longest-in-System scheduling policy, enhanced by a redundancy-avoidance mechanism, guarantees optimality in a long-term execution. In the presence of failures though, scheduling becomes a much more challenging task. In particular, no parallel deterministic algorithm can be competitive against an offline optimal solution, even with one single processor and tasks of only two different execution times. We find that when additional energy is provided to the system in the form of processor speedup, the situation changes. Specifically, we identify thresholds on the speedup under which such competitiveness cannot be achieved by any deterministic algorithm, and above which competitive algorithms exist. Finally, we propose algorithms that achieve small bounded competitive ratios when the speedup is over the threshold.
△ Less
Submitted 7 June, 2013;
originally announced June 2013.
-
Measuring the Impact of Adversarial Errors on Packet Scheduling Strategies
Authors:
Antonio Fernández Anta,
Chryssis Georgiou,
Dariusz R. Kowalski,
Joerg Widmer,
Elli Zavou
Abstract:
In this paper we explore the problem of achieving efficient packet transmission over unreliable links with worst case occurrence of errors. In such a setup, even an omniscient offline scheduling strategy cannot achieve stability of the packet queue, nor is it able to use up all the available bandwidth. Hence, an important first step is to identify an appropriate metric for measuring the efficiency…
▽ More
In this paper we explore the problem of achieving efficient packet transmission over unreliable links with worst case occurrence of errors. In such a setup, even an omniscient offline scheduling strategy cannot achieve stability of the packet queue, nor is it able to use up all the available bandwidth. Hence, an important first step is to identify an appropriate metric for measuring the efficiency of scheduling strategies in such a setting. To this end, we propose a relative throughput metric which corresponds to the long term competitive ratio of the algorithm with respect to the optimal. We then explore the impact of the error detection mechanism and feedback delay on our measure. We compare instantaneous error feedback with deferred error feedback, that requires a faulty packet to be fully received in order to detect the error. We propose algorithms for worst-case adversarial and stochastic packet arrival models, and formally analyze their performance. The relative throughput achieved by these algorithms is shown to be close to optimal by deriving lower bounds on the relative throughput of the algorithms and almost matching upper bounds for any algorithm in the considered settings. Our collection of results demonstrate the potential of using instantaneous feedback to improve the performance of communication systems in adverse environments.
△ Less
Submitted 7 June, 2013;
originally announced June 2013.
-
A Distributed Algorithm for Gathering Many Fat Mobile Robots in the Plane
Authors:
Chrysovalandis Agathangelou,
Chryssis Georgiou,
Marios Mavronicolas
Abstract:
In this work we consider the problem of gathering autonomous robots in the plane. In particular, we consider non-transparent unit-disc robots (i.e., fat) in an asynchronous setting. Vision is the only mean of coordination. Using a state-machine representation we formulate the gathering problem and develop a distributed algorithm that solves the problem for any number of robots.
The main idea beh…
▽ More
In this work we consider the problem of gathering autonomous robots in the plane. In particular, we consider non-transparent unit-disc robots (i.e., fat) in an asynchronous setting. Vision is the only mean of coordination. Using a state-machine representation we formulate the gathering problem and develop a distributed algorithm that solves the problem for any number of robots.
The main idea behind our algorithm is for the robots to reach a configuration in which all the following hold: (a) The robots' centers form a convex hull in which all robots are on the convex, (b) Each robot can see all other robots, and (c) The configuration is connected, that is, every robot touches another robot and all robots together form a connected formation. We show that starting from any initial configuration, the robots, making only local decisions and coordinate by vision, eventually reach such a configuration and terminate, yielding a solution to the gathering problem.
△ Less
Submitted 18 September, 2012;
originally announced September 2012.
-
On the Practicality of Atomic MWMR Register Implementations
Authors:
Chryssis Georgiou,
Nicolas C. Nicolaou
Abstract:
Multiple-writer/multiple-reader (MWMR) atomic register implementations provide precise consistency guarantees, in the asynchronous, crash-prone, message passing environment. Fast MWMR atomic register implementations were first introduced in Englert et al. 2009. Fastness is measured in terms of the number of single round read and write operations that does not sacrifice correctness. In Georgiou et…
▽ More
Multiple-writer/multiple-reader (MWMR) atomic register implementations provide precise consistency guarantees, in the asynchronous, crash-prone, message passing environment. Fast MWMR atomic register implementations were first introduced in Englert et al. 2009. Fastness is measured in terms of the number of single round read and write operations that does not sacrifice correctness. In Georgiou et al. 2011 was shown, however, that decreasing the communication cost is not enough in these implementations. In particular, considering that the performance is measured in terms of the latency of read and write operations due to both (a) communication delays and (b)local computation, they introduced two new algorithms that traded communication for reducing computation. As computation is still part of the algorithms, someone may wonder: What is the trade-off between communication and local computation in real-time systems?
In this work we conduct an experimental performance evaluation of four MWMR atomic register implementations: SFW from Englert et al. 2009, APRX-SFW and CWFR from Georgiou at al. 2011, and the generalization of the traditional algorithm of Attiya et al. 1996 in the MWMR environment, which we call SIMPLE. We implement and evaluate the algorithms on NS2, a single-processor simulator, and on PlanetLab, a planetary-scale real-time network platform. Our comparison provides an empirical answer to the above question and demonstrates the practicality of atomic MWMR register implementations.
△ Less
Submitted 9 April, 2012; v1 submitted 11 November, 2011;
originally announced November 2011.