-
Technical Report: Estimating Reliability of Workers for Cooperative Distributed Computing
Authors:
Seda Davtyan,
Kishori M. Konwar,
Alexander A. Shvartsman
Abstract:
Internet supercomputing is an approach to solving partitionable, computation-intensive problems by harnessing the power of a vast number of interconnected computers. For the problem of using network supercomputing to perform a large collection of independent tasks, prior work introduced a decentralized approach and provided randomized synchronous algorithms that perform all tasks correctly with high probability, while dealing with misbehaving or crash-prone processors. The main weakness of existing algorithms is that they assume either that the \emph{average} probability of a non-crashed processor returning incorrect results is less than $\frac{1}{2}$, or that the probability of returning incorrect results is known to \emph{each} processor. Here we present a randomized synchronous distributed algorithm that tightly estimates the probability of each processor returning correct results. Starting with the set $P$ of $n$ processors, let $F$ be the set of processors that crash. Our algorithm estimates the probability $p_i$ of returning a correct result for each processor $i \in P-F$, making the estimates available to all these processors. The estimation is based on the $(ε, δ)$-approximation, where each estimate $\tilde{p_i}$ of $p_i$ obeys the bound ${\sf Pr}[p_i(1-ε) \leq \tilde{p_i} \leq p_i(1+ε)] > 1 - δ$, for any constants $δ>0$ and $ε>0$ chosen by the user. An important aspect of this algorithm is that each processor terminates without global coordination. We assess the efficiency of the algorithm in three adversarial models as follows. For the model where the number of non-crashed processors $|P-F|$ is linearly bounded, the time complexity $T(n)$ of the algorithm is $Θ(\log{n})$, the work complexity $W(n)$ is $Θ(n\log{n})$, and the message complexity $M(n)$ is $Θ(n\log^2n)$.
Submitted 1 July, 2014;
originally announced July 2014.
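The $(ε, δ)$-approximation guarantee quoted in the abstract above can be illustrated with a small single-machine sketch. This is not the distributed algorithm of the report: it assumes a hypothetical callable `is_correct` that performs one verifiable test task on a worker, and an assumed known lower bound `p_min` on the worker's reliability, which the abstract does not mention. The sample count follows from the standard multiplicative Chernoff bound $\Pr[|X - \mu| \ge ε\mu] \le 2e^{-ε^2\mu/3}$ for $0 < ε \le 1$.

```python
import math
import random


def samples_needed(eps, delta, p_min):
    """Number of independent trials sufficient for an (eps, delta)-approximation.

    Assumes 0 < eps <= 1 and that the true probability p satisfies p >= p_min
    (p_min is an assumption of this sketch, not something given in the report).
    Derived from the Chernoff bound: 2 * exp(-eps^2 * N * p / 3) <= delta.
    """
    return math.ceil(3.0 * math.log(2.0 / delta) / (eps * eps * p_min))


def estimate_reliability(is_correct, eps, delta, p_min):
    """Estimate the probability that `is_correct()` returns True.

    `is_correct` is a hypothetical zero-argument callable standing in for
    "give the worker a task whose answer is known and check the result".
    """
    n = samples_needed(eps, delta, p_min)
    successes = sum(1 for _ in range(n) if is_correct())
    return successes / n


if __name__ == "__main__":
    rng = random.Random(0)
    true_p = 0.8  # simulated worker reliability, for demonstration only
    estimate = estimate_reliability(lambda: rng.random() < true_p,
                                    eps=0.1, delta=0.05, p_min=0.5)
    print(f"estimated reliability: {estimate:.3f}")
```

With these parameters the estimate lies within a factor $(1 \pm 0.1)$ of the true probability with probability above $0.95$. The fixed sample count is the simplest choice; it needs the lower bound `p_min`, which a stopping-rule estimator would avoid.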
-
Technical Report: Dealing with Undependable Workers in Decentralized Network Supercomputing
Authors:
Seda Davtyan,
Kishori M. Konwar,
Alexander Russell,
Alexander A. Shvartsman
Abstract:
Internet supercomputing is an approach to solving partitionable, computation-intensive problems by harnessing the power of a vast number of interconnected computers. This paper presents a new algorithm for the problem of using network supercomputing to perform a large collection of independent tasks, while dealing with undependable processors. The adversary may cause the processors to return bogus results for tasks with certain probabilities, and may cause a subset $F$ of the initial set of processors $P$ to crash. The adversary is constrained in two ways. First, for the set of non-crashed processors $P-F$, the \emph{average} probability of a processor returning a bogus result is less than $\frac{1}{2}$. Second, the adversary may crash a subset of processors $F$, provided the size of $P-F$ is bounded from below. We consider two models: the first bounds the size of $P-F$ by a fractional polynomial, the second bounds this size by a poly-logarithm. Both models yield adversaries that are much stronger than previously studied. Our randomized synchronous algorithm is formulated for $n$ processors and $t$ tasks, with $n\le t$, where, depending on the number of crashes, each live processor is able to terminate dynamically with the knowledge that the problem is solved with high probability. For the adversary constrained by a fractional polynomial, the round complexity of the algorithm is $O(\frac{t}{n^\varepsilon}\log{n}\log{\log{n}})$, its work is $O(t\log{n} \log{\log{n}})$, and its message complexity is $O(n\log{n}\log{\log{n}})$. For the poly-log constrained adversary, the round complexity is $O(t)$, work is $O(t n^{\varepsilon})$, and message complexity is $O(n^{1+\varepsilon})$. All bounds are shown to hold with high probability.
Submitted 1 July, 2014;
originally announced July 2014.
-
Highly Scalable Algorithms for Robust String Barcoding
Authors:
Bhaskar DasGupta,
Kishori M. Konwar,
Ion I. Mandoiu,
Alex A. Shvartsman
Abstract:
String barcoding is a recently introduced technique for genomic-based identification of microorganisms. In this paper we describe the engineering of highly scalable algorithms for robust string barcoding. Our methods enable distinguisher selection based on whole genomic sequences of hundreds of microorganisms of up to bacterial size on a well-equipped workstation, and can be easily parallelized to further extend the applicability range to thousands of bacterial-size genomes. Experimental results on both randomly generated and NCBI genomic data show that whole-genome-based selection results in a number of distinguishers nearly matching the information-theoretic lower bounds for the problem.
Submitted 14 February, 2005;
originally announced February 2005.
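To make "distinguisher selection" concrete, the following toy sketch assumes the usual formulation of string barcoding: choose a small set of substrings (distinguishers) so that the presence/absence pattern of those substrings, the barcode, is unique for every genome. The greedy set-cover heuristic, the restriction to fixed-length substrings of length `k`, and all function names are assumptions made for illustration. The sketch is not the paper's scalable implementation and ignores the robustness (redundancy) requirement.

```python
from itertools import combinations


def substrings(sequence, k):
    """Return the set of all length-k substrings of `sequence`."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def greedy_distinguishers(genomes, k):
    """Greedily pick length-k substrings until every genome pair is separated.

    A substring separates a pair if it occurs in exactly one of the two genomes.
    Each round picks the candidate separating the most still-unseparated pairs.
    """
    present = [substrings(g, k) for g in genomes]
    candidates = sorted(set().union(*present))
    unseparated = set(combinations(range(len(genomes)), 2))
    chosen = []
    while unseparated:
        best, best_pairs = None, set()
        for cand in candidates:
            pairs = {(i, j) for (i, j) in unseparated
                     if (cand in present[i]) != (cand in present[j])}
            if len(pairs) > len(best_pairs):
                best, best_pairs = cand, pairs
        if best is None:
            break  # remaining pairs cannot be separated by length-k substrings
        chosen.append(best)
        unseparated -= best_pairs
    return chosen


def barcode(genome, distinguishers):
    """Presence/absence vector of the chosen distinguishers in `genome`."""
    return tuple(d in genome for d in distinguishers)


if __name__ == "__main__":
    toy_genomes = ["ACGT", "ACGG", "TTGA", "CCGA"]  # made-up sequences
    picked = greedy_distinguishers(toy_genomes, k=2)
    for g in toy_genomes:
        print(g, barcode(g, picked))
```

For $m$ genomes, at least $\lceil \log_2 m \rceil$ distinguishers are needed, since each one contributes a single bit to the barcode; this is the kind of information-theoretic lower bound the abstract refers to.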