-
Server Placement with Shared Backups for Disaster-Resilient Clouds
Authors:
Rodrigo de Souza Couto,
Stefano Secci,
Miguel Elias Mitre Campista,
Luís Henrique Maciel Kosmalski Costa
Abstract:
A key strategy to build disaster-resilient clouds is to employ backups of virtual machines in a geo-distributed infrastructure. Today, the continuous and acknowledged replication of virtual machines in different servers is a service provided by different hypervisors. This strategy guarantees that the virtual machines will have no loss of disk and memory content if a disaster occurs, at the cost of strict bandwidth and latency requirements. Considering this kind of service, in this work, we propose an optimization problem to place servers in a wide area network. The goal is to guarantee that backup machines do not fail at the same time as their primary counterparts. In addition, by using virtualization, we also aim to reduce the number of backup servers required. The optimal results, achieved in real topologies, reduce the number of backup servers by at least 40%. Moreover, this work highlights several characteristics of the backup service according to the employed network, such as the fulfillment of latency requirements.
Submitted 19 October, 2015;
originally announced October 2015.
-
Latency Versus Survivability in Geo-Distributed Data Center Design
Authors:
Rodrigo de Souza Couto,
Stefano Secci,
Miguel Elias Mitre Campista,
Luís Henrique Maciel Kosmalski Costa
Abstract:
A hot topic in data center design is to envision geo-distributed architectures spanning a few sites across wide area networks, allowing more proximity to the end users and higher survivability, defined as the capacity of a system to operate after failures. As a shortcoming, this approach is subject to an increase of latency between servers, caused by their geographic distances. In this paper, we address the trade-off between latency and survivability in geo-distributed data centers, through the formulation of an optimization problem. Simulations considering realistic scenarios show that the latency increase is significant only in the case of very strong survivability requirements, whereas it is negligible for moderate survivability requirements. For instance, the worst-case latency is less than 4 ms when guaranteeing that 80% of the servers are available after a failure, in a network where the latency could be up to 33 ms.
Submitted 16 October, 2015;
originally announced October 2015.
-
Reliability and Survivability Analysis of Data Center Network Topologies
Authors:
Rodrigo de Souza Couto,
Stefano Secci,
Miguel Elias Mitre Campista,
Luís Henrique Maciel Kosmalski Costa
Abstract:
Several data center architectures have been proposed as alternatives to the conventional three-layer one. Most of them employ commodity equipment for cost reduction. Thus, robustness to failures becomes even more important, because commodity equipment is more failure-prone. Each architecture has a different network topology design with a specific level of redundancy. In this work, we aim at analyzing the benefits of different data center topologies taking the reliability and survivability requirements into account. We consider the topologies of three alternative data center architectures: Fat-tree, BCube, and DCell. Also, we compare these topologies with a conventional three-layer data center topology. Our analysis is independent of specific equipment, traffic patterns, or network protocols, for the sake of generality. We derive closed-form formulas for the Mean Time To Failure of each topology. The results allow us to indicate the best topology for each failure scenario. In particular, we conclude that BCube is more robust to link failures than the other topologies, whereas DCell has the most robust topology when considering switch failures. Additionally, we show that all considered alternative topologies outperform a three-layer topology for both types of failures. We also determine to what extent the robustness of BCube and DCell is influenced by the number of network interfaces per server.
Submitted 14 October, 2015; v1 submitted 9 October, 2015;
originally announced October 2015.