Search | arXiv e-print repository

arXiv:1908.11640 [pdf, other]

doi 10.1109/ISSRE.2019.00023

Enhancing Failure Propagation Analysis in Cloud Computing Systems

Authors: Domenico Cotroneo, Luigi De Simone, Pietro Liguori, Roberto Natella, Nematollah Bidokhti

Abstract: In order to plan for failure recovery, the designers of cloud systems need to understand how their system can potentially fail. Unfortunately, analyzing the failure behavior of such systems can be very difficult and time-consuming, due to the large volume of events, non-determinism, and reuse of third-party components. To address these issues, we propose a novel approach that joins fault injection… ▽ More In order to plan for failure recovery, the designers of cloud systems need to understand how their system can potentially fail. Unfortunately, analyzing the failure behavior of such systems can be very difficult and time-consuming, due to the large volume of events, non-determinism, and reuse of third-party components. To address these issues, we propose a novel approach that joins fault injection with anomaly detection to identify the symptoms of failures. We evaluated the proposed approach in the context of the OpenStack cloud computing platform. We show that our model can significantly improve the accuracy of failure analysis in terms of false positives and negatives, with a low computational cost. △ Less

Submitted 30 August, 2019; originally announced August 2019.

Comments: 12 pages, The 30th International Symposium on Software Reliability Engineering (ISSRE 2019)

arXiv:1908.11297 [pdf, other]

Analyzing the Context of Bug-Fixing Changes in the OpenStack Cloud Computing Platform

Authors: Domenico Cotroneo, Luigi De Simone, Antonio Ken Iannillo, Roberto Natella, Stefano Rosiello, Nematollah Bidokhti

Abstract: Many research areas in software engineering, such as mutation testing, automatic repair, fault localization, and fault injection, rely on empirical knowledge about recurring bug-fixing code changes. Previous studies in this field focus on what has been changed due to bug-fixes, such as in terms of code edit actions. However, such studies did not consider where the bug-fix change was made (i.e., th… ▽ More Many research areas in software engineering, such as mutation testing, automatic repair, fault localization, and fault injection, rely on empirical knowledge about recurring bug-fixing code changes. Previous studies in this field focus on what has been changed due to bug-fixes, such as in terms of code edit actions. However, such studies did not consider where the bug-fix change was made (i.e., the context of the change), but knowing about the context can potentially narrow the search space for many software engineering techniques (e.g., by focusing mutation only on specific parts of the software). Furthermore, most previous work on bug-fixing changes focused on C and Java projects, but there is little empirical evidence about Python software. Therefore, in this paper we perform a thorough empirical analysis of bug-fixing changes in three OpenStack projects, focusing on both the what and the where of the changes. We observed that all the recurring change patterns are not oblivious with respect to the surrounding code, but tend to occur in specific code contexts. △ Less

Submitted 29 August, 2019; originally announced August 2019.

Comments: 14 pages, The 30th International Symposium on Software Reliability Engineering (ISSRE 2019)

arXiv:1907.04055 [pdf, other]

doi 10.1145/3338906.3338916

How Bad Can a Bug Get? An Empirical Analysis of Software Failures in the OpenStack Cloud Computing Platform

Authors: Domenico Cotroneo, Luigi De Simone, Pietro Liguori, Roberto Natella, Nematollah Bidokhti

Abstract: Cloud management systems provide abstractions and APIs for programmatically configuring cloud infrastructures. Unfortunately, residual software bugs in these systems can potentially lead to high-severity failures, such as prolonged outages and data losses. In this paper, we investigate the impact of failures in the context widespread OpenStack cloud management system, by performing fault injection… ▽ More Cloud management systems provide abstractions and APIs for programmatically configuring cloud infrastructures. Unfortunately, residual software bugs in these systems can potentially lead to high-severity failures, such as prolonged outages and data losses. In this paper, we investigate the impact of failures in the context widespread OpenStack cloud management system, by performing fault injection and by analyzing the impact of the resulting failures in terms of fail-stop behavior, failure detection through logging, and failure propagation across components. The analysis points out that most of the failures are not timely detected and notified; moreover, many of these failures can silently propagate over time and through components of the cloud management system, which call for more thorough run-time checks and fault containment. △ Less

Submitted 9 July, 2019; originally announced July 2019.

Comments: 12 pages, ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE '19)

Journal ref: ESEC/FSE 2019 Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering Pages 200-211

Showing 1–3 of 3 results for author: Bidokhti, N