Showing 1–4 of 4 results for author: Zaidouni, D

  1. arXiv:1310.8486

    cs.DC

    On the Combination of Silent Error Detection and Checkpointing

    Authors: Guillaume Aupy, Anne Benoit, Thomas Hérault, Yves Robert, Frédéric Vivien, Dounia Zaidouni

    Abstract: In this paper, we revisit traditional checkpointing and rollback recovery strategies, with a focus on silent data corruption errors. Contrarily to fail-stop failures, such latent errors cannot be detected immediately, and a mechanism to detect them must be provided. We consider two models: (i) errors are detected after some delays following a probability distribution (typically, an Exponential dis…

    Submitted 31 October, 2013; originally announced October 2013.

    Comments: This work was accepted to be published in PRDC'13. Work supported by ANR Rescue

    Report number: INRIA RR-8319

  2. arXiv:1302.4558

    cs.DC

    Checkpointing strategies with prediction windows

    Authors: Guillaume Aupy, Yves Robert, Frédéric Vivien, Dounia Zaidouni

    Abstract: This paper deals with the impact of fault prediction techniques on checkpointing strategies. We suppose that the fault-prediction system provides prediction windows instead of exact predictions, which dramatically complicates the analysis of the checkpointing strategies. We propose a new approach based upon two periodic modes, a regular mode outside prediction windows, and a proactive mode inside…

    Submitted 19 February, 2013; originally announced February 2013.

    Comments: 35 pages, work supported by ANR Rescue. arXiv admin note: substantial text overlap with arXiv:1207.6936, arXiv:1302.3752

    Report number: INRIA RR-8239

  3. Checkpointing algorithms and fault prediction

    Authors: Guillaume Aupy, Yves Robert, Frédéric Vivien, Dounia Zaidouni

    Abstract: This paper deals with the impact of fault prediction techniques on checkpointing strategies. We extend the classical first-order analysis of Young and Daly in the presence of a fault prediction system, characterized by its recall and its precision. In this framework, we provide an optimal algorithm to decide when to take predictions into account, and we derive the optimal value of the checkpointin…

    Submitted 3 December, 2013; v1 submitted 15 February, 2013; originally announced February 2013.

    Comments: Supported in part by ANR Rescue. Published in Journal of Parallel and Distributed Computing. arXiv admin note: text overlap with arXiv:1207.6936

    Report number: INRIA RR-8237

    Journal ref: Journal of Parallel and Distributed Computing, Available online 7 November 2013, ISSN 0743-7315

  4. arXiv:1207.6936

    cs.DC cs.DS

    Impact of fault prediction on checkpointing strategies

    Authors: Guillaume Aupy, Yves Robert, Frédéric Vivien, Dounia Zaidouni

    Abstract: This paper deals with the impact of fault prediction techniques on checkpointing strategies. We extend the classical analysis of Young and Daly in the presence of a fault prediction system, which is characterized by its recall and its precision, and which provides either exact or window-based time predictions. We succeed in deriving the optimal value of the checkpointing period (thereby minimizing…

    Submitted 9 October, 2012; v1 submitted 30 July, 2012; originally announced July 2012.

    Comments: 20 pages

    Report number: INRIA Report 8023