Article in Proceedings INPROC-2013-26

Bibliography: Koldehofe, Boris; Mayer, Ruben; Ramachandran, Umakishore; Rothermel, Kurt; Völz, Marco: Rollback-Recovery without Checkpoints in Distributed Event Processing Systems.
In: Proceedings of the 7th ACM International Conference on Distributed Event-Based Systems (DEBS).
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology.
pp. 27-38, English.
ACM, June 27, 2013.
DOI: 10.1145/2488222.2488259.
Article in Proceedings (Conference Paper).
CR-Schema: C.2.4 (Distributed Systems)
C.4 (Performance of Systems)
Keywords: Reliability; Recovery; Complex Event Processing
Abstract

Reliability is of critical importance to many applications that involve distributed event processing systems. In particular, the use of stateful operators makes it challenging to recover efficiently from failures and to ensure consistent event streams. Even during failure-free execution, state-of-the-art methods for achieving reliability incur significant run-time overhead in terms of computational resources, event traffic, and event detection time. This paper proposes a novel method for rollback-recovery that allows recovery from multiple simultaneous operator failures while eliminating the need for persistent checkpoints. Instead, the operator state is preserved in savepoints, taken at points in time when the operator's execution depends solely on the state of its incoming event streams, which can be reproduced by predecessor operators. We propose an expressive event processing model to determine savepoints, and algorithms for coordinating them in a distributed operator network. Evaluations show that the method incurs very low overhead during failure-free execution compared to other approaches.

Full text and other links:
PDF (555407 Bytes)
The original publication is available at ACM Digital Library
Copyright: © ACM, 2013. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in proceedings of the 7th international conference on Distributed Event-Based Systems, pp. 27-38, Arlington, Texas, USA, June 29 - July 3, 2013.
Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Distributed Systems
Project(s): Adaptive Kommunikationssysteme
CEPiL
Entry date: May 27, 2013