Article in Proceedings INPROC-2011-44

Bibliography: Völz, Marco; Koldehofe, Boris; Rothermel, Kurt: Supporting Strong Reliability for Distributed Complex Event Processing Systems.
In: Proceedings of 13th IEEE International Conference on High Performance Computing and Communications (HPCC-2011).
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology.
pp. 477-486, English.
IEEE Computer Society Press, September 2011.
DOI: 10.1109/HPCC.2011.69.
Article in Proceedings (Conference Paper).
CR-Schema: C.2.4 (Distributed Systems)

Many application classes, such as monitoring applications, involve processing massive amounts of data from a possibly huge number of data sources. Complex Event Processing (CEP) has evolved as the paradigm of choice to determine meaningful situations (complex events) by performing stepwise correlation over event streams. To keep up with the high scalability demands of growing input streams, recent approaches distribute event correlation over several correlation nodes. However, the distribution of event correlation severely limits the reliability of a CEP system. Even the failure of a single correlation node affects the correctness of the final correlation result. Increasing availability by naively applying established replication principles introduces new problems in the context of CEP. In particular, ensuring the lossless delivery of events and detecting duplicate events before processing them are challenging tasks. In this paper, we illustrate the importance of strong reliability semantics for CEP in the context of a monitoring application in a distributed production environment. Strong reliability ensures that each complex event is detected and delivered exactly once to each application entity. We present a replication scheme that ensures strong reliability in an asynchronous system model and can be applied to an arbitrary distributed CEP system. The algorithm tolerates f simultaneous failures by introducing f additional replicas for each correlation node. We prove its correctness and evaluate the overhead it introduces. Results show that the overhead scales linearly with the number of deployed replicas and the node failure rate.
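The duplicate-detection idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a synchronous, in-memory setting in which every replica of a correlation node emits the same complex events with identical, deterministic identifiers, and a failed replica simply emits nothing. The names `ComplexEvent`, `Consumer`, and `run_replicas` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ComplexEvent:
    # Assumption: replicas assign the same ID to the same complex event.
    event_id: int
    payload: str


@dataclass
class Consumer:
    """Application entity that delivers each complex event at most once."""
    delivered: list = field(default_factory=list)
    _seen: set = field(default_factory=set)

    def receive(self, event: ComplexEvent) -> bool:
        # Drop the event if a copy from another replica was already delivered.
        if event.event_id in self._seen:
            return False
        self._seen.add(event.event_id)
        self.delivered.append(event)
        return True


def run_replicas(events, f, failed):
    """Feed the output of f + 1 replicas to a single consumer.

    Replicas whose index is in `failed` are treated as crashed and emit
    nothing (a simplification; partial output and reordering are ignored).
    """
    consumer = Consumer()
    for replica in range(f + 1):
        if replica in failed:
            continue
        for event in events:
            consumer.receive(event)
    return consumer.delivered


events = [ComplexEvent(1, "machine overheated"), ComplexEvent(2, "line stopped")]
# With f = 2 there are three replicas; two of them may fail.
result_with_failures = run_replicas(events, f=2, failed={0, 1})
result_without_failures = run_replicas(events, f=2, failed=set())
```

In this simplified setting, as long as at most f of the f + 1 replicas fail, at least one copy of every event reaches the consumer, and the identifier check suppresses the remaining copies. The asynchronous system model addressed in the paper raises additional problems that this sketch does not cover.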

Full text and other links: PDF (237218 Bytes)
Copyright: This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.
Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Distributed Systems
Entry date: June 28, 2011