Article in Proceedings INPROC-2011-44

Bibliography	Völz, Marco; Koldehofe, Boris; Rothermel, Kurt: Supporting Strong Reliability for Distributed Complex Event Processing Systems. In: Proceedings of 13th IEEE International Conference on High Performance Computing and Communications (HPCC-2011). University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology. pp. 477-486, english. IEEE Computer Society Press, September 2011. DOI: 10.1109/HPCC.2011.69. Article in Proceedings (Conference Paper).
CR-Schema	C.2.4 (Distributed Systems)
Abstract	Many application classes such as monitoring applications, involve processing a massive amount of data from a possibly huge number of data sources. Complex Event Processing (CEP) has evolved as the paradigm of choice to determine meaningful situations (complex events) by performing stepwise correlation over event streams. To keep up with the high scalability demands of growing input streams, recent approaches distribute event correlation over several correlation nodes. However, the distribution of event correlation severely limits the reliability of a CEP system. Already a failure of a single correlation node impacts the correctness of the final correlation result. Increasing the availability by a naive application of established replication principles introduces new problems in the context of CEP. In particular, ensuring the lossless delivery of events and the detection of duplicate events before processing them is a challenging task. In this paper, we illustrate the importance of a strong reliability semantics for CEP in the context of a monitoring application in a distributed production environment. Strong reliability ensures each complex event is detected and delivered exactly once to each application entity. We present a replication scheme which ensures strong reliability in an asynchronous system model and can be applied to an arbitrary distributed CEP system. The algorithm tolerates f simultaneous failures by introducing f additional replicas for each correlation node. We prove correctness as well as evaluate the overhead introduced by the algorithm. Results show, that the overhead scales linearly with the number of deployed replicas and the node failure rate.
Full text and other links	PDF (237218 Bytes)
Copyright	This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE (contact pubs-permissions@ieee.org). By choosing to view this document, you agree to all provisions of the copyright laws protecting it.
Department(s)	University of Stuttgart, Institute of Parallel and Distributed Systems, Distributed Systems
Project(s)	CEPiL AKS
Entry date	June 28, 2011

Publ. Department Publ. Institute Publ. Computer Science