Article in Proceedings INPROC-2011-44

Bibliographic data	Völz, Marco; Koldehofe, Boris; Rothermel, Kurt: Supporting Strong Reliability for Distributed Complex Event Processing Systems. In: Proceedings of the 13th IEEE International Conference on High Performance Computing and Communications (HPCC-2011). Universität Stuttgart, Fakultät Informatik, Elektrotechnik und Informationstechnik. pp. 477-486, English. IEEE Computer Society Press, September 2011. DOI: 10.1109/HPCC.2011.69. Article in proceedings (conference paper).
CR classification	C.2.4 (Distributed Systems)
Abstract	Many application classes, such as monitoring applications, involve processing a massive amount of data from a possibly huge number of data sources. Complex Event Processing (CEP) has evolved as the paradigm of choice to determine meaningful situations (complex events) by performing stepwise correlation over event streams. To keep up with the high scalability demands of growing input streams, recent approaches distribute event correlation over several correlation nodes. However, the distribution of event correlation severely limits the reliability of a CEP system. Even the failure of a single correlation node affects the correctness of the final correlation result. Increasing the availability by a naive application of established replication principles introduces new problems in the context of CEP. In particular, ensuring the lossless delivery of events and detecting duplicate events before processing them are challenging tasks. In this paper, we illustrate the importance of strong reliability semantics for CEP in the context of a monitoring application in a distributed production environment. Strong reliability ensures that each complex event is detected and delivered exactly once to each application entity. We present a replication scheme which ensures strong reliability in an asynchronous system model and can be applied to an arbitrary distributed CEP system. The algorithm tolerates f simultaneous failures by introducing f additional replicas for each correlation node. We prove correctness and evaluate the overhead introduced by the algorithm. Results show that the overhead scales linearly with the number of deployed replicas and the node failure rate.
Full text and other links	PDF (237218 bytes)
Copyright	This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE (contact pubs-permissions@ieee.org). By choosing to view this document, you agree to all provisions of the copyright laws protecting it.
Department(s)	Universität Stuttgart, Institut für Parallele und Verteilte Systeme, Verteilte Systeme
Project(s)	CEPiL AKS
Entry date	June 28, 2011
