Article in Proceedings INPROC-2018-55

Bibliography: F. Lima, Guilherme; Slo, Ahmad; Bhowmik, Sukanya; Endler, Markus; Rothermel, Kurt: Skipping Unused Events to Speed Up Rollback-Recovery in Distributed Data-Parallel CEP.
In: Proceedings of 2018 IEEE/ACM 5th International Conference on Big Data Computing Applications and Technologies (BDCAT).
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology.
pp. 1-10, English.
IEEE, December 17, 2018.
Article in Proceedings (Conference Paper).
CR-Schema: C.2.4 (Distributed Systems)
Abstract

We propose two extensions to a state-of-the-art method of rollback-recovery in distributed CEP (complex event processing). In CEP, an operator network is used to search for patterns in event streams. Sometimes these operators fail and lose their state. Rollback-recovery is a method for dealing with such state losses. The type of rollback-recovery we consider is upstream backup, where the state of a failed operator is recovered by replaying to it the input events that led it to that state. These events are kept in upstream operators’ memory buffers, which are trimmed continuously as the downstream operator progresses. The first extension we propose saves memory and speeds up recovery by avoiding the storage and retransmission of unnecessary events. The second extension makes the base method of upstream backup compatible with data-parallel CEP, allowing the windows into which operators partition their input to be processed in parallel. We evaluated the proposed extensions through experiments that showed a significant reduction in memory usage and recovery time at the expense of a negligible processing overhead during normal operation.
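The base mechanism named in the abstract, upstream backup, can be illustrated with a minimal sketch. This is not the authors' implementation and does not cover either proposed extension. The class names `UpstreamBuffer` and `WindowSum`, the count-based window of size 3, and the acknowledgement-on-window-close rule are all assumptions made for illustration only.

```python
from collections import deque


class UpstreamBuffer:
    """Hypothetical upstream-side buffer: retains sent events until acknowledged."""

    def __init__(self):
        self._events = deque()  # (sequence number, value), oldest first
        self._next_seq = 0

    def send(self, value):
        """Retain the event for possible replay and return its sequence number."""
        seq = self._next_seq
        self._events.append((seq, value))
        self._next_seq += 1
        return seq

    def trim(self, acked_seq):
        """Drop every retained event with sequence number <= acked_seq."""
        while self._events and self._events[0][0] <= acked_seq:
            self._events.popleft()

    def replay(self):
        """Return the retained events, oldest first, for a recovering operator."""
        return list(self._events)


class WindowSum:
    """Hypothetical downstream operator: sums fixed-size windows of input values."""

    def __init__(self, size):
        self.size = size
        self.window = []   # volatile state, lost when the operator fails
        self.results = []  # sums of the windows closed so far

    def process(self, seq, value):
        """Consume one event; return the sequence number to acknowledge, or None."""
        self.window.append(value)
        if len(self.window) == self.size:
            self.results.append(sum(self.window))
            self.window = []
            return seq  # events up to seq are no longer needed for recovery
        return None


def run_demo():
    upstream = UpstreamBuffer()
    op = WindowSum(size=3)
    for value in [1, 2, 3, 4, 5]:
        seq = upstream.send(value)
        acked = op.process(seq, value)
        if acked is not None:
            upstream.trim(acked)  # buffer shrinks as the downstream operator progresses

    # Simulated failure: a fresh operator rebuilds the lost window state
    # by consuming the events the upstream buffer still retains.
    recovered = WindowSum(size=3)
    for seq, value in upstream.replay():
        recovered.process(seq, value)
    return op.window, recovered.window, upstream.replay()
```

In this sketch, `run_demo()` leaves only the events of the still-open window in the buffer, so replaying them restores the same partial window the failed operator held. Results of already-closed windows are assumed to have been forwarded before the failure and are not reconstructed.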

Full text and other links: PDF (420710 Bytes)
Copyright: © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for reprinting/republishing this material for advertising or promotional purposes, or reuse of any copyrighted component of this work in other works.
Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Distributed Systems
Entry date: November 13, 2019