With the proliferation of streaming data from sources such as sensors in the Internet of Things (IoT), situational aware applications become possible. Such applications react to situations in the surrounding world that are signaled by complex event patterns that occur in the sensor streams. In order to detect the patterns that correspond to the situations of interest, Complex Event Processing (CEP) is the paradigm of choice. In CEP, a distributed operator graph is spanned between the event sources and the applications. Each operator step-wise detects event patterns on subsequences, called windows, of its input stream and forwards output events that signal the detection to its successors. To cope with the ever-increasing workload at the operators, operator parallelization becomes necessary. To this end, data parallelization is a powerful paradigm, building on an architecture that consists of a splitter, operator instances and a merger, to scale up and scale out CEP operators. In doing so, the operators need to provide consistent output streams, i.e., not produce false-negatives or false-positives, keep a latency bound on pattern detection, elastically adapt their resource reservations to the workload, and be fault-tolerant against node and network failures. Related work has proposed data parallelization techniques that build on splitting the input event streams of an operator either in a key-based, a batch-based or a pane-based way. These approaches, however, only support a limited range of CEP operators.
The goals of this thesis are (i) to support data parallelization for all window-based CEP operators, (ii) to develop adaptation methods such that CEP operators can keep a user-defined latency bound while minimizing costs for computing and networking resources, and (iii) to develop recovery methods that guarantee fault-tolerance at a low run-time overhead.
To this end, the following contributions are made. First, we propose a window-based data parallelization method that is based on the externalization of the operator's window policy to a data parallelization framework. Second, basing on Queuing Theory, we propose a method to adapt the operator parallelization degree at run-time to the workload such that probabilistic bounds on the event buffering in the operator can be met. Third, we propose a batch scheduling algorithm that is able to assign subsequent overlapping windows to the same operator instance, so that communication overhead is minimized, while a latency bound in the operator instances is still kept. Forth, we propose a framework for parallel processing of inter-dependent windows that is based on the speculative processing of multiple versions of multiple windows in parallel. Fifth, we propose a lightweight rollback recovery method for CEP operator networks that exploits the externalization of the operator window policy to allow for the recovery of an arbitrary number of operators.