Master's Thesis MSTR-2019-82

Wunschik, Marc-Sven: The Influence of Load Shedding on the Stream Quality in CEP Applications.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering and Information Technology, Master's Thesis No. 82 (2019).
87 pages, English.

Digitalisation and Industry 4.0 produce continuous data streams in all areas of life. To extract high-level information from these streams in real time, we can use Complex Event Processing (CEP) and Stream Processing (SP). The processing is often distributed over several operator nodes at different sites, and each operator handles a fraction of the overall information-gathering process. There are different types of operators for different tasks. The most important for this work are the propositional logic operator, the pattern-finding operator, and the average-building operator. The operators are organized in a directed acyclic graph, called the operator graph. The operator graph often has to handle large amounts of continuously incoming data, which is bundled into discrete events. We can use load shedding to reduce the workload and to ensure that the operators process incoming data quickly. Load shedding removes some of the events from the waiting queue at an operator. It thereby lowers latency and workload, but it can also lower the accuracy of the derived information. Most publications on load shedding in CEP or stream processing examine how to build good load shedders. In this work, we instead examine how inaccuracies caused by load shedding propagate through the operator graph, that is, how load shedding at a preceding operator influences the current operator. We also examine which operator types can potentially repair the inaccuracies in the data stream caused by prior load shedding, meaning that the data stream is more accurate after the operator than it was before. To do this, we created a Java program that can simulate CEP and stream processing for different scenarios, using different operator types. For the experiment, we apply load shedding to the operators in the operator graphs of the different scenarios. As a baseline for comparison, we take a run of the same scenario without load shedding.
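The idea of removing events from an operator's waiting queue can be illustrated with a minimal Java sketch. It is not taken from the thesis: the class name `RandomLoadShedder`, the method `shed`, and the choice of a purely random drop policy are assumptions made for illustration only.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.Random;

// Hypothetical illustration, not the simulator described in the thesis.
public class RandomLoadShedder {

    // Removes each queued event independently with probability dropRate
    // and returns the number of events that were dropped.
    public static <E> int shed(Deque<E> queue, double dropRate, Random rng) {
        if (dropRate < 0.0 || dropRate > 1.0) {
            throw new IllegalArgumentException("dropRate must be in [0, 1]");
        }
        int dropped = 0;
        Iterator<E> it = queue.iterator();
        while (it.hasNext()) {
            it.next();
            // nextDouble() is in [0, 1), so 0.0 never drops and 1.0 always drops.
            if (rng.nextDouble() < dropRate) {
                it.remove();
                dropped++;
            }
        }
        return dropped;
    }

    public static void main(String[] args) {
        Deque<Integer> queue = new ArrayDeque<>();
        for (int i = 0; i < 1000; i++) {
            queue.add(i);
        }
        int dropped = shed(queue, 0.3, new Random(42));
        System.out.println("dropped " + dropped + ", remaining " + queue.size());
    }
}
```

A random policy is only the simplest possible shedder. It is used here because it makes the trade-off visible: fewer queued events mean less work for the operator, but every dropped event is information the downstream operators never see.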
In the evaluation, we explain how and why load shedding affects the different operator types the way it does, and we give guidelines on how to shed load optimally for each operator type. If one input event type of the propositional logic operator is the limiting factor, the operator reacts well to load shedding of the non-limiting factor. If the limiting factor is greatly outnumbered, this operator type can even repair the accuracy of the data stream. The accuracy of the pattern-finding operator suffers severely from load shedding if we shed whole input events. If, at some point in the operator graph, we can shed load such that only the attributes of the input events of the pattern-finding operator are altered, the resulting accuracy is much better. This could happen, for example, if we aggregate events prior to the pattern-finding operator. This operator type has no potential to improve the accuracy of the data stream. The average-building operator retains very high accuracy under load shedding, provided that we shed all of its input data proportionally. In that case, the accuracy remains at 100% on average, with only a small variance at higher drop rates. This operator type can therefore repair a prior loss of accuracy of the data stream almost entirely.
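The robustness of the average-building operator can be made plausible with a small Java sketch. It is an illustration under stated assumptions, not the experiment from the thesis: the class `AverageUnderShedding`, the uniform random drop in `shedUniformly` (used here as a stand-in for proportional shedding), and the definition of accuracy as one minus the relative error are all hypothetical choices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration, not the simulator described in the thesis.
public class AverageUnderShedding {

    // Arithmetic mean of one window of input values.
    public static double average(List<Double> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("empty window");
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // Keeps each value with probability 1 - dropRate, so all inputs
    // are thinned out at the same rate (assumed shedding policy).
    public static List<Double> shedUniformly(List<Double> values, double dropRate, Random rng) {
        List<Double> kept = new ArrayList<>();
        for (double v : values) {
            if (rng.nextDouble() >= dropRate) {
                kept.add(v);
            }
        }
        return kept;
    }

    // Assumed accuracy measure: 1 minus the relative error against the
    // result of the run without load shedding.
    public static double accuracy(double reference, double observed) {
        if (reference == 0.0) {
            throw new IllegalArgumentException("reference must be non-zero");
        }
        return 1.0 - Math.abs(observed - reference) / Math.abs(reference);
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        List<Double> window = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            window.add(50.0 + 10.0 * rng.nextGaussian());
        }
        double baseline = average(window);
        for (double dropRate : new double[] {0.1, 0.5, 0.9}) {
            List<Double> kept = shedUniformly(window, dropRate, rng);
            if (kept.isEmpty()) {
                continue; // nothing left to average in this window
            }
            System.out.println("dropRate=" + dropRate
                    + " accuracy=" + accuracy(baseline, average(kept)));
        }
    }
}
```

Because a uniform drop leaves an unbiased sample of the window, the mean of the surviving values is expected to stay close to the baseline mean, while the spread of the result grows as fewer values survive. This matches the qualitative statement above, but the numbers printed by this sketch say nothing about the results reported in the thesis.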

Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Distributed Systems
Supervisor(s): Rothermel, Prof. Kurt; Röger, Henriette
Entry date: March 2, 2020