Doctoral Thesis DIS-2014-04

BibliographyCipriani, Nazario: Flexible processing of streamed context data in a distributed environment.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Doctoral Thesis (2014).
249 pages, german.
CR-SchemaH.2.4 (Database Management Systems)
H.2.8 (Database Applications)
H.3.4 (Information Storage and Retrieval Systems and Software)
Abstract

Nowadays, stream-based data processing occurs in many context-aware applicatio scenarios, such as in context-aware facility management applications or in location-aware visualization applications. In order to process stream-based data in an application-independent manner, Data Stream Processing Systems (DSPSs) emerged. They typically translate a declarative query to an operator graph, place the operators on stream processing nodes and execute the operators to process the streamed data.

Context-aware stream processing applications often have different requirements although relying on the same processing principle, i.e. data stream processing. These requirements exist because context-aware stream processing applications differ in functional and operational behavior as well as their processing requirements. These facts are challenging on their own. As a key enabler for the effcient processing of streamed data the DSPS must be able to integrate this speciVc functionality seamlessly. Since processing of data streams usually is subject to temporal aspects, i.e. they are time critical, custom functionality should be integrated seamlessly in the processing task of a DSPS to prevent the formation of isolated solutions and to support exploitation of synergies.

Depending on the domain of interest, data processing often depends on highly domain-specific functionalities, e.g. for the application of a location-aware visualization pipeline displaying a three-dimensional map of its surroundings. The application runs on a mobile device and consists of many interconnected operations that form a network of operators called stream processing graph (SP graph). First, the friendsí locations must be collected and connected to their public profile. However, to enable the application to run smoothly for some parts of data processing the presence of a Graphics Processing Unit (GPU) is mandatory.

To solve that challenge, we have developed concepts for a flexible DSPS that allows the integration of specific functionality to enable a seamless integration of applications into the DSPS. Therefore, an architecture is proposed. A DSPS based on this architecture can be extended by integrating additional operators responsible for data processing and services realizing additional interaction patterns with context-aware applications. However, this specific functionality is often subject to deployment and run time constraints. Therefore, an SP graph model has been developed which reWects these constraints by allowing to annotate the graph by constraints, e.g. to constrain the execution of operators to only certain processing nodes or specify that the operator necessitates a GPU.

The data involved in the processing steps is often subject to restrictions w.r.t the way it is accessed and processed. Users participating in the process might not want to expose their current location to potentially unknown parties, restricting e.g. data access to known ones only. Therefore, in addition to the Wexible integration of specialized operators security aspects must also be considered, limiting the access of data as well as the granularity of which data is made available. We have developed a security framework that defines three different types of security policies: Access Control (AC) policies controlling data access, Process Control (PC) policies influencing how data is processed, and Granularity Control (GC) policies defining the Level of Detail (LOD) at which the data is made available. The security policies are interpreted as constraints which are supported by augmenting the SP graph by the relevant security policies.

The operator placement in a DSPS is very important, as it deeply influences SP graph execution. Every stream-based application requires a different placement of SP graphs according to its specific objectives, e.g. bandwidth should not fall below 500 MBit/s and is more important than latency. This fact constrains operator placement. As objectives might conflict among each other, operator placement is subject to trade-offs. Knowing the bandwidth requirements of a certain application, an application developer can clearly identify the specific Quality of Service (QoS) requirements for the correct distribution of the SP graph. These requirements are a good indicator for the DSPS to decide how to distribute the SP graph to meet the application requirements. Two applications within the same DSPS might have different requirements. E.g. if interactivity is an issue, a stream-based game application might in a first place need a minimization of latency to get a fast and reactive application. We have developed a multi-target operator placement (M-TOP) algorithm which allows the DSPS to find a suitable deployment, i.e. a distribution of the operators in an SP graph which satisfies a set of predefined QoS requirements. Thereby, the M-TOP approach considers operator-specific deployment constraints as well as QoS targets.

Department(s)University of Stuttgart, Institute of Parallel and Distributed Systems, Applications of Parallel and Distributed Systems
Superviser(s)Mitschang, Bernhard (Prof. Dr.-Ing. habil.)
Entry dateFebruary 6, 2015
   Publ. Department   Publ. Institute   Publ. Computer Science