Diploma Thesis DIP-3636

BibliographyMayer, Christian: Scalable data retrieval in a mobile environment.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Diploma Thesis No. 3636 (2014).
78 pages, english.
CR-SchemaC.2.4 (Distributed Systems)
H.3.3 (Information Search and Retrieval)
H.3.4 (Information Storage and Retrieval Systems and Software)
Abstract

Retrieving multidimensional data out of distributed systems becomes increasingly important. But applications of these systems are often not only interested in data vectors that match certain queries. Instead, many applications demand for retrieval of data with high quality. In this thesis, we design a distributed system that can be used by applications to retrieve data of high quality for arbitrary multidimensional queries. Major challenges for the quality-based data retrieval are to 1.) find an appropriate formalization of data quality, 2.) design routing algorithms for queries, that are robust in the presence of high dynamics with respect to the participants of the system and the data on the participants and 3.) handle heterogeneous and high-dimensional data in the system. In order to retrieve data quality, we propose 1.) the measure of confidence for a query that is based on clusters of data. When a participant of the system finds, that its confidence for a query is high, it will assume to possess data of high quality for that query. 2.) Further, we design and implement routing strategies in order to route queries to nodes that can answer them with high confidence. Maintaining exact routing tables for each possible query would be infeasible, so nodes have to model the data that can be reached via neighbours in routing models. Such modelling of data is based on structural properties of the data such as how good the data can be clustered. 3.) In the high-dimensional space, we have to overcome the curse of dimensionality: the structure of data can become invisible in higher dimensions. We address this problem with a method for dimensionality reduction that reduces the dimensions with highest data variance. The evaluation of our approaches shows a high accuracy of query routing, even if our approaches do not make use of scalability bottlenecks like flooding of the query or flooding of routing information. Further, we show that the use of dimensionality reduction in routing has positive influence on the routing accuracy. We think that the methods in our approach can be useful instruments, whenever the task of retrieving data of high quality has to be outsourced to a distributed system.

Full text and
other links
PDF (1750555 Bytes)
Department(s)University of Stuttgart, Institute of Parallel and Distributed Systems, Parallel Systems
Superviser(s)Tariq, Adnan, Muhammad
Entry dateAugust 11, 2014
   Publ. Department   Publ. Institute   Publ. Computer Science