Master's Thesis MSTR-2019-44

Eichler, Rebecca Kay: Metadata management in the data lake architecture.
Universität Stuttgart, Faculty of Computer Science, Master's Thesis No. 44 (2019).
71 pages, German.

The big data era has introduced a set of new challenges, one of which is the efficient storage of data at scale. The data lake concept was developed in response. A data lake is a highly scalable storage repository, explicitly designed to handle (raw) data at scale and to support the characteristics of big data. To fully exploit the strengths of the data lake concept, proactive data governance and metadata management are required. Without them, a data lake can turn into a data swamp, meaning that the data has become useless or has lost value for a variety of reasons; it is therefore important to avoid this condition. In the scope of this thesis, a concept for metadata management in data lakes is developed. The concept is explicitly designed to support all aspects of a data lake architecture. Furthermore, it makes it possible to fully exploit the strengths of the data lake concept, and it supports both classic data lake use cases and organization-specific use cases. The concept is tested by applying it to three use cases: data inventory, data lineage, and data access. In addition, a prototype is implemented that demonstrates the concept through exemplary metadata and use-case-specific functionality. Finally, the suitability and realization of the use cases, the concept, and the prototype are discussed. The discussion shows that the concept meets the requirements and is therefore suitable for its original purpose of supporting metadata management and data governance.

Department(s): Universität Stuttgart, Institut für Parallele und Verteilte Höchstleistungsrechner, Anwendersoftware
Supervisors: Mitschang, Prof. Bernhard; Giebler, Corinna
Entry date: 23 October 2019
