Article in Proceedings INPROC-2020-50

Bibliography: Eichler, Rebecca; Giebler, Corinna; Gröger, Christoph; Schwarz, Holger; Mitschang, Bernhard: HANDLE - A Generic Metadata Model for Data Lakes.
In: Big Data Analytics and Knowledge Discovery: 22nd International Conference, DaWaK 2020, Bratislava, Slovakia, September 14–17, 2020, Proceedings.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology.
pp. 73-88, English.
Springer, Cham, September 11, 2020.
Article in Proceedings (Conference Paper).
CR-Schema: H.2 (Database Management)

The substantial increase in generated data induced the development of new concepts such as the data lake. A data lake is a large storage repository designed to enable flexible extraction of the data’s value. A key aspect of exploiting data value in data lakes is the collection and management of metadata. To store and handle the metadata, a generic metadata model is required that can reflect metadata of any potential metadata management use case, e.g., data versioning or data lineage. However, an evaluation of existing metadata models shows that none so far are sufficiently generic. In this work, we present HANDLE, a generic metadata model for data lakes, which supports the flexible integration of metadata, data lake zones, metadata on various granular levels, and any metadata categorization. With these capabilities, HANDLE enables comprehensive metadata management in data lakes. We show HANDLE’s feasibility by applying it to an exemplary access use case and through a prototypical implementation. A comparison with existing models shows that HANDLE can reflect the same information and provides additional capabilities needed for metadata management in data lakes.

Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Applications of Parallel and Distributed Systems
Entry date: November 13, 2020