Journal Article ART-2021-03

Bibliographic Data
Eichler, Rebecca; Giebler, Corinna; Gröger, Christoph; Schwarz, Holger; Mitschang, Bernhard: Modeling metadata in data lakes—A generic model.
In: Data & Knowledge Engineering. Vol. 136.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering and Information Technology.
pp. 1-17, English.
Elsevier, November 2021.
ISSN: 0169-023X; DOI: 10.1016/j.datak.2021.101931.
Journal article.
CR Classification: H.2 (Database Management)
Keywords: Metadata management; Metadata model; Data lake; Data management; Data lake zones; Metadata classification
Abstract

Data contains important knowledge and has the potential to provide new insights. Due to new technological developments such as the Internet of Things, data is generated in increasing volumes. To deal with these volumes and extract the data’s value, new concepts such as the data lake were created. The data lake is a data management platform designed to handle data at scale for analytical purposes. To prevent a data lake from becoming inoperable and turning into a data swamp, metadata management is needed. Storing and handling metadata requires a generic metadata model that can reflect the metadata of any potential metadata management use case, e.g., data versioning or data lineage. However, an evaluation of existing metadata models shows that none of them is sufficiently generic, because their design basis is not suited to this goal. In this work, we use a different design approach to build HANDLE, a generic metadata model for data lakes. The new metadata model supports the acquisition of metadata at varying levels of granularity and under any metadata categorization, including both metadata that belongs to a specific data element and metadata that applies to a broader range of data. HANDLE supports the flexible integration of metadata and can reflect the same metadata in various ways according to its intended use. Furthermore, it is designed for data lakes and therefore also supports data lake characteristics such as data lake zones. With these capabilities, HANDLE enables comprehensive metadata management in data lakes. HANDLE’s feasibility is shown through its application to an exemplary access use case and a prototypical implementation. By comparing HANDLE with existing models, we demonstrate that it can provide the same information as the other models while adding further capabilities needed for metadata management in data lakes.
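The abstract describes the model's capabilities rather than its concrete entities, so the following is only a minimal Python sketch under stated assumptions, not HANDLE itself. All names here are hypothetical and chosen for illustration: `DataElement`, `Metadata`, `ancestors`, `metadata_for`, the granularity labels ("dataset", "column"), and the zone label ("raw"). The sketch shows how a generic model might attach free-form metadata either to a single data element or to a coarser element that covers a broader range of data, tag that metadata with arbitrary categories, and record the data lake zone of each element.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataElement:
    """Hypothetical data element at some granularity (e.g. a dataset or one of its columns)."""
    identifier: str
    granularity: str                     # hypothetical label, e.g. "dataset" or "column"
    zone: str                            # hypothetical data lake zone, e.g. "raw"
    parent: "DataElement | None" = None  # coarser element that contains this one


@dataclass
class Metadata:
    """Hypothetical generic metadata entry: free-form properties plus any categories."""
    name: str
    properties: dict[str, str]
    categories: set[str] = field(default_factory=set)
    attached_to: list[DataElement] = field(default_factory=list)


def ancestors(element):
    """Yield the element itself, then every coarser element that contains it."""
    current = element
    while current is not None:
        yield current
        current = current.parent


def metadata_for(element, store):
    """Return metadata attached to the element or to any broader element covering it."""
    scope = set(ancestors(element))
    return [m for m in store if any(target in scope for target in m.attached_to)]


# Illustrative data: one dataset in a hypothetical "raw" zone and one of its columns.
sales = DataElement("sales_2021", "dataset", "raw")
amount = DataElement("sales_2021.amount", "column", "raw", parent=sales)

store = [
    Metadata("owner", {"team": "analytics"}, {"governance"}, [sales]),
    Metadata("unit", {"currency": "EUR"}, {"semantic"}, [amount]),
    Metadata("lineage", {"source": "erp_export"}, {"lineage"}, [sales]),
]

column_view = [m.name for m in metadata_for(amount, store)]  # ['owner', 'unit', 'lineage']
dataset_view = [m.name for m in metadata_for(sales, store)]  # ['owner', 'lineage']
```

Metadata attached at the dataset level is inherited by the column, while column-level metadata stays local to the column. Categories are plain strings, so any categorization scheme can be layered on top, for example by filtering the result of `metadata_for` for entries whose `categories` contain "lineage".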

Department(s): University of Stuttgart, Institute for Parallel and Distributed Systems, Application Software
Project(s): MetaMan
Entry date: 1 December 2021