Master Thesis MSTR-2023-85

Bibliography: Repanovici, Roland-Cristian: Concepts for Virtual Knowledge Graphs on Lakehouses.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Master Thesis No. 85 (2023).
89 pages, English.


This thesis explores the integration of Virtual Knowledge Graphs (VKGs) on a data lakehouse and examines the complexities and potential of this combination for data-driven analytical processes in the era of digital transformation. Lakehouses, which aim to reduce the complexity of data architectures, emerge as a convergence of data warehouses and data lakes that unifies and optimizes data storage and analytical processes. Virtual knowledge graphs, on the other hand, offer a dynamic approach to data representation that allows the exploration of intricate, transitive relationships between data entities.

An intriguing aspect of this research is the introduction of time-travel concepts into VKGs in a lakehouse environment. This concept offers enhanced data versioning, historical data analysis, and improved traceability and auditing. Through the integration of Delta Lake, the thesis envisions a shift in how historical data is handled and queried, bolstering the VKG's capabilities to facilitate complex analyses and data insight extraction.

The thesis explores the technological advancements and strategies that could accommodate the time-travel feature and ensure its effective use in enriching the VKG within the lakehouse. It examines the underlying metadata, query translation, and the optimization of temporal queries, with the goal of robust and efficient data analytical processes. The research develops concepts, tools, and technologies for the implementation and enhancement of VKGs on lakehouses. It aims to open new avenues in efficient data processing, adaptable schema evolution, and user engagement and understanding, ultimately contributing insights and new perspectives to the evolving landscape of data analytics and knowledge representation.

Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Applications of Parallel and Distributed Systems
Supervisor(s): Schwarz, Prof. Holger; Schneider, Jan
Entry date: February 20, 2024
   Publ. Computer Science