Bibliography | Schneider, Jan; Gröger, Christoph; Lutsch, Arnold; Schwarz, Holger; Mitschang, Bernhard: Assessing the Lakehouse: Analysis, Requirements and Definition. In: Filipe, Joaquim (ed.); Smialek, Michal (ed.); Brodsky, Alexander (ed.); Hammoudi, Slimane (ed.): Proceedings of the 25th International Conference on Enterprise Information Systems, ICEIS 2023, Volume 1, Prague, Czech Republic, April 24-26, 2023. University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology. pp. 44-56, English. Prague: SciTePress, May 23, 2023. ISBN: 978-989-758-648-4; ISSN: 2184-4992; DOI: 10.5220/0011840500003467. Article in Proceedings (Conference Paper).
CR-Schema | H.2.4 (Database Management Systems) H.2.7 (Database Administration) H.2.8 (Database Applications)
Keywords | Lakehouse; Data Warehouse; Data Lake; Data Management; Data Analytics
Abstract | The digital transformation opens new opportunities for enterprises to optimize their business processes by applying data-driven analysis techniques. For storing and organizing the required huge amounts of data, different types of data platforms have been employed in the past, with data warehouses and data lakes being the most prominent ones. Since they possess rather contrary characteristics and address different types of analytics, companies typically utilize both of them, leading to complex architectures with replicated data and slow analytical processes. To counter these issues, vendors have recently been making efforts to break the boundaries and to combine features of both worlds into integrated data platforms. Such systems are commonly called lakehouses and promise to simplify enterprise analytics architectures by serving all kinds of analytical workloads from a single platform. However, it remains unclear how lakehouses can be characterized, since existing definitions focus almost arbitrarily on individual architectural or functional aspects and are often driven by marketing. In this paper, we assess prevalent definitions for lakehouses and finally propose a new definition, from which several technical requirements for lakehouses are derived. We apply these requirements to several popular data management tools, such as Delta Lake, Snowflake and Dremio in order to evaluate whether they enable the construction of lakehouses.
Full text and other links | Publication DOI
Contact | jan.schneider@ipvs.uni-stuttgart.de
Department(s) | University of Stuttgart, Institute of Parallel and Distributed Systems, Applications of Parallel and Distributed Systems
Project(s) | Data Platform Architectures & Technologies
Entry date | September 8, 2023