Article in Proceedings INPROC-2014-49

Bibliography: Gröger, Christoph; Schwarz, Holger; Mitschang, Bernhard: The Deep Data Warehouse. Link-based Integration and Enrichment of Warehouse Data and Unstructured Content.
In: Proceedings of the 18th IEEE International Enterprise Distributed Object Computing Conference (EDOC), 01-05 September, 2014, Ulm, Germany.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology.
English.
IEEE, September 1, 2014.
Article in Proceedings (Conference Paper).
CR-Schema: H.2.7 (Database Administration)
Abstract

Data warehouses are at the core of enterprise IT and enable the efficient storage and analysis of structured data. Besides, unstructured content, e.g., emails and documents, constitutes more than half of the entire enterprise data and contains a lot of implicit knowledge about warehouse entities. Thus, holistic analytics require the integration of structured warehouse data and unstructured content to generate novel insights. These insights can also be used to enrich the integrated data and to create a new basis for further analytics. Existing integration approaches only support a limited range of analytical applications and require the costly adaptation of the warehouse schema. In this paper, we present the Deep Data Warehouse (DeepDWH), a novel type of data warehouse based on the flexible integration and enrichment of warehouse data and unstructured content, addressing the variety challenge of Big Data. It relies on information-rich instance-level links between warehouse elements and content items, which are represented in a graph-oriented structure. Neither adaptations of the existing warehouse nor the design of an overall federated schema are required. We design a conceptual linking model and develop a logical schema for links based on a property graph. As a proof of concept, we present a prototypical implementation of the DeepDWH including a link store based on a graph database.
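
The idea of instance-level links held in a property graph can be illustrated with a minimal sketch. This is not the paper's schema or implementation: the class names (Node, Link, LinkStore), the node kinds, the property keys and the example identifiers are all hypothetical, and a plain in-memory list stands in for the graph database mentioned in the abstract.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Node:
    """A graph node standing in for a warehouse element or a content item."""
    node_id: str
    kind: str  # "warehouse_element" or "content_item" (labels assumed here)


@dataclass
class Link:
    """An instance-level link carrying its own key/value properties."""
    source: Node
    target: Node
    properties: dict[str, Any] = field(default_factory=dict)


class LinkStore:
    """In-memory stand-in for a link store; a graph database would replace it."""

    def __init__(self) -> None:
        self._links: list[Link] = []

    def add_link(self, source: Node, target: Node, **properties: Any) -> Link:
        link = Link(source, target, dict(properties))
        self._links.append(link)
        return link

    def links_of(self, node: Node) -> list[Link]:
        """Return every link that touches the given node, in either direction."""
        return [l for l in self._links if node in (l.source, l.target)]


# Hypothetical example data: one warehouse entity and one email.
machine = Node("dwh:machine/4711", "warehouse_element")
email = Node("cms:email/2014-03-12-001", "content_item")

store = LinkStore()
store.add_link(machine, email, relation="mentioned_in", confidence=0.9)

# Content items reachable from the warehouse entity via its links.
linked_items = [l.target.node_id for l in store.links_of(machine)]
```

Because the links live outside both the warehouse and the content repository, neither side needs a schema change in this sketch, which mirrors the claim in the abstract that no adaptation of the existing warehouse is required.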

Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Applications of Parallel and Distributed Systems
Entry date: July 7, 2014