Master Thesis MSTR-2017-19

Bibliography: Chellathurai Saroja, Shalini: Measurement of the quality of structured and unstructured data accumulating in the product life cycle in a data quality dashboard.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Master Thesis No. 19 (2017).
79 pages, English.
Abstract

This thesis provides an overview of existing data quality metrics for structured and unstructured data, as well as of existing data quality dashboards for measuring the quality of such data. Open research questions on interpreting data quality are discussed. The metrics percentage of null values, percentage of duplicate values and percentage of non-domain values were selected and implemented as REST-based web services. Furthermore, a web application was developed to enable (1) upload of the data file whose quality is to be assessed, in one of the two standard formats JSON and CSV, and (2) flexible integration of various data quality metrics. The latter is enabled by an interface. To illustrate the functionality of this interface, the metric percentage of spelling mistakes, provided by the supervisor of the thesis, is integrated into the web application. The data quality is indicated as a percentage in the range from 0 to 100 and is also color-coded, both for the whole dataset and for each column. Donut chart or pie chart visualizations are implemented for the chosen data quality metrics. The implemented web application and metrics were evaluated with example datasets for data accumulating in the product life cycle, as provided by the supervisor. Finally, the dashboard is compared with existing data quality dashboards and the results are tabulated.
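
The three selected metrics can be illustrated with a minimal Python sketch for a single column. This is not the thesis implementation: the function names, the treatment of empty strings as null, the counting of every repeat after the first occurrence as a duplicate, and the use of the total number of values as the denominator are all assumptions made for illustration.

```python
from collections import Counter


def _is_null(value):
    # Assumption: None and blank strings both count as null.
    return value is None or str(value).strip() == ""


def _percentage(part, total):
    # Returns 0.0 for an empty column to avoid division by zero.
    return 100.0 * part / total if total else 0.0


def null_percentage(values):
    """Share of values in the column that are null, in percent."""
    nulls = sum(1 for v in values if _is_null(v))
    return _percentage(nulls, len(values))


def duplicate_percentage(values):
    """Share of values that repeat an earlier non-null value, in percent."""
    counts = Counter(v for v in values if not _is_null(v))
    # Every occurrence beyond the first counts as one duplicate.
    duplicates = sum(c - 1 for c in counts.values())
    return _percentage(duplicates, len(values))


def non_domain_percentage(values, domain):
    """Share of non-null values outside the allowed domain, in percent."""
    outside = sum(1 for v in values if not _is_null(v) and v not in domain)
    return _percentage(outside, len(values))


# Hypothetical example column and domain.
column = ["red", "blue", "", "red", None, "purple", "red", "blue"]
allowed = {"red", "blue", "green"}

print(null_percentage(column))
print(duplicate_percentage(column))
print(non_domain_percentage(column, allowed))
```

Each function returns a value between 0 and 100, so a dashboard could report either the defect percentage itself or its complement as a per-column quality score.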

Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Applications of Parallel and Distributed Systems
Supervisor(s): Mitschang, Prof. Bernhard; Kiefer, Cornelia
Entry date: May 28, 2019