Master Thesis MSTR-2017-13

Bibliography: Bettadapura Raghavendra, Shreyas: Relevance of the two adjusting screws in data analytics: data quality and optimization of algorithms.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Master Thesis No. 13 (2017).
97 pages, English.
Abstract

In the context of learning from data, the factors that affect the performance of a learning algorithm have traditionally been studied from the perspective of data preprocessing and from that of empirical studies. We attempt to provide a middle ground with an approach that enables a systematic analysis of the interaction between the quality of the data provided for training and the configuration applied to the learning algorithm. This is achieved through two concepts: a Data Quality Profile, which describes quality indicators for the dataset, and a Classification Configuration Profile, which describes the configuration parameters applied to the learning algorithm. Both profiles share the characteristic that variations in their properties can be viewed distinctly and represented equally, which allows for a systematic study. We demonstrate this through a prototypical implementation that considers the data quality indicators of missing values, label imbalance, and high cardinality, and evaluates them against the CART decision tree algorithm, configurable by its splitting criteria, early stopping criteria, and training data preprocessing operations. We observed a relationship between decreasing quality of the training data and deterioration in the performance of the algorithm. The flexibility of the approach allows it to be extended easily to other algorithms and to implementations of further quality indicators.
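To make the two profiles more concrete, the following is a minimal Python sketch, not the thesis's prototype. The class names, the function `profile_data`, and the specific formulas (missing values as a share of all cells, label imbalance as the majority-to-minority class count ratio, cardinality as distinct values divided by the number of rows) are assumptions chosen for illustration only.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, List, Optional, Sequence


@dataclass
class DataQualityProfile:
    """Hypothetical container for the three indicators named in the abstract."""
    missing_ratio: float          # assumed: share of cells that are missing (None)
    imbalance_ratio: float        # assumed: majority-class count / minority-class count
    max_cardinality_ratio: float  # assumed: highest (distinct values / rows) over all columns


@dataclass
class ClassificationConfigurationProfile:
    """Hypothetical CART configuration: splitting, early stopping, preprocessing."""
    criterion: str = "gini"              # splitting criterion, e.g. "gini" or "entropy"
    max_depth: Optional[int] = None      # early stopping: maximum tree depth (None = unlimited)
    min_samples_split: int = 2           # early stopping: minimum samples needed to split a node
    preprocessing: List[str] = field(default_factory=list)  # hypothetical operation names


def profile_data(rows: Sequence[Sequence[Any]], labels: Sequence[Any]) -> DataQualityProfile:
    """Compute the hypothetical data quality profile for feature rows and class labels."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if n_rows else 0
    total_cells = n_rows * n_cols

    # Missing values: fraction of all cells that are None.
    missing = sum(1 for row in rows for value in row if value is None)
    missing_ratio = missing / total_cells if total_cells else 0.0

    # Label imbalance: how many times larger the biggest class is than the smallest.
    counts = Counter(labels)
    imbalance_ratio = max(counts.values()) / min(counts.values()) if counts else 1.0

    # Cardinality: distinct non-missing values per column relative to the row count.
    ratios = []
    for col in range(n_cols):
        distinct = {row[col] for row in rows if row[col] is not None}
        ratios.append(len(distinct) / n_rows)
    max_cardinality_ratio = max(ratios, default=0.0)

    return DataQualityProfile(missing_ratio, imbalance_ratio, max_cardinality_ratio)


if __name__ == "__main__":
    rows = [["a", 1], ["b", None], ["c", 3], ["d", 3]]
    labels = ["yes", "yes", "yes", "no"]
    print(profile_data(rows, labels))
    print(ClassificationConfigurationProfile(criterion="entropy", max_depth=5))
```

For the toy data above, one of eight cells is missing (ratio 0.125), the majority class is three times the minority class (ratio 3.0), and the first column has a distinct value in every row (ratio 1.0). The configuration field names are chosen to mirror the parameters of scikit-learn's `DecisionTreeClassifier`; whether the thesis used that library is not stated in this record.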

Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Applications of Parallel and Distributed Systems
Supervisor(s): Mitschang, Prof. Bernhard; Villanueva Zacarias, Alejandro Gabriel; Kiefer, Cornelia
Entry date: May 28, 2019