Master Thesis MSTR-2019-87

BibliographyPopp, Matthias: Comprehensive Support of the Lifecycle of Machine Learning Models in Model Management Systems.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Master Thesis No. 87 (2019).
69 pages, english.

Today, Machine Learning (ML) is entering many economic and scientific fields. The lifecycle of ML models includes data pre-processing to transform raw data into features, training a model with the features, and providing the model to answer predictive queries. The challenge is to ensure accurate predictions by continuously updating the model with automatic or manual retraining. To be aware of all changes, e.g. datasets and parameters, it is required to store metadata over the entire ML lifecycle. In this thesis we present a concept and system for comprehensive support of the ML lifecycle. The concept includes a metadata schema, as well as a solution to collect and enrich the metadata. The metadata schema contains information about the experiment, runs, executions, executables and common artifacts in ML such as datasets, models, and metrics. The stored information can be used for comparisons, re-iterations, and backtracking of ML experiments. We achieve this by tracking the lineage of ML pipeline steps and collecting metadata such as hyperparameters. Furthermore, a prototype is implemented to demonstrate and evaluate the concept. A case study, based on a selected scenario, serves as the basis for a qualitative assessment. The case study shows that the concept meets all the requirements and is therefore a suitable approach to comprehensively support ML model lifecycle.

Full text and
other links
Department(s)University of Stuttgart, Institute of Parallel and Distributed Systems, Applications of Parallel and Distributed Systems
Superviser(s)Schwarz, PD Dr. Holger; Weber, Christian
Entry dateMarch 24, 2020
   Publ. Computer Science